
Rhaptos Software Development

Developer Blog » Chuck's CnxBlog

Producing the CNXML Specification from the Schema

Submitted by cbearden on 2005-11-10 15:47. Categories: Documentation, Markup
An overview of the process I propose for generating the next CNXML specification from the Relax NG schema.

One of the charges of the CNXML Spec working group is to devise a way to produce the finished, consumable specification from the canonical Relax NG schema as far as possible, perhaps in a way analogous to generating Java documentation from commented source code with javadoc. Such an approach has an obvious attraction: the documentation for an element is kept close to the element's definition in the schema, which makes it easier for maintainers to keep the spec in harmony with the language itself. Furthermore, the schema definition carries enough information that the descriptions of each element's attributes and child elements (including their number and order) can be generated automatically.

Sounds like a useful idea. But what are our options in implementing it?

Some background

(1) Relax NG permits well-formed elements from other namespaces to be embedded just about anywhere in a schema, and Relax NG validators simply ignore any elements outside the Relax NG namespace (see the discussion of annotations in Eric van der Vlist's book on Relax NG). As an example, here is a simplified version of the RNG definition of CNXML's para element; the foreign CNXML elements carry the cnx: prefix, and the RNG elements are unprefixed:

<element name="para"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:cnx="http://cnx.rice.edu/cnxml">
  <cnx:para id="para-def">The <cnx:code>para</cnx:code>
  element denotes paragraphs in the structure of a CNXML
  document.</cnx:para>
  <attribute name="id"/>
  <oneOrMore>
    <choice>
      <text/>
      <ref name="code"/>
    </choice>
  </oneOrMore>
</element>
While RNG ignores the embedded CNXML, our document generation system can be made to pay attention to it.
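As a rough sketch of how a generator might pay attention to the embedded CNXML, the Python below reads a lightly corrected copy of the para definition (namespace declarations added so it parses on its own) and pulls out the prose, the attribute names, and the referenced child patterns, which is the kind of information the spec could list automatically. It uses the standard-library ElementTree rather than the XSLT we would more likely use in practice; the CNXML namespace URI and the describe() helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
CNX = "http://cnx.rice.edu/cnxml"  # assumed CNXML namespace URI

SCHEMA = """\
<element name="para"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:cnx="http://cnx.rice.edu/cnxml">
  <cnx:para id="para-def">The <cnx:code>para</cnx:code>
  element denotes paragraphs in the structure of a CNXML
  document.</cnx:para>
  <attribute name="id"/>
  <oneOrMore>
    <choice>
      <text/>
      <ref name="code"/>
    </choice>
  </oneOrMore>
</element>
"""


def text_of(el):
    """Flatten an element's text (including child markup), collapsing whitespace."""
    return " ".join("".join(el.itertext()).split())


def describe(element_def):
    """Hypothetical helper: summarize one rng:element for the spec."""
    return {
        "name": element_def.get("name"),
        # Prose: cnx:para annotations that are direct children of the definition.
        "prose": [text_of(p) for p in element_def.findall(f"{{{CNX}}}para")],
        # Attributes and child-pattern refs anywhere below; this simple sketch
        # does not stop at nested rng:element definitions.
        "attributes": [a.get("name") for a in element_def.iter(f"{{{RNG}}}attribute")],
        "children": [r.get("name") for r in element_def.iter(f"{{{RNG}}}ref")],
    }


summary = describe(ET.fromstring(SCHEMA))
print(summary["prose"][0])
# The para element denotes paragraphs in the structure of a CNXML document.
print(summary["attributes"], summary["children"])
# ['id'] ['code']
```

The same walk could fill in the "attributes" and "content" sections of each element's entry, so only the prose has to be written by hand.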

(2) For grouping related pattern definitions, Relax NG supplies a <div> element in its own namespace (see the Relax NG specification). For instance, we might choose to document groups of related elements like so:

<div>
  <cnx:name>Links to other documents</cnx:name>
  <cnx:para id="links-p1">CNXML provides two elements
  for linking to other documents and to other parts
  of the same document: the 'cnxn' tag creates links
  to Connexions modules in the same repository, and
  the 'link' tag creates links to documents that aren't
  CNXML modules.</cnx:para>

  <define name="cnxn">
    <cnx:para id="cnxn-p1">Use the 'cnxn' element to 
    make links to other Connexions modules in the same 
    repository, and to other parts of the same 
    module.</cnx:para>
    <element name="cnxn">
      ...
    </element>
  </define>

  <define name="link">
    <cnx:para id="link-p1">Use the 'link' element to 
    create links to other documents that are not 
    Connexions modules in the same 
    repository.</cnx:para>
    <element name="link">
      ...
    </element>
  </define>
</div>
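Because rng:div carries no meaning for validation, a generator can treat each div as a section of the spec. A minimal sketch, again in Python with ElementTree: it walks a trimmed copy of the example above (prose shortened, and the elided "..." content models replaced by a placeholder <empty/>) and yields each group's title, introductory prose, and per-define documentation. The outline() function and its output shape are illustrative assumptions, not an existing tool.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
CNX = "http://cnx.rice.edu/cnxml"  # assumed CNXML namespace URI

# Trimmed copy of the div example; <empty/> stands in for the elided "...".
GRAMMAR = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:cnx="http://cnx.rice.edu/cnxml">
  <div>
    <cnx:name>Links to other documents</cnx:name>
    <cnx:para id="links-p1">CNXML provides two linking elements.</cnx:para>
    <define name="cnxn">
      <cnx:para id="cnxn-p1">Use 'cnxn' to link to other modules.</cnx:para>
      <element name="cnxn"><empty/></element>
    </define>
    <define name="link">
      <cnx:para id="link-p1">Use 'link' for non-module documents.</cnx:para>
      <element name="link"><empty/></element>
    </define>
  </div>
</grammar>
"""


def text_of(el):
    """Flatten an element's text content, collapsing whitespace."""
    return " ".join("".join(el.itertext()).split())


def outline(grammar):
    """Yield (title, intro paras, [(define name, doc paras)]) for each rng:div."""
    for div in grammar.iter(f"{{{RNG}}}div"):
        title = div.find(f"{{{CNX}}}name")
        intro = [text_of(p) for p in div.findall(f"{{{CNX}}}para")]
        entries = [
            (d.get("name"), [text_of(p) for p in d.findall(f"{{{CNX}}}para")])
            for d in div.findall(f"{{{RNG}}}define")
        ]
        yield (text_of(title) if title is not None else ""), intro, entries


for title, intro, entries in outline(ET.fromstring(GRAMMAR)):
    print(title)
    for name, docs in entries:
        print(f"  {name}: {' '.join(docs)}")
```

Only direct children of each div and define are read, so a nested div becomes its own section rather than leaking into its parent's.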

(3) The litprog tools from the DocBook project can be used to apply the literate programming methodology to Relax NG schemas. According to Eric van der Vlist:

The basic idea of literate programming (or litprog) is to include a snippet of code (or a snippet of schemas in our case) within the documentation, which can be written in any XML format, including XHTML or DocBook. From this single document embedding code in documentation, a couple of XSLT transformations generate a formatted documentation and the source code.
So literate programming is documentation-centric rather than source-code- or schema-centric. For more on literate programming, see section 14.1, "Literate Programming," in van der Vlist's book on Relax NG.
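For comparison, the "tangle" half of literate programming (pulling the schema out of the documentation) can be sketched in a few lines. This is not the DocBook litprog stylesheets, which are XSLT; it is a Python illustration that assumes a hypothetical documentation file with rng:start and rng:define elements embedded in the prose, and copies them in document order into a single rng:grammar.

```python
import copy
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
# Top-level schema snippets this sketch knows how to collect.
SNIPPET_TAGS = {f"{{{RNG}}}start", f"{{{RNG}}}define"}

# Hypothetical documentation-centric source: prose with schema fragments inside.
DOC = """\
<article xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <title>Paragraphs</title>
  <para>A document is a single paragraph.</para>
  <rng:start><rng:ref name="para"/></rng:start>
  <para>A paragraph holds text.</para>
  <rng:define name="para">
    <rng:element name="para"><rng:text/></rng:element>
  </rng:define>
</article>
"""


def tangle(doc):
    """Copy every embedded start/define, in document order, into one grammar."""
    grammar = ET.Element(f"{{{RNG}}}grammar")
    for el in doc.iter():
        if el.tag in SNIPPET_TAGS:
            snippet = copy.deepcopy(el)
            snippet.tail = None  # drop the prose-side whitespace
            grammar.append(snippet)
    return grammar


ET.register_namespace("rng", RNG)
print(ET.tostring(tangle(ET.fromstring(DOC)), encoding="unicode"))
```

The schema here exists only in pieces scattered through the prose, which is the fragmentation discussed below.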

Options

  1. The schema lives (fragmented) within the specification, à la literate programming (documentation-centric).
  2. All of the specification (including the introduction and examples) lives within the schema (schema-centric).
  3. Some, but not all, of the specification lives in the schema; probably the parts not dealing with elements and element groups live in a separate framework document, which is combined at doc-build time with the parts of the spec inside the schema (schema-centric).
    a. Element- and element-group-specific parts of the spec, as well as examples, live in the schema; everything else lives in the spec framework document.
    b. Element- and element-group-specific parts of the spec live in the schema; examples live in a separate file; everything else lives in the spec framework document.
    c. Element-specific parts of the spec live in the schema; examples live in a separate file; everything else (including element-group-specific parts of the spec) lives in the spec framework document.

If we select an approach that results in a cluttered-seeming schema (say, 2 or 3.b), we can generate not only the specification from the schema and framework docs, but also an uncluttered version of the schema to stow in /usr/share/xml (or wherever). So we would be generating not only the specification but also the schema used in validation from our source docs.

My Preference

My own preference is for something like 3.a or 3.b, exercising the option to extract a lightly documented schema from the schema cum spec. I'm somewhat inclined to keep the examples in a separate file, so that one doesn't have to edit the schema to add or modify examples. One might object that part of the spec would then be stored apart from the canonical schema, undermining the chief advantage of our approach. In response I would say that, if we pursue our goal of making all future versions of CNXML backwards-compatible with CNXML 0.5, we will chiefly be adding new examples in response to changes in the schema, rather than deleting or modifying existing ones.

I prefer to avoid the literate programming model, since I think I would find it unnatural to have to maintain a fragmentary schema.

So the pipeline for producing the specification would look something like this: the parts of the specification living in the schema (and, optionally, the examples in the examples file) would be extracted and stuffed into the framework document as CNXML; XSLT transformations could then be applied to produce XHTML and PDF output (or WAP/WML output--I mean, who wouldn't want to read the CNXML spec on his cell phone? :-).
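The extract-and-merge step might look roughly like this. The sketch assumes a convention the plan above doesn't specify: the framework document marks each insertion point with a placeholder element in a made-up namespace (gen:insert ref="..."), and the ref matches the id of a documentation element in the schema. It uses Python's ElementTree where we would probably use XSLT, and the later XHTML/PDF steps are not shown.

```python
import copy
import xml.etree.ElementTree as ET

CNX = "http://cnx.rice.edu/cnxml"  # assumed CNXML namespace URI
GEN = "urn:example:specgen"        # hypothetical placeholder namespace

SCHEMA = """\
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:cnx="http://cnx.rice.edu/cnxml">
  <define name="para">
    <cnx:para id="para-def">The <cnx:code>para</cnx:code> element
    denotes paragraphs.</cnx:para>
    <element name="para"><text/></element>
  </define>
</grammar>
"""

# Toy framework document; not meant to be valid CNXML.
FRAMEWORK = """\
<document xmlns="http://cnx.rice.edu/cnxml" xmlns:gen="urn:example:specgen">
  <section id="para-section">
    <name>The para element</name>
    <gen:insert ref="para-def"/>
  </section>
</document>
"""


def index_docs(schema):
    """Map id -> documentation element for every cnx:* element that has an id."""
    return {el.get("id"): el for el in schema.iter()
            if el.tag.startswith(f"{{{CNX}}}") and el.get("id")}


def merge(framework, schema):
    """Replace each gen:insert placeholder with a copy of the schema prose it names."""
    docs = index_docs(schema)
    for parent in list(framework.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == f"{{{GEN}}}insert":
                # A missing id raises KeyError, which flags a broken reference.
                replacement = copy.deepcopy(docs[child.get("ref")])
                replacement.tail = child.tail  # keep surrounding whitespace
                parent.remove(child)
                parent.insert(i, replacement)
    return framework


ET.register_namespace("cnx", CNX)
spec = merge(ET.fromstring(FRAMEWORK), ET.fromstring(SCHEMA))
print(ET.tostring(spec, encoding="unicode"))
```

Failing loudly on an unknown ref seems preferable here: a silently dropped paragraph would be easy to miss in a long spec.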

A lightly annotated schema could be extracted via XSLT from the schema cum spec for deployment to the system XML directory.
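As a sketch of that extraction, assuming (as in the examples above) that all documentation lives in elements outside the Relax NG namespace: the function below drops every such element, plus any foreign-namespace attributes, leaving only what a validator needs. It is a Python/ElementTree stand-in for the XSLT step. It strips all annotations; keeping a light subset (say, one short para per define) would need an extra filter, not shown.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
RNG_PREFIX = f"{{{RNG}}}"

SCHEMA = """\
<element name="para"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:cnx="http://cnx.rice.edu/cnxml">
  <cnx:para id="para-def">The <cnx:code>para</cnx:code>
  element denotes paragraphs.</cnx:para>
  <attribute name="id"/>
  <oneOrMore>
    <choice>
      <text/>
      <ref name="code"/>
    </choice>
  </oneOrMore>
</element>
"""


def strip_annotations(el):
    """Recursively remove foreign-namespace children and attributes from el."""
    # Namespaced attributes ("{uri}name") are annotations; RNG's own are unqualified.
    for name in [n for n in el.attrib if n.startswith("{")]:
        del el.attrib[name]
    for child in list(el):
        if child.tag.startswith(RNG_PREFIX):
            strip_annotations(child)
        else:
            # Removing the child also drops its tail; here that is only
            # whitespace between patterns, which RNG ignores.
            el.remove(child)
    return el


ET.register_namespace("rng", RNG)
clean = strip_annotations(ET.fromstring(SCHEMA))
print(ET.tostring(clean, encoding="unicode"))
```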

Discussion?

So, use the comments feature to tell me what you think about how we should approach the integration of schema and specification.

Re: Producing the CNXML Specification from the Schema

Posted by kclarks at 2005-11-10 16:58
If I were maintaining the Schema, I would prefer to work with either 2 or 3b.

I like the idea of having it all in one document, but I can understand that it could be really messy to maintain.

If you're going to split it up, I would think that it would be best to leave the examples out of the schema itself. Putting examples in the schema could really make it grow to huge sizes very fast. Stripping them out would tend to keep the documentation fairly clean inside the schema.

Re: Re: Producing the CNXML Specification from the Schema

Posted by cbearden at 2005-11-14 14:15
Thanks for the feedback. One issue I didn't make clear in the original posting is that under the present design of our RNG schema there is no single file for any version of our schema. I think I'll make another brief posting to that effect later.

One motivation for keeping the examples in a separate document is to make it easier for the examples folks to work at the same time as the schema folks (me).

One point in favor of having them in the schema is that we can easily extract a lightly (or not-at-all) documented schema from the omnibus file. It's still worth thinking about.

Re: Producing the CNXML Specification from the Schema

Posted by maxwell at 2005-11-11 13:39
I don't have much of a preference for how it's implemented (probably especially because I don't foresee myself writing much of this ;-)), but there is one thing that ought to be taken into account, if I'm correctly assuming that my EIP Help files will be generated from this as well: I had to keep all of my examples under a certain width so that they would fit nicely in the pop-up window, and I also tried not to make them very complicated. So wherever the examples are located, just remember that there needs to be at least one "small" example for use in the EIP Help pop-ups.

Re: Re: Producing the CNXML Specification from the Schema

Posted by cbearden at 2005-11-14 14:19
Thanks for reminding me about the need to derive the EIP help files from the schema as well. You raise an issue that merits its own blog entry: how and by whom are the text of and examples for the help docs produced?

I suspect there is some way (class attributes on the CNXML tags used for documentation, for instance) to distinguish spec text from EIP help file text.
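One way that could work, sketched under assumptions: each documentation para carries a class attribute naming its audience (the values "spec" and "eip-help" here are made up), and a para with no class is shared by both outputs. The filter below then picks out the text for either the spec or the EIP help pop-ups.

```python
import xml.etree.ElementTree as ET

CNX = "http://cnx.rice.edu/cnxml"  # assumed CNXML namespace URI

DEFINE = """\
<define name="para" xmlns="http://relaxng.org/ns/structure/1.0"
        xmlns:cnx="http://cnx.rice.edu/cnxml">
  <cnx:para>Marks a paragraph.</cnx:para>
  <cnx:para class="spec">Paragraphs may contain text and inline code.</cnx:para>
  <cnx:para class="eip-help">Use para for ordinary paragraphs.</cnx:para>
  <element name="para"><text/></element>
</define>
"""


def paras_for(define, audience):
    """Return the prose for one audience; unclassed paras go to every audience."""
    selected = []
    for p in define.findall(f"{{{CNX}}}para"):
        classes = (p.get("class") or "").split()
        if not classes or audience in classes:
            selected.append(" ".join("".join(p.itertext()).split()))
    return selected


define = ET.fromstring(DEFINE)
print(paras_for(define, "spec"))
# ['Marks a paragraph.', 'Paragraphs may contain text and inline code.']
print(paras_for(define, "eip-help"))
# ['Marks a paragraph.', 'Use para for ordinary paragraphs.']
```

Treating class as a space-separated list means one para can serve both audiences (class="spec eip-help") without duplicating text.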

Re: Producing the CNXML Specification from the Schema

Posted by mhusband at 2005-11-11 16:21
I think something like 3b or 3c would be the most practical, with a couple of questions/considerations. For example, who is going to enter the spec comment text into the schema: a developer or a writer? One person or several different people? Whoever enters the spec comments should be careful to be consistent in the way they word things and in the level of detail they use. If the spec comments are all over the place in wording, and the level of detail is huge on one item and almost zero on another, then the spec will need a lot of polishing after the comments are lifted out of the schema. If we are going to automate spec production to save time, then let's minimize the polishing that has to be done on the spec after it is auto-generated from the schema. Also, Kyle said he would prefer not to put the examples in the schema, so that it does not get really big. Does the size of the schema affect processing time, or does it just look like bad developer practice?

Re: Re: Producing the CNXML Specification from the Schema

Posted by cbearden at 2005-11-14 14:27
Good questions, Mark.

"Who is going to enter the spec comment text into the schema - a developer or a writer? One person or several different people?"

If the examples live in the schema file(s), then I expect that those directly responsible for maintaining the language (moi, at this point) will enter all the text, regardless of who writes it. Whoever enters it will have to be comfortable authoring CNXML with a text editor, unless we choose a different way to edit the schema.

Your point about the need for consistency in wording is a good one, but it will apply more to the author than the person doing the input. I plan on submitting my spec draft to the whole spec team with the express purpose of getting feedback. And I'm looking to you to cast a critical eye on precisely the kinds of issues you raise.

The text will be (and is being) authored outside of the schema; I'm writing it as an OpenOffice document at this point. Only when we are satisfied with its wording will I begin stuffing it into the schema. From then on, when you or someone else identifies something in the spec that needs to be changed, the maintainer can visit that part of the schema/spec, make the changes, and run the spec generation system again.

The size of the non-schema parts of the schema (i.e., all the documentation) is irrelevant to the speed at which documents can be validated with that schema: all the non-Relax NG material is discarded from the schema before it is compiled into the form used for validation. Kyle's concern (and mine) is that having too much non-schema material stuffed into the schema makes it less readable. But I'm still thinking that over.