Relax NG wild-cards and <extensions>

Submitted by cbearden on 2009-07-20 12:08. Development, Markup
A description of the problem we have run into trying to implement the <extensions> element.

Relax NG is such a flexible XML schema language that I'm always brought up a little short when I do run into one of its limitations. At Friday's scrum (7/17), I tried to describe the issue that makes implementing our intended <extensions> element problematic, but I didn't succeed very well. My goal is to set forth the issue in this blog entry.

During the CollXML design work, we decided we wanted an element <extensions> that would accept any well-formed XML. This XML would be stored but ignored by our system, but it would give authors a place to put data they use in their own systems. I knew that Relax NG permitted the definition of elements and attributes with "wild-card" patterns, so this task seemed eminently doable. Yet when I tried to validate the test docs with my modified schema, I got errors like this one:

  schema/collxml-jing.rng:637:14: error: conflicting ID-types
  for attribute "xref" of element "matrix" from namespace
  "http://www.w3.org/1998/Math/MathML"
  

Note that the schema, not the CollXML doc, is given as the locus of the error. It turns out that it's difficult to make jing's strict implementation of DTD ID and IDREF integrity checking play well with Relax NG wild-card element definitions.

Consider a standard Relax NG "any element" wild-card pattern, in particular, the pattern defining the attributes:

  <define name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
  

This pattern entails that any element can have an attribute with any name and any type. So an element <math> could have an attribute @id of the DTD 'ID' type, or of text type, or of any other type specified by the datatype library in whose scope the wild-card definition falls. However, the DTD ID/IDREF checking rules require consistency: once an attribute of a given element is declared to be of type 'ID', every pattern that can match that attribute on an element of that name must also declare it as type 'ID'. And it so happens that MathML defines @id as being of type 'ID'. Because the wild-card pattern matches <math> with an @id of type 'text' just as well as <math> with an @id of type 'ID', it is inconsistent with DTD-compatible ID/IDREF checking, and jing reports this conflict.
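
To make the conflict concrete, here is a minimal sketch of a grammar that I would expect to trigger the same kind of "conflicting ID-types" report. The element names doc and item are hypothetical stand-ins, not part of CollXML or MathML; item/@id plays the role of MathML's @id.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <!-- "doc" and "item" are hypothetical names used only for illustration -->
    <element name="doc">
      <zeroOrMore>
        <choice>
          <ref name="item"/>
          <ref name="extensions"/>
        </choice>
      </zeroOrMore>
    </element>
  </start>

  <!-- Here item/@id is declared with the ID datatype -->
  <define name="item">
    <element name="item">
      <attribute name="id">
        <data type="ID"/>
      </attribute>
      <text/>
    </element>
  </define>

  <define name="extensions">
    <element name="extensions">
      <zeroOrMore>
        <ref name="anyElement"/>
      </zeroOrMore>
    </element>
  </define>

  <!-- The wild-card also matches <item id="...">, but with @id as plain
       text, so item/@id ends up with two different ID-types -->
  <define name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>
```

The two definitions never need to meet in an actual document. The check is made on the schema itself, which is why the error points at the .rng file.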

You might think you could create two wild-card patterns: one for any element except those in the MathML namespace, and one for any element in the MathML namespace. The MathML pattern would then specialize its attribute definitions to stipulate that @id attributes are always of type 'ID', while the more general pattern stays open. This doesn't work, however, because the DTD rules were written before the days of namespaces, and so DTD ID/IDREF checking is blind to namespace prefixes and declarations.
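
For reference, the two-pattern attempt would look roughly like the fragment below. This is a reconstruction of the idea, not the exact schema I tested. The define names are hypothetical, and the fragment is meant to sit inside an existing <grammar>. As described above, it still fails validation of the schema.

```xml
<!-- Any element outside the MathML namespace: fully open -->
<define name="anyNonMathMLElement">
  <element>
    <anyName>
      <except>
        <nsName ns="http://www.w3.org/1998/Math/MathML"/>
      </except>
    </anyName>
    <zeroOrMore>
      <choice>
        <attribute>
          <anyName/>
        </attribute>
        <text/>
        <ref name="anyExtensionElement"/>
      </choice>
    </zeroOrMore>
  </element>
</define>

<!-- Any MathML element: @id pinned to type ID, other attributes open -->
<define name="anyMathMLElement">
  <element>
    <nsName ns="http://www.w3.org/1998/Math/MathML"/>
    <optional>
      <attribute name="id">
        <data type="ID"
              datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
      </attribute>
    </optional>
    <zeroOrMore>
      <choice>
        <attribute>
          <anyName>
            <except>
              <name>id</name>
            </except>
          </anyName>
        </attribute>
        <text/>
        <ref name="anyExtensionElement"/>
      </choice>
    </zeroOrMore>
  </element>
</define>

<!-- Hypothetical entry point combining the two wild-cards -->
<define name="anyExtensionElement">
  <choice>
    <ref name="anyNonMathMLElement"/>
    <ref name="anyMathMLElement"/>
  </choice>
</define>
```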

One option is to identify all attributes in our languages that are of type 'ID' or 'IDREF', and stipulate in the wild-card pattern that attributes by those names will always have the 'ID' or 'IDREF' type. However, that would mean that any XML pasted into the <extensions> element by an author would be subject to the same restriction, and the IDs in their content would have to be unique within the whole CollXML document. It would also run into problems with MDML >= 0.4.5, since author/@id is actually a userid, and is by no means guaranteed to be unique within a document (in fact, authors are almost always also licensors and maintainers, so the likelihood of duplication of values of these @ids is quite high).

Another option is to disable DTD ID/IDREF checking when validating with jing and to check ID integrity in some other way. James Clark suggests using Schematron for ID/IDREF checking in his blog entry cited below. I think this approach is viable in the long run, but I don't think we should take the time now to implement it.
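
As a rough idea of what the Schematron side could look like, here is a minimal sketch of an ID-uniqueness rule. It assumes that every unqualified @id outside <extensions> should be unique, which is a simplification: a real rule would have to name the specific elements whose @id is a true ID and leave out cases like author/@id. With jing, DTD-style ID/IDREF checking can be switched off with the -i option, so a rule like this would take over that job.

```xml
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <!-- Assumption: any @id not inside <extensions> must be unique.
         local-name() is used so that no namespace prefix has to be bound. -->
    <rule context="*[@id][not(ancestor::*[local-name() = 'extensions'])]">
      <assert test="count(//*[@id = current()/@id]
                          [not(ancestor::*[local-name() = 'extensions'])]) = 1">
        Duplicate id "<value-of select="@id"/>".
      </assert>
    </rule>
  </pattern>
</schema>
```

A check that each reference points at an existing id would follow the same shape, with the rule context set to whichever attributes carry the references.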

So the likely upshot is that we punt for now on our <extensions> element.

Addendum

In Friday's scrum, Philip brought up the MathML <annotation-xml> element, which in the W3C Recommendation is shown in an example as containing OpenMath mark-up. The hope was that this example meant that <annotation-xml> would accept any well-formed XML, and that the pattern for this in the MathML RNG schema would show us the way to do the same in CollXML. It turns out that the MathML RNG schema accepts only MathML elements in that spot.