Rhaptos Software Development

Doctype data cleanup

Submitted by jenn. on 2005-05-06 10:10. Development, Maintenance
A few weeks (?) ago, we finally tidied up the doctype data in the database.
The doctypes in the database have historically been unreliable for several reasons. A major one is that we didn't store the doctype string (e.g., "-//CNX//DTD CNXML 0.5 plus MathML plus QML//EN") in the database until shortly after we released CNXML 0.3.5 and updated all the modules to that version. Unfortunately, that meant all the previous module versions written in CNXML 0.1, 0.2, and 0.3 got 0.3.5 doctype strings in the database by default.

There was also a period of time (okay, most of the last couple of years) during which changing the doctype of a module in the CNXML source (i.e., updating its CNXML version) didn't cause the new doctype to propagate to the database. That's now fixed as well.

So, with the causes of the sync problems gone, I wrote a script to slurp the actual valid doctypes out of the CNXML for each module and all its previous versions. Then we wrote an update to the database for all the current and past modules, and voilà! The data is clean again, and should remain that way.
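For the curious, here is a rough idea of what that kind of slurping can look like. This is a minimal sketch in Python, not the actual script: the function names, the regular expression, and the dictionaries standing in for the module sources and the database rows are all assumptions made up for illustration.

```python
import re

# Matches <!DOCTYPE root PUBLIC "public-id" "system-id" ...>
# Either quote style is accepted, as XML allows both.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+'
    r'(?P<q1>["\'])(?P<public>.*?)(?P=q1)\s+'
    r'(?P<q2>["\'])(?P<system>.*?)(?P=q2)',
    re.DOTALL,
)


def extract_doctype(cnxml_source):
    """Return (public_id, system_id) from a document's DOCTYPE, or None."""
    match = DOCTYPE_RE.search(cnxml_source)
    if match is None:
        return None
    return match.group("public"), match.group("system")


def doctype_updates(sources, stored):
    """Return (key, public_id) pairs where the stored doctype is stale.

    `sources` maps a hypothetical (module id, version) key to CNXML text;
    `stored` maps the same keys to the doctype currently in the database.
    """
    updates = []
    for key, source in sources.items():
        found = extract_doctype(source)
        if found is not None and stored.get(key) != found[0]:
            updates.append((key, found[0]))
    return updates
```

Each pair returned by `doctype_updates` would then feed one database update. Versions with no PUBLIC doctype at all are skipped here, so they would need a look by hand.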

Why do we care? The lack of syncing occasionally caused some modules to report one doctype in their source and another on their metadata pages. And that's just untidy. In general, everything that really *needed* to know the doctype was looking in the right place (the source), so nothing actually broke. But since we consider the database to be the canonical source for all the other metadata, it's good that we're now consistent.

Edit, 5/10: Chuck has discovered that some of the public doctype IDs (like the example above) don't match their system IDs (which, in our case, are the URLs of the DTDs). That's a disagreement within the CNXML document, as opposed to between the document and the database, so it's possible that the database is still not quite correct. So instead of saying "the doctypes are all tidy now", I'll amend that to "the database doctypes match the PUBLIC doctype string in the source, which potentially needs tweaking in about a hundred current module versions".
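A check for that kind of internal disagreement could look something like the following. Again, this is only a sketch under assumptions: the table of expected DTD URLs is hypothetical (the URL in it is a placeholder, not our real DTD location), and so are the names.

```python
import re

# Hypothetical table of which DTD URL each public ID should point at.
# The URL below is a placeholder for illustration only.
EXPECTED_SYSTEM_ID = {
    "-//CNX//DTD CNXML 0.5 plus MathML plus QML//EN":
        "http://example.org/dtd/cnxml05_mathml_qml.dtd",
}

# Captures the public ID and the system ID of a double-quoted PUBLIC doctype.
PUBLIC_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"'
)


def ids_disagree(cnxml_source):
    """True when the PUBLIC string and the DTD URL in one document disagree."""
    match = PUBLIC_DOCTYPE.search(cnxml_source)
    if match is None:
        return False  # no PUBLIC doctype to check
    public_id, system_id = match.groups()
    expected = EXPECTED_SYSTEM_ID.get(public_id)
    # Unknown public IDs are not flagged; only known ones with the wrong URL.
    return expected is not None and expected != system_id
```

Running this over the current version of every module would give the list of sources that need their PUBLIC string or DTD URL tweaked before the database can be called fully correct.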