Rhaptos Software Development

Pickling lambdas and other oddities

Submitted by brentmh on 2005-04-14 12:04. Deep Code, Development
Off and on for several months I've been working on a new CMF tool to do XML validating. It's been *almost* there for a while now.

Background

The code for validating XML has needed an update for some time. One problem is that it calls out to the external program cnxmllint to do what could very well be done in-process without the overhead. (The fact that cnxmllint doesn't even compile against recent versions of libxml2 is another reason to get rid of it.) Another problem is that it's currently part of the CNXMLDocument Product when it really needs to be in a more general package.

The Plan

My long-range goal with this refactoring is to make something that has fewer (no) hardcoded dependencies on our setup. I also wanted an easier way to migrate from DTDs to RelaxNG. The tool I had in mind would allow you to register a validator for a specific combination of namespaces, as well as providing explicit methods for validating against a DTD, XML Schema, or RelaxNG schema. This way you could keep the DOCTYPE (or schema, or whatever) declaration outside the document. The tool would scan the document for namespaces and trigger validation based on that.
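A rough sketch of the registration idea described above, written in present-day Python rather than the Zope/CMF code of the time. The names ValidatorRegistry and namespaces_in are made up for illustration, since the real tool's API isn't shown in this post. It collects the namespaces a document declares using the standard library's iterparse and looks up a validator registered for exactly that combination.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def namespaces_in(xml_bytes):
    """Return the set of namespace URIs declared anywhere in the document."""
    found = set()
    # Each "start-ns" event carries a (prefix, uri) pair for one declaration.
    events = ET.iterparse(BytesIO(xml_bytes), events=("start-ns",))
    for _event, (_prefix, uri) in events:
        found.add(uri)
    return frozenset(found)


class ValidatorRegistry:
    """Hypothetical registry: one validator per combination of namespaces."""

    def __init__(self):
        self._validators = {}

    def register(self, namespaces, validator):
        # A frozenset makes the key independent of declaration order.
        self._validators[frozenset(namespaces)] = validator

    def validate(self, xml_bytes):
        key = namespaces_in(xml_bytes)
        try:
            validator = self._validators[key]
        except KeyError:
            raise LookupError("no validator registered for %s" % sorted(key))
        return validator(xml_bytes)
```

After registering a validator for a set of namespaces, calling validate() on a document dispatches on whatever namespaces that document declares. A combination nobody registered raises LookupError, which is also where the combinatorial explosion mentioned in the aside below would start to hurt.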

Aside: as Ross reminds me, a better approach in the future would be to simply register validators for individual namespaces and then "do the right thing" for compound documents. Of course, the hard part is figuring out what "do the right thing" means. Where are the different schemas allowed to intermix? This isn't just a problem for us either. The W3C has a workgroup for Compound Documents. This topic came up in discussions with Daniel and Laurent this week as well. I'm hoping that proposals like James Clark's Namespace Routing Language will point the way to a solution. In the meantime, we'll try to contain the combinatorial explosion.

The Problems

My original idea was to allow anything callable to be registered as a validator. I very quickly learned that there are restrictions on what can and cannot be persisted in the ZODB, because some things simply cannot be stored in a Python pickle. I also learned that spouses may look upon you strangely if you suddenly blurt out "Aha! You can't pickle a lambda!" while reading on the couch. Of course, lambda functions aren't the only unpicklable things. Much to my sadness, you also can't pickle C extension objects. Fortunately there's a way around this, as described in PEP 307.
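For anyone who wants to see the couch revelation for themselves, here is a small sketch in modern Python. The helper can_pickle and the class DTDValidator are invented for this illustration and are not the tool's actual code. The class shows one common workaround, which may or may not be the one used here: persist only the plain data (a path) and leave the unpicklable parsed object out of the saved state.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


class DTDValidator:
    """Hypothetical callable validator that is safe to persist."""

    def __init__(self, dtd_path):
        self.dtd_path = dtd_path
        # Stand-in for a parsed DTD held by a C extension (not picklable).
        self._parsed = None

    def __getstate__(self):
        # Save only the path; the parsed DTD is left out of the pickle.
        return {"dtd_path": self.dtd_path}

    def __setstate__(self, state):
        self.dtd_path = state["dtd_path"]
        self._parsed = None  # to be rebuilt on first use after loading

    def __call__(self, document):
        # Placeholder check; real validation would use the parsed DTD.
        return bool(document)
```

Calling can_pickle(lambda doc: True) returns False, because a lambda has no importable name for pickle to refer to. An instance of a class defined at module level, like DTDValidator, gets around that since only its state dictionary has to be stored.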

I also ran into limitations in the Python bindings to libxml2. This led to me contributing patches for various memory leaks and enhancements in order to register Python validation callbacks. It also led to my seriously looking at lxml as an alternative Python binding. Unfortunately, right now lxml's validation support is somewhat weak. It's a project to keep an eye on though, and perhaps I'll have time to contribute some code one of these days.

The Future

The tool works well now, and validates quickly. It should be a speed boost for us since it keeps the parsed DTDs around instead of reloading them every time someone validates anything. The only remaining issue to solve now is that some modules don't have the metadata namespace declared, causing them not to parse correctly. I think I can work around this in a temporary way by stuffing said namespace into the document at checkout time. The long-term solution is to move forward with our plans to remove the metadata section from the content and store it separately. Perhaps that's something to do first.
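The speed boost from keeping parsed DTDs around is plain memoization. A minimal sketch, assuming a loader function that does the expensive parse; DTDCache and its loader argument are hypothetical names, not the tool's real interface:

```python
class DTDCache:
    """Keep parsed DTDs in memory so each file is parsed at most once."""

    def __init__(self, loader):
        # loader is any callable that takes a path and returns a parsed DTD.
        self._loader = loader
        self._parsed = {}

    def get(self, path):
        # Parse on the first request only; later requests reuse the result.
        if path not in self._parsed:
            self._parsed[path] = self._loader(path)
        return self._parsed[path]
```

Repeated calls to get() with the same path hand back the same parsed object. Note that this sketch never notices when a DTD file changes on disk, so a long-running server would need some way to invalidate entries.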

Beyond validation, I'd like to move our stylesheet transformations to the tool as well. This is another piece with hardcoded paths that needs to be fixed up before a public release. XSL transformations in Zope are getting some attention these days from Paul, Kapil, Ben, and others. It seems that there is some interest in taking a pipeline approach similar to what we're doing. Something to keep an eye on, to be sure.

Effects on validation

Posted by jenn at 2005-04-14 12:17
> The long-term solution is to move forward with our plans to remove
> the metadata section from the content and store it separately.

Yay! Rah! Cheer! Encourage! :)

As a relative outsider to validation issues, my first question is, what does this do to the invalidity messages? Do they get clearer or more obscure, or do they not change?

Re: Effects on validation

Posted by brentmh at 2005-04-14 17:54
It will have an effect on validation messages, but probably not the one you're looking for. The text shouldn't be any different, but we'll no longer have the context that we used to. Instead we'll link to the source view where we can highlight the affected line.

metadata section: downsides, upsides?

Posted by cbearden at 2005-04-15 16:22
If I recall correctly, the chief (only?) downside to having the metadata section inside the document is the difficulty of keeping the metadata in the doc in sync with the metadata in Postgres. Is that fair? Or does the metadata section cause other headaches?

I have a concern with separating the metadata out of the doc. Ostensibly, we want people to be able to reuse the XML-encoded Connexions content for purposes we haven't foreseen. I assume (incorrectly, perhaps) that we want people to be able to download the XML version of a course and reuse it in some way or in some context that we haven't foreseen. That openness seems implicit in the CC model. But if they download the XML without the metadata section, it seems to me that they lose a lot of important information, including some information pertaining to attribution and version, as well as the abstract. In a sense, a 'content' element without the metadata is headless.

What if we made the metadata section optional, and shoveled it in from the database at export? That would solve the problem of the ongoing sync'ing of the xml with the rdbms, since the metadata joins the doc again only at export time.

Am I being too persnickety about this? Will I be excommunicated as a heretic :-)

Chuck

Re: metadata section: downsides, upsides?

Posted by brentmh at 2005-04-18 23:02
No, you're merely forcing me to articulate myself better :) Storing the metadata separately is something we've talked about for so long that I forget the idea isn't really fully explained anywhere. I'll put it in my next blog entry.