Meeting to discuss (La)TeX and Connexions Printing
Rich organized a meeting yesterday with Don Johnson, Dave Johnson (from CS), Sidney, Kathi, and me to discuss LaTeX, TeX, and the future of Connexions printing. Don and Dave are both known as LaTeX wizards, and I was eager to get their feedback on my thinking about PDF generation and printing, as well as their suggestions for solving a few specific problems. Rich acted as emcee, which was perfect since he knows a lot about both TeX and Connexions.
A little bit of background.
I have suggested at times that we consider using XSL-FO either alongside or instead of TeX/LaTeX to build our PDFs. As you may know, I had a difficult time bending our present LaTeX-based system to my will to make the Ballon/Westermann report look right, so I decided to jump off that train and onto the XSL-FO Express, both to complete that project and to learn about XSL-FO first-hand.
The result was that, working part-time, I was able to create within two weeks a PDF generation system that handled the Ballon/Westermann book to the satisfaction of the authors. I started from scratch (well, I reused the first step of our legacy print system, the one that retrieves all the CNXML files and combines them into one file). I needed very little by way of documentation (the W3C recommendation and a couple of tutorials [1][2]), so it was clear that XSL-FO is vastly more transparent to the beginner than (La)TeX. And because the penultimate stage in the process (the XSL-FO document) is XML, it is easier than in LaTeX to move content around without breaking the nesting.
Even then it was clear that XSL-FO lacked the formatting granularity, and hence the power, of LaTeX, and its present version makes no provision for handling math content. Meanwhile, I learned at XML 2006 that TeX can format music and representations of chemical structures as well as math. I also realized that my frustrations with (La)TeX were due largely to my lack of understanding of how LaTeX macro packages bundle and modify TeX behaviors. While I'm still persuaded that a person starting from scratch can learn XSL-FO well enough to realize the Ballon/Westermann book in half or even a third of the time it would take to learn LaTeX that well, I also think that TeX and LaTeX in their present versions have features we will need for some of our projects, features that XSL-FO in its present version lacks.
My list of considerations.
General constraints
- Need to support academic press publishing in any discipline in ways compatible with the publishing conventions of that discipline. We are still in what Sidney calls "Phase I" (ask him about it).
- e.g. Art History
- Need to support sufficient features of textbook layout and publishing to make ourselves viable candidates as partners in such projects
- Need to support editing at pre-PDF stage, ideally by non-technical people
- At this point I am persuaded that for textbook and academic press work, there will be things that can't be handled automatically, and that can't be fixed until a PDF with its pagination is generated.
Classes of problem
- Image widths and scaling (will probably always need to be fixed by hand)
- Table widths (will probably always need to be fixed by hand)
- Equation breaking (will probably always need to be fixed by hand)
- Image/figure positioning (will probably always need to be fixed by hand)
- Numbering (do it either all in the pre-TeX stages or all in TeX; right now some is done in XSLT and some in LaTeX)
- (La)TeX errors that cause the PDF build to fail (possibly due to our own bad code)
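To make the numbering item concrete, here is a minimal Python sketch of doing all the numbering in a single pre-TeX pass, so that later stages only print numbers. It assumes, hypothetically, un-namespaced `figure` and `table` elements and writes to an invented `number` attribute; real CNXML uses a namespace and its own conventions, and the same logic could just as well live in an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET


def number_elements(root, tags=("figure", "table")):
    """Assign sequential numbers per element type, in document order.

    Hypothetical sketch: assumes un-namespaced <figure>/<table> elements
    and stores the result in an invented 'number' attribute.
    """
    counters = {tag: 0 for tag in tags}
    for elem in root.iter():  # iter() walks the tree in document order
        if elem.tag in counters:
            counters[elem.tag] += 1
            elem.set("number", str(counters[elem.tag]))
    return counters


doc = ET.fromstring(
    "<collection>"
    "<module><figure id='f1'/><table id='t1'/></module>"
    "<module><figure id='f2'/></module>"
    "</collection>"
)
counts = number_elements(doc)
print(counts)
print([f.get("number") for f in doc.iter("figure")])
```

This numbers across the whole collection; per-chapter numbering would reset the counters at each chapter boundary.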
Some thoughts
- Which do we use: TeX, XSL-FO, or both?
- Can we do math well in XSL-FO and without undue contortions? How about music or chemistry?
- At this point, I think MathML -> PDF via XSL-FO will be contorted, probably involving some conversion to SVG, which could make fixing long equations difficult or impossible (see this email exchange). I suspect the same will apply to MusicXML and C(hemical)ML.
- TeX for math content, XSL-FO for other content?
- TeX does math beautifully, and is reputed to do music and chemistry beautifully as well.
- XSL-FO is very easy to work with:
- It's XML and so is easily parsable and manipulable.
- Its syntactic constraints are helpful in finding problems.
- Think TeX, not just LaTeX.
- If we go with (La)TeX, we need the flexibility to create our own macro packages with new environments if need be; (La)TeX needs to bend to our will, rather than our adapting to the limitations of existing macro packages.
- What about pre-PDF editing environments for either system?
- Editing XSL-FO or TeX files before PDF generation is preferable to using PDF editing software, because there is some hope that pre-publication changes can be captured and reused in subsequent versions (diff and patch, perhaps); that is not the case with PageMaker/Quark, I think.
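One way to make "diff and patch, perhaps" concrete: record an editor's hand changes to a generated TeX (or XSL-FO) file as a unified diff. The sketch below uses Python's standard `difflib`; the helper name `capture_edits`, the file name `book.tex`, and the sample figure are hypothetical. A diff like this could be stored with the collection and tried against the next generated version with a tool such as GNU `patch`; whether it still applies cleanly after the source content changes is exactly the open question.

```python
import difflib


def capture_edits(generated: str, edited: str, name: str = "book.tex") -> str:
    """Record hand edits to generated source as a unified diff (hypothetical helper)."""
    diff = difflib.unified_diff(
        generated.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    )
    return "".join(diff)  # empty string when nothing was edited


# Hypothetical example: an editor shrinks an image that overflowed the page.
generated = (
    "\\begin{figure}\n"
    "\\includegraphics[width=\\textwidth]{map.png}\n"
    "\\end{figure}\n"
)
edited = generated.replace("width=\\textwidth", "width=0.6\\textwidth")
print(capture_edits(generated, edited))
```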
The upshot.
Don and Dave are willing to give time to help us with LaTeX problems. They lent us some books they consider especially useful. I also learned how to solve the "space after the less-than sign" problem that afflicts the display of XML code in our module PDFs.
It's evident that (La)TeX has the edge over XSL-FO 1.1 when it comes to ultimate typesetting power and integrated handling of math, music, and chemistry. It is also more difficult to master than XSL-FO; in fact, at this point I think that the power:difficulty ratio still favors XSL-FO, until we add MathML to the equation [sic]. Of course, MathML and math are a crucial part of the equation for us, and music will be as well if we can help our authors to create MusicXML and include it in their content.
In my estimation, there is greater hope for an XSL-FO pre-PDF editing environment usable by the non-geek to make final changes before print than there is for the same thing with LaTeX.
Creating a PDF generation system that satisfies the constraints enumerated above will require a solid grasp of TeX and LaTeX fundamentals on the part of whoever designs and implements our next-generation PDF generation system (likely to be me, at least in part).
For now, we continue with (La)TeX, but with a view to learning enough about it to bend it to our will. We will rely both on books that will give us a good grasp of LaTeX fundamentals and on the assistance of our LaTeX wizards Don and Dave. From my perspective, the steps forward look something like this:
- Complete initial changes to information architecture (at least CNXML 0.6, MDML 0.4, but not necessarily the collection structure).
- Test a broad sample (25+) of collections in the present print system.
- Classify the problems that occur:
- problems that can be solved without manual editing of TeX, either by fixing the transformations or by improving the markup to represent rendering properties
- problems that require manual editing of TeX before PDF generation, and that can't be solved by improving the transformations or source formats
- failures to generate PDF (e.g. TeX syntax problems that we can't forestall completely)
- others?
- Modify the information architecture (CNXML, collection structure) to accommodate as much information about rendering parameters as is feasible.
- Modify the transforms to act on rendering information.
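As a first pass at the classification step, a script could sort the messages in each test collection's TeX log into rough buckets. The sketch below is a hypothetical Python helper, not part of our current print system; it relies only on the standard TeX log conventions that errors begin with `! ` and that overfull boxes are reported as `Overfull \hbox (... too wide)` or `Overfull \vbox (... too high)`. Overfull horizontal boxes often point to the image, table, and equation width problems listed earlier, while errors correspond to failed builds; deciding which problems the transforms can fix still takes a human.

```python
import re

# TeX log conventions: error messages start with "! ", and overfull boxes
# are reported as "Overfull \hbox (42.1pt too wide) ..." or
# "Overfull \vbox (3.5pt too high) ...".
ERROR_RE = re.compile(r"^! (.*)$")
OVERFULL_RE = re.compile(r"^Overfull \\([hv])box \(([\d.]+)pt too (?:wide|high)\)")


def triage_log(log_text: str) -> dict:
    """Sort TeX log messages into rough, hypothetical buckets."""
    buckets = {"build_errors": [], "overfull_hbox": [], "overfull_vbox": []}
    for line in log_text.splitlines():
        error = ERROR_RE.match(line)
        if error:
            buckets["build_errors"].append(error.group(1))
            continue
        overfull = OVERFULL_RE.match(line)
        if overfull:
            key = "overfull_hbox" if overfull.group(1) == "h" else "overfull_vbox"
            # Record how far the box overshoots, in points.
            buckets[key].append(float(overfull.group(2)))
    return buckets


sample_log = (
    "Overfull \\hbox (42.1pt too wide) in paragraph at lines 88--91\n"
    "! Undefined control sequence.\n"
    "Overfull \\vbox (3.5pt too high) has occurred while \\output is active\n"
)
print(triage_log(sample_log))
```

TeX wraps long log lines (at 79 characters by default), so a production version would need to rejoin wrapped messages before matching.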
