Rhaptos Software Development

Submitted by bnwest. on 2007-09-24 15:36. Development
LaTeX Importer Workflow

The current LaTeX import process has the following steps:

  1. Unzip the LaTeX source. All necessary files are located at the root of the zip.
  2. Run latex on the LaTeX source to see whether a .dvi file is generated. This makes sure we have all the files we need.
  3. Run tralics on the LaTeX file, which generates a tralics XML file.
  4. Perform an XSL transform to convert the tralics XML file into a CNXML file.
  5. Perform a second XSL transform to "tidy" the CNXML.
  6. Validate the CNXML via xmllint.
  7. Convert .eps files into .png files.
  8. Create a zip file of the CNXML and the images.
  9. In cnx.org, import the zip file into an empty CNX module.
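
As a rough sketch of steps 2 through 7, the commands could be laid out like this. It is written in Python rather than the KSH we actually use, and it only builds the argument lists; nothing is executed. The stylesheet names, the intermediate file names, the index.cnxml output name, the assumption that tralics writes <name>.xml next to the source, and the use of DTD validation (--valid) are all guesses made for illustration, not what our script really does.

```python
from pathlib import Path


def build_commands(tex_file,
                   to_cnxml_xsl="tralics2cnxml.xsl",  # hypothetical stylesheet name
                   tidy_xsl="tidy-cnxml.xsl",         # hypothetical stylesheet name
                   eps_files=()):
    """Return (label, argv) pairs for steps 2 to 7, in order."""
    stem = Path(tex_file).stem
    tralics_xml = f"{stem}.xml"      # assumed name of the tralics output
    raw_cnxml = f"{stem}.raw.cnxml"  # hypothetical intermediate file
    cnxml = "index.cnxml"            # assumed name of the final CNXML file

    commands = [
        # Step 2: does the source compile at all?
        ("2 latex", ["latex", "-interaction=nonstopmode", tex_file]),
        # Step 3: LaTeX to tralics XML.
        ("3 tralics", ["tralics", tex_file]),
        # Steps 4 and 5: xsltproc takes the stylesheet first, then the input.
        ("4 to cnxml", ["xsltproc", "-o", raw_cnxml, to_cnxml_xsl, tralics_xml]),
        ("5 tidy", ["xsltproc", "-o", cnxml, tidy_xsl, raw_cnxml]),
        # Step 6: assumes the CNXML carries a DOCTYPE that xmllint can validate against.
        ("6 validate", ["xmllint", "--noout", "--valid", cnxml]),
    ]
    # Step 7: one ImageMagick call per figure.
    for eps in eps_files:
        png = str(Path(eps).with_suffix(".png"))
        commands.append(("7 eps to png", ["convert", eps, png]))
    return commands


if __name__ == "__main__":
    for label, argv in build_commands("paper.tex", eps_files=["fig1.eps"]):
        print(label, "->", " ".join(argv))
```

Keeping the commands as data makes it easy to print them for an author, or to hand them to whatever runs them on the server or the client.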

All of the steps are done from a Linux command line with a KSH script. Steps 1 and 8 require that a zip utility is installed. Step 2 requires that latex is installed. Step 3 requires that tralics is installed. Steps 4, 5 and 6 require that xsltproc and xmllint are installed (I am unsure of the DOS equivalents). Step 7 requires that ImageMagick is installed.
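
A script could check for these tools up front instead of failing halfway through. The following is a minimal sketch, assuming the executables are named unzip, zip, latex, tralics, xsltproc, xmllint and convert (ImageMagick's command line tool); the names may differ on a given system, especially on Windows.

```python
import shutil

# Executable name -> the step(s) that need it. The names are assumptions.
REQUIRED_TOOLS = {
    "unzip": "step 1",
    "latex": "step 2",
    "tralics": "step 3",
    "xsltproc": "steps 4 and 5",
    "xmllint": "step 6",
    "convert": "step 7 (ImageMagick)",
    "zip": "step 8",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    for name in missing_tools():
        print(f"missing: {name} (needed for {REQUIRED_TOOLS[name]})")
```

The `which` parameter exists only so the lookup can be replaced when trying the function out; by default it uses the real PATH.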

With the above process, we have seen about a 10% chance that an input LaTeX file will import successfully.  After making modifications to the LaTeX file, the success rate rises to over 90%.  Most of the modifications centered on macros that tralics failed to handle correctly.  For example, we can trace back from a CNXML validation error to the LaTeX source and look for an offending macro.
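
The first half of that trace, finding where validation failed, can be partly automated. The sketch below pulls file names and line numbers out of validator output, assuming diagnostics of the form file:line: message, which is the general shape libxml2 tools use. The sample text is made up for this sketch and is not taken from a real import. Getting from the CNXML line back to the LaTeX macro remains manual detective work.

```python
import re

# "file:line: message"; the non-greedy file part tolerates colons in paths.
ERROR_LINE = re.compile(r"^(?P<file>.+?):(?P<line>\d+): (?P<message>.*)$")


def parse_validation_errors(text):
    """Return (file, line_number, message) for every line that looks like a diagnostic."""
    errors = []
    for raw in text.splitlines():
        match = ERROR_LINE.match(raw)
        if match:
            errors.append((match["file"], int(match["line"]), match["message"]))
    return errors


# Illustrative input only, written for this sketch.
SAMPLE = (
    "index.cnxml:42: element para: validity error : "
    "Element para content does not follow the DTD\n"
    "a summary line without a line number\n"
)

if __name__ == "__main__":
    for name, line_no, message in parse_validation_errors(SAMPLE):
        print(f"{name}, line {line_no}: {message}")
```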

Some work could be done to improve the initial 10% success rate.  We could provide a LaTeX template file, with instructions, that authors could fill in.  Some authors may not want to do this, though, and I am unsure that a LaTeX template file would raise the success rate to over 90%.

Even with a LaTeX template file, the author may still run into translation problems that require detective work on their part.  Some authors may not want to play detective, but we are talking about LaTeX authors here, who spend hours getting figures to display on the right page.

Server Side Solution

From our existing import UI, we would accept a zip file containing a .tex file plus other required files (at the root of the zip).  We could then perform steps 1 to 7 on the server side. Instead of creating a zip of the CNXML and image files, we could import them directly into the module.
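
Before running anything, the server could check that the upload has the expected shape. Here is a minimal sketch of that check. Requiring exactly one .tex file at the root is my assumption (it gives us an unambiguous file to feed to latex and tralics), and the function and file names are invented for the example.

```python
import io
import zipfile


def find_root_tex(source):
    """Return the name of the single .tex file at the root of the zip.

    `source` is a path or a binary file-like object. Raises ValueError
    when there is not exactly one candidate.
    """
    with zipfile.ZipFile(source) as zf:
        # Zip member names always use "/" as the separator.
        root_tex = [name for name in zf.namelist()
                    if "/" not in name and name.lower().endswith(".tex")]
    if len(root_tex) != 1:
        raise ValueError(
            f"expected exactly one .tex file at the zip root, found {root_tex}")
    return root_tex[0]


def sample_upload():
    """Build a small in-memory zip for trying the check (made-up file names)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("paper.tex", "\\documentclass{article}")
        zf.writestr("fig1.eps", "")
        zf.writestr("notes/draft.tex", "")  # not at the root, so it is ignored
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(find_root_tex(sample_upload()))
```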

The problem with a server-side import is the 10% success rate.  We should thus expect authors to modify their TeX source repeatedly until the import succeeds.  To make those changes, authors need enough information from us, and providing it is an intractable problem of its own.  We would log each step of the import and return to the author a zip file containing all of the byproducts of the import, including the log file.  From the log file, the author could determine where the CNXML validation fails and map back to the LaTeX source file.  The author may then get an idea of what caused the import to fail.
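
A sketch of the logging and bundling part, under stated assumptions: each step is a (label, argv) pair, everything runs in one working directory, the log is called import.log, and the import stops at the first step that fails. None of these names come from existing code. The demo uses the Python interpreter as a stand-in for the real tools so that it runs anywhere.

```python
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path


def run_logged(steps, workdir, log_name="import.log"):
    """Run each step in workdir, logging its output; stop at the first failure."""
    log_path = Path(workdir) / log_name
    with open(log_path, "w", encoding="utf-8") as log:
        for label, argv in steps:
            log.write(f"== {label}: {' '.join(argv)}\n")
            try:
                proc = subprocess.run(argv, cwd=workdir,
                                      capture_output=True, text=True)
            except FileNotFoundError:
                log.write("== tool not found\n")
                return False
            log.write(proc.stdout)
            log.write(proc.stderr)
            log.write(f"== exit status {proc.returncode}\n")
            if proc.returncode != 0:
                return False
    return True


def bundle_byproducts(workdir, bundle_path):
    """Zip every file under workdir; bundle_path must live outside workdir."""
    workdir = Path(workdir)
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(workdir.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(workdir).as_posix())


def demo():
    """Run two harmless stand-in steps and return what the author would get back."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        work.mkdir()
        steps = [
            ("stand-in ok", [sys.executable, "-c", "print('hello')"]),
            ("stand-in failure", [sys.executable, "-c", "raise SystemExit(3)"]),
        ]
        ok = run_logged(steps, work)
        bundle = Path(tmp) / "import-results.zip"
        bundle_byproducts(work, bundle)
        with zipfile.ZipFile(bundle) as zf:
            return ok, zf.namelist(), zf.read("import.log").decode("utf-8")


if __name__ == "__main__":
    ok, names, log_text = demo()
    print("succeeded:", ok)
    print("bundle contains:", names)
    print(log_text)
```

Stopping at the first failure keeps the log short, which matters if the author is the one who has to read it.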

Debugging the import process may be more work than the author signed up for.

Client Side Solution

We could provide a script file (BAT for Windows and KSH for Linux, but no support for Mac?) that performs the translation process, that is, steps 1 through 8 above.  By explicitly running the script on the client side, the author would get quicker feedback than by importing via cnx.org.

A Windows client-side solution would require an install package. FWIW, my preference for a Windows install package is NSIS.

Open Issues

We need to document the LaTeX packages that tralics natively supports.


Re: LaTeX Importer Workflow

Posted by easgarov at 2007-10-13 15:32
For Mac, there is AppleScript. But I think MacOS X should also run bash scripts: http://www.macdevcenter.com/pub/a/mac/2004/02/24/bash.html