Rhaptos Software Development

LaTeXML

Submitted by easgarov. on 2007-08-14 12:50. Development
LaTeXML is a tool that converts LaTeX to XHTML.

LaTeXML is a tool that transforms LaTeX into XHTML+MathML or HTML+images. It consists essentially of two tools: latexml, which transforms LaTeX into its own internal XML format with its own internal math representation, and latexmlpost, which transforms that XML into either XHTML+MathML or HTML+images. Unless stated otherwise, when I say latexml I mean both utilities used together.
The simplest usage of latexml is:
latexml --dest=mydoc.xml mydoc
latexmlpost -dest=somewhere/mydoc.xhtml mydoc.xml
This carries out the default transformation into XHTML+MathML. If you give the destination an html extension (-dest=somewhere/mydoc.html instead of -dest=somewhere/mydoc.xhtml), it generates HTML+images instead.
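The two steps above can be wrapped in a small script. This is only a minimal sketch: the function names build_commands and convert are my own, the intermediate file name is assumed to be the source name with an .xml extension, and the flags are exactly the ones shown above.

```python
import subprocess
from pathlib import Path


def build_commands(source, destination):
    """Return the latexml and latexmlpost command lines for one document."""
    source = Path(source)
    # Assumption: the intermediate file sits next to the source, e.g. mydoc -> mydoc.xml
    intermediate = source.with_suffix(".xml")
    return [
        ["latexml", f"--dest={intermediate}", str(source)],
        ["latexmlpost", f"-dest={destination}", str(intermediate)],
    ]


def convert(source, destination):
    """Run both steps in order; raises CalledProcessError if either step fails."""
    for command in build_commands(source, destination):
        subprocess.run(command, check=True)
```

Calling convert("mydoc", "somewhere/mydoc.xhtml") would run the same two commands as above, and passing a destination ending in .html would select HTML+images.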

The tool's website is http://dlmf.nist.gov/LaTeXML/
The manual is at http://dlmf.nist.gov/LaTeXML/manual/

Installing was simple, Ross installed everything :D
I also installed it on my laptop at home, and it worked properly. I had SuSE, so I installed all the packages from YaST and the Perl libraries from CPAN. The following commands should be sufficient to install the required Perl modules (typically as root):
perl -MCPAN -e shell
install Parse::RecDescent XML::LibXML XML::LibXSLT
Add the name of any other package that needs to be installed to the install line. After that you need to perform the standard Unix installation:
tar zxvf latexml-#.#.#.tar.gz
cd latexml-#.#.#
perl Makefile.PL
make
make test
and then, as root:
make install

The tool is written in Perl, as is the installation. So I guess Perl knowledge would be a plus in dealing with this utility.

Here are some notable facts about this utility:

http://dlmf.nist.gov/LaTeXML/manual/usage/single.html
XSLT is applied to produce the final XHTML, so the essential idea is that we can adapt this XSLT for CNXML.

http://dlmf.nist.gov/LaTeXML/manual/architecture/construction.html:
"A LaTeXML::Model is maintained throughout the digestion phase which accumulates any document model declarations in particular the document type (currently only the DTD, but eventually may be RelaxNG based). As LaTeX markup is more like HTML than XML, declarations may be used to indicate which elements may be automatically opened or closed when needed to build a document tree that matches the document type. As an example, a <subsection> will automatically be closed when a <section> is begun."

Usage notes:

It ran smoothly on simple LaTeX files that had math and images. On Ray's LaTeX (cdc06) it ran, but the generated internal XML had an error: several elements had the same id. This is simple to fix by just removing the ids.
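Removing the ids can be done as a small preprocessing step. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the attribute name "id" and the element names in any sample input are assumptions, so check them against the actual internal XML before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(root, attribute="id"):
    """Return the set of id values that occur on more than one element."""
    counts = Counter(
        element.get(attribute)
        for element in root.iter()
        if element.get(attribute) is not None
    )
    return {value for value, count in counts.items() if count > 1}


def strip_ids(root, attribute="id"):
    """Remove the id attribute from every element, in place."""
    for element in root.iter():
        element.attrib.pop(attribute, None)
    return root
```

duplicate_ids is only there to confirm the problem before changing anything; strip_ids does the blunt fix described above and removes every id, not only the duplicated ones.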

Another common error on some documents was that the internal XML contained <ERROR> tags with #PCDATA inside, while according to the DTD it had to be #CDATA (or the other way around, I don't remember exactly). Simply removing those <ERROR> tags made everything run properly, so when transforming to CNXML with XSL we can remove them in a preprocessing stage. In my opinion the reason for this error is that the author does not handle these errors: the tool just tells you that something is an error, by enclosing it in an <ERROR> tag, and leaves you to do whatever you want with it.
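A minimal sketch of such a preprocessing stage, again with xml.etree.ElementTree. It assumes the tag is literally ERROR with no namespace, and it keeps the text that follows each removed element; the function name is my own.

```python
import xml.etree.ElementTree as ET


def remove_error_elements(root, tag="ERROR"):
    """Delete every <ERROR> element in place, keeping the text that follows it."""
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag != tag:
                previous = child
                continue
            # The text after the removed element (its tail) is re-attached
            # to the previous sibling, or to the parent if there is none.
            tail = child.tail or ""
            if previous is not None:
                previous.tail = (previous.tail or "") + tail
            else:
                parent.text = (parent.text or "") + tail
            parent.remove(child)
    return root
```

The content inside each <ERROR> element is dropped together with the tag, which matches what I did by hand.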

I also looked into latexml's internal structure and figured out that it uses XSL files to transform its internal XML to XHTML or HTML. So I think we could also write an XSL file that transforms its internal XML to CNXML. I believe we can also lift out the code that transforms its internal math representation to MathML. Another way would be to modify the XHTML importer made for Scientific WorkPlace so that it handles the XHTML generated by latexml. I actually did this for lecture3 of the CS Devore lectures, and it gave quite good results. I first converted that lecture to XHTML + presentation MathML using latexml and then imported it with the XHTML importer. It gave quite a few errors, mainly because some text and math were not enclosed in tags. Enclosing them in <para> tags made everything work correctly, and it had far fewer errors than LaTeX imported with the oolatex tool from the tex4ht package.

LaTeXML is currently installed on suntzu, in /usr/local/tarballs/LaTeXML-0.5.9
The DTDs are in the lib/LaTeXML/dtd subfolder.
In the lib/LaTeXML/Post subfolder I found the MathML.pm file, which handles the transformation of the internal math format to presentation MathML. So I guess we can definitely use it.
The lib/LaTeXML/dtd subfolder also contains the xsl files that are used to generate XHTML and HTML: core.xsl.tail, html.xsl.head and xhtml.xsl.head. core.xsl.tail is used in both the XHTML and HTML transforms; it is merged with html.xsl.head or xhtml.xsl.head depending on whether you transform to HTML or XHTML, respectively.
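To experiment with a modified stylesheet, the head and tail pieces can be combined by hand. This sketch assumes that "merged" means plain concatenation, head first and then tail. I have not verified that this is exactly what the LaTeXML build does, and the function name is my own.

```python
from pathlib import Path


def merge_stylesheet(dtd_dir, output_format, destination):
    """Write <format>.xsl.head followed by core.xsl.tail to destination."""
    if output_format not in ("html", "xhtml"):
        raise ValueError("output_format must be 'html' or 'xhtml'")
    dtd_dir = Path(dtd_dir)
    # Assumption: the merge is a simple concatenation, head first.
    head = (dtd_dir / f"{output_format}.xsl.head").read_text(encoding="utf-8")
    tail = (dtd_dir / "core.xsl.tail").read_text(encoding="utf-8")
    destination = Path(destination)
    destination.write_text(head + tail, encoding="utf-8")
    return destination
```

A CNXML variant could then start as a copy of one of the head files, with the shared tail adapted as needed.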

Another notable thing is that it uses the m: namespace prefix for MathML, and you can also give it an option to transform to content MathML instead of presentation MathML. Here is the documentation page for latexmlpost's math options:
http://dlmf.nist.gov/LaTeXML/manual/commands/latexmlpost.html#SSx2.SSSx4
Here is the page with details about math in latexml: http://dlmf.nist.gov/LaTeXML/manual/math/details.html
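Because the math is namespaced, any code that post-processes the XHTML has to look it up by namespace and not by the bare tag name. Here is a minimal sketch with xml.etree.ElementTree. The sample document is made up for illustration; the namespace URI is the standard MathML one.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
NAMESPACES = {"m": MATHML_NS}

# Hypothetical sample, not actual latexml output.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:m="http://www.w3.org/1998/Math/MathML">
  <body><p>Energy: <m:math><m:mi>E</m:mi></m:math></p></body>
</html>"""


def find_math(root):
    """Return all MathML <m:math> elements below root."""
    return root.findall(".//m:math", NAMESPACES)
```

The prefix itself does not matter to the parser; only the namespace URI it is bound to does, so the same lookup works if another tool uses a different prefix.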

This tool is also well documented: all Perl packages have perldoc documentation. Usage of perldoc is:
perldoc <name of package>
For example:
perldoc LaTeXML::Box
The list of available packages and modules is here:
http://dlmf.nist.gov/LaTeXML/manual/coremodules/
You can also browse the same Perl documentation from there.

For style files that you include in LaTeX, latexml has files with the extension .ltxml that handle those packages. You can find these files in the lib/LaTeXML/Package subfolder. Here is the page of the latexml documentation dedicated to package handling:
http://dlmf.nist.gov/LaTeXML/manual/customization/