LaTeXML notes
Installing LaTeXML, following the notes at http://dlmf.nist.gov/LaTeXML/
implicit install steps (prerequisites not spelled out on the web site):
sudo apt-get install imagemagick
sudo apt-get install perlmagick
sudo apt-get install libxslt1-dev
sudo apt-get install libberkeleydb-perl
sudo apt-get install libdb4.4
sudo apt-get install libdb4.4-dev
explicit steps from the web site:
perl -MCPAN -e shell
% install Parse::RecDescent
etc.
Using LaTeXML
latexml --dest=mydoc.xml mydoc
latexmlpost --dest=somewhere/mydoc.xhtml mydoc.xml
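When converting a batch of files it helps to record which of the two steps each file stops at. The following is a minimal sketch, not part of LaTeXML: convert_one is a hypothetical helper name, the output directory and log file names are made up for illustration, and it assumes that both commands exit with a non-zero status on failure.

```shell
# Sketch: run both LaTeXML steps on one document and report where it stops.
# convert_one, the output directory and the log names are hypothetical.
# Assumption: latexml and latexmlpost exit non-zero when they fail.
convert_one() {
    base="${1%.tex}"                 # accept "mydoc" or "mydoc.tex"
    outdir="${2:-out}"               # assumed default output directory
    name="$(basename "$base")"
    mkdir -p "$outdir" || return 3

    # Step 1: TeX -> XML
    if ! latexml --dest="$outdir/$name.xml" "$base" \
            2> "$outdir/$name.latexml.log"; then
        echo "$name: failed in step 1 (latexml)"
        return 1
    fi

    # Step 2: XML -> XHTML
    if ! latexmlpost --dest="$outdir/$name.xhtml" "$outdir/$name.xml" \
            2> "$outdir/$name.latexmlpost.log"; then
        echo "$name: failed in step 2 (latexmlpost)"
        return 2
    fi

    echo "$name: ok"
}
```

A loop such as: for f in *.tex; do convert_one "$f" out; done would then print one status line per file and leave the error output of each step in its own log.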
I tested 13 files: 11 from Ron Devore, 1 from Ray and 1 from Paul Pfeiffer. None of the 13 converted all the way through without hand modification.
2 failed in the first step (latexml): 1 from Ron Devore (deep recursion) and 1 from Pfeiffer (fatal error).
The other 11 failed in the second step (latexmlpost):
11 files had <para> tags with duplicate id attributes.
10 files had <text> nodes with pos attributes.
3 files had an <ERROR> tag with non-character children.
1 file had a <caption> tag outside of its <table> parent, which yielded a <subsection> error.
Better said, the step 2 failures were really step 1 failures: latexml generated invalid XML, which made the XSLT processing choke, so the problem was only recognized after the fact.
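Since the invalid XML is already present after step 1, the most common problem (duplicate id attributes) can be looked for in the .xml file before running latexmlpost. The following is a rough sketch using plain text matching rather than a real XML parser. dup_ids is a hypothetical helper name, and it assumes the ids are written as double-quoted id="..." or xml:id="..." attributes.

```shell
# Sketch: print each id value that occurs more than once in an XML file.
# dup_ids is a hypothetical helper, not a LaTeXML command.
# Assumption: ids appear as double-quoted id="..." or xml:id="..." attributes.
dup_ids() {
    # Pull out every id / xml:id attribute (preceded by whitespace or line start),
    # reduce each match to its bare value, then keep only the repeated values.
    grep -o -E '(^|[[:space:]])(xml:)?id="[^"]*"' "$1" |
        sed -E 's/^[[:space:]]*(xml:)?id="([^"]*)"$/\2/' |
        sort | uniq -d
}
```

Running dup_ids mydoc.xml prints one line per repeated id value, and prints nothing if no id is repeated.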
Observations:
latexml and latexmlpost both peg the CPU.
Problems that cause latexmlpost errors can be edited away by hand. The resulting .xhtml file is then a fairly good translation.
