Collection Printing Documentation
We have a command-line system that we use for building collection/course PDFs of Connexions content. It's not part of the Zope/Plone system, so it is not available as a product. Do you have access to our SVN repository? In any case, you can download a tarball from here:
http://www.owlnet.rice.edu/~cbearden/vef/collection_printing.tar.gz
In these instructions, I will use the word collection to refer to
what we sometimes call a course and sometimes a collection. Since
these objects aren't always really a course, we are trying to use the
more general term now.
Dependencies:
- GNU make
- pdfetex/pdflatex (in my distro it's part of the tetex-bin package)
- ImageMagick (for the convert program)
- gif2png (if your Linux distro doesn't have it, you can also get
it at http://catb.org/~esr/gif2png/)
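Before going further it can save time to confirm that the programs
listed above are on your path. The following is only a sketch, not part
of the tarball: the function name missing_tools is made up for
illustration, and it assumes that GNU make is installed under the name
make and that either pdflatex or pdfetex will do.

```python
import shutil

# Executables named in the dependency list; "convert" is the
# ImageMagick program. Assumption: either TeX engine is acceptable.
REQUIRED = ["make", "convert", "gif2png"]
TEX_ENGINES = ["pdflatex", "pdfetex"]


def missing_tools(required=REQUIRED, engines=TEX_ENGINES, which=shutil.which):
    """Return the names of required programs that are not found on PATH."""
    missing = [name for name in required if which(name) is None]
    if not any(which(name) is not None for name in engines):
        missing.append(" or ".join(engines))
    return missing


if __name__ == "__main__":
    for name in missing_tools():
        print("missing:", name)
```

An empty result means every program was found; it does not check
versions.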
The tarball will contain two directories: printing and scripts. From
scripts you will need only imagefix (not imagefixer.py) and
replace.py. Put them somewhere in your executable path. These are
two helper programs used by the main PDF generation system.
The PDF generation system is in the printing directory. The
makefile is course_print.mak. You'll need to edit the PRINT_DIR
path at the top of the file to point to the final location of
printing.
You will also need one directory for each collection for which you want to build a PDF, to contain the workfiles generated in that process. Typically what I do is have a directory like
/home/cbearden/collection_printing
and make a printing subdir (and later a subdir for each collection)
/home/cbearden/collection_printing/printing
The printing subdir contains what's in the printing subdir of the
tarball.
Once you have the directory structure in place, do the following to generate a PDF of a collection:
- Create a directory for the collection you want to print. I
always name them by the collection ID, so for Rich's big book I'd
call it col10064. So the directory structure would look like this:
/home/cbearden/collection_printing/printing
/home/cbearden/collection_printing/col10064
- Either make a symlink from the collection PDF directory pointing
to the makefile in the printing directory, or copy the makefile
into the collection subdirectory. This is a convenience so that
you don't have to give the full path to the makefile when you
build the PDF.
/home/cbearden/collection_printing/printing/course_print.mak
/home/cbearden/collection_printing/col10064/course_print.mak -> ../printing/course_print.mak
- Download the RDF description of the desired collection into the
directory for that collection. The RDF file is the first input
into the printing pipeline. For any URL pointing to a collection,
you can append the argument ?format=rdf to retrieve the RDF
description of it. So for Rich's course at
http://<your repository address>/content/col10064/latest/
you can get the RDF description at
http://<your repository address>/content/col10064/latest/?format=rdf
I usually use wget to retrieve the RDF:
wget -O col10064.rdf http://<your repository address>/content/col10064/latest/?format=rdf
but you could also pull it up in a web browser and save it to the
filesystem.
- Run the make command with the target as the PDF file with the same
basename as the RDF file:
make -f course_print.mak col10064.pdf
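The steps above can be sketched as a few small helpers. This is an
illustration under assumptions, not code from the tarball: the function
names are hypothetical, the repository argument stands in for your
repository address, and the layout is the one used in the examples
(a printing directory next to one directory per collection).

```python
import os


def prepare_collection_dir(base_dir, collection_id, makefile="course_print.mak"):
    """Create <base_dir>/<collection_id> and symlink the makefile into it."""
    col_dir = os.path.join(base_dir, collection_id)
    os.makedirs(col_dir, exist_ok=True)
    link = os.path.join(col_dir, makefile)
    if not os.path.lexists(link):
        # Relative link target: ../printing/course_print.mak
        os.symlink(os.path.join("..", "printing", makefile), link)
    return col_dir


def rdf_url(repository, collection_id):
    """URL of the RDF description for the latest version of a collection."""
    return "http://%s/content/%s/latest/?format=rdf" % (repository, collection_id)


def make_command(collection_id, makefile="course_print.mak"):
    """Argument list for building the PDF, e.g. for subprocess.run()."""
    return ["make", "-f", makefile, collection_id + ".pdf"]
```

After saving the RDF as col10064.rdf in the collection directory, the
list returned by make_command("col10064") can be passed to
subprocess.run with cwd set to that directory.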
You can also build intermediate targets. One of the most
useful intermediate targets is the final LaTeX stage before PDF
generation, which in the case of our example would be col10064.tex.
This is the thing to do if you need to make any final edits to
correct problems in our pipeline, or to handle things like page breaks,
before building the PDF. The make process will always check to see
if any input files are newer than the target file, and if so, it will
start with them. So if you build a PDF that looks funny and you want
to correct it by editing the LaTeX (if, for instance, there is no
alteration to the CNXML that would fix things), and you edit and save
the LaTeX file and run the make command again, it will start with the
LaTeX file since it is the only input file newer than the target file.
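The timestamp rule that make applies here can be sketched in a few
lines. This is a simplified illustration of the general rule, not the
logic of course_print.mak itself; the function name is made up.

```python
import os


def is_out_of_date(target, inputs):
    """True if target is missing or any input was modified after it."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(path) > target_mtime for path in inputs)
```

With this rule, saving an edited col10064.tex makes it newer than
col10064.pdf, so only the LaTeX-to-PDF step needs to run again.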
Sometimes the PDF build will fail catastrophically, leaving either a
broken or an empty PDF file. In this case you need to examine the
log file (col10064.log in our example case) to see what you can
figure out about what pdflatex didn't like about the LaTeX input
file. I'm not a LaTeX expert, but I can often figure out something about
the problem from this file.
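TeX marks error messages in the log with a leading "!", usually
followed by a line of the form l.<number> giving the place in the input
file. A small sketch for pulling those out of a long log follows; the
function name and the amount of context kept are arbitrary choices.

```python
def tex_errors(log_text, context=2):
    """Collect each line starting with '!' plus a few following lines."""
    lines = log_text.splitlines()
    errors = []
    for i, line in enumerate(lines):
        if line.startswith("!"):
            errors.append("\n".join(lines[i:i + 1 + context]))
    return errors
```

Logs can contain bytes that are not valid UTF-8, so it is safer to read
the file with open("col10064.log", errors="replace") before passing the
text in.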
As you look through the resulting PDF, you will sometimes see images
that are too large or too small. We have a way of dealing with this
problem, but it's rather ugly. In the directory containing the
printing workfiles, create a file with the collection ID as the
basename and an extension of .width. This file should contain
lines each of which has the basename of an image file, a space, and a
width for that file expressed in some measure that pdflatex
understands (I use inches; you would probably use cm). The entries
would look like this:
m10790_fig1a 6.5in
m10790_fig1b 6.5in
If you omit a unit, the number is treated as a scaling factor:
m10757_mfilt_3 .75
m10764_fig3 .75
These numbers are simply stuffed into the LaTeX file by the
imagefix program, into the width argument of the
\includegraphics commands, e.g.:
\includegraphics[width=6.5in]{col10064/m10790_fig1a.png}
If there is no entry present in the width file, then the
\includegraphics is printed without the width argument, e.g.:
\includegraphics{col10064/m10790_fig1a.png}
So you can use any width specification in the width file that you
can use in the LaTeX \includegraphics command.
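To make the width-file behavior concrete, here is a sketch of the
lookup as described above. It is not the imagefix source: the function
names are hypothetical, the .png extension is taken from the examples,
and the value is passed through verbatim, so it does not model how
imagefix handles a unitless scaling factor.

```python
def parse_width_file(text):
    """Map image basename -> width string from 'basename width' lines."""
    widths = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, width = parts
            widths[name] = width
    return widths


def includegraphics(collection_id, basename, widths, ext=".png"):
    """Build the \\includegraphics command for one image."""
    path = "%s/%s%s" % (collection_id, basename, ext)
    width = widths.get(basename)
    if width is None:
        # No entry in the width file: emit the command without a width.
        return "\\includegraphics{%s}" % path
    return "\\includegraphics[width=%s]{%s}" % (width, path)
```

Lines that do not have exactly two fields (blank lines, for instance)
are skipped in this sketch.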
These instructions should be enough to get you started. I know that
there are many problem situations that they won't cover, so let me
know when you do have problems. I'm actively working on this code as
I also work on the PDFs we are generating, so if you can use our svn,
you can svn up to get any new changes. You could also create a
branch of the code in which you make your changes to handle
Vietnamese correctly, and merge those changes into our code after you
svn up.
The whole build process is rather ugly and ad hoc, and I know that some of the LaTeX constructions are bad, but we will improve this bit by bit. LaTeX and TeX handle math so well that it's hard to give up this process even if XSL-FO would be much easier in other respects.
By the way, you may find it helpful to look at the workfiles and PDFs of our collections that I have committed to our svn:
https://trac.rhaptos.org/trac/rhaptos/browser/printing_workfiles
You can see examples of width files, and how I use patches to
capture manual edits to the LaTeX to re-apply after code and content
updates.
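One way to capture a manual edit as a patch is to diff the generated
LaTeX against the edited copy. The sketch below uses Python's difflib
to produce a unified diff; this is an assumption for illustration and
not necessarily how the patches in the repository were made. The
function name and the .orig suffix are made up.

```python
import difflib


def make_patch(original_tex, edited_tex, name="col10064.tex"):
    """Return a unified diff turning original_tex into edited_tex."""
    diff = difflib.unified_diff(
        original_tex.splitlines(keepends=True),
        edited_tex.splitlines(keepends=True),
        fromfile=name + ".orig",
        tofile=name,
    )
    return "".join(diff)
```

The result is empty when the two texts are identical. A unified diff
saved to a file can be re-applied to a freshly generated .tex file with
the standard patch program, provided the surrounding lines have not
changed too much.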
