Performance tests of module PDF generation
New module print code was recently rolled out. Unfortunately, there are performance issues. Determining exactly what those issues are was the point of these tests. The new code represents significantly improved print functionality, but it was known to be slower. How much slower, we didn't really know.
Discussion of the kinds of changes in the new code lead to a question about the support for "implicit override EPS retrieval" i.e. the feature that uploading an image named "foobar.eps" beside one named "foobar.png" into the module would silently use the eps for the print job in place of the png.
So, the test plan was to take the top 50-100 most popular modules, and generate the PDFs, using the commandline make syntax. A tarball of the entire working directory is here
First get the list of modules, as so (zopectl debug):
stats = app.plone.content.portal_hitcount.getDailyAverages(True)
f=open('modpop','w')
f.write('\n'.join([s[0] for s in stats if s[0].startswith('m')]))
f.close()
That builds this file: modpop
Then, collect build times via wget. We ended up using the following simple script, redirecting stdout to a file:
#!/bin/sh
for m in $(head -40 modpop); do
wget -O $m.pdf "http://localhost:8080/content/$m/latest/?format=pdf"
done
This takes a bit of post processing. Each wget output file (named foo.dat) is run through:
grep '[0-2][0-9]:[0-5][0-9]:[0-5][0-9]' noeps-1.dat >noeps-times.dat
Which is then processed with time_date.py:
time_delta.py noeps-times.dat >noeps-delta.dat
Some of those are repeats, and averageable. To do that, I use join and awk:
join noeps-delta.dat RPLT-0.12.1-2-delta.dat | awk '{print $1,($2+$3)/2)' > RPLT-0.12-avg-delta.dat
Combine 'em all for plotting:
join RPLT-0.11-avg-delta.dat RPLT-0.12-avg-delta.dat | join - witheps-delta.dat >deltas.dat
plot via gnuplot:
set style histogram clustered gap 1 set xtics rotate by 270 set style fill solid 1.00 border -1 set key left set ylabel 'PDF generation/retrieval time (s)' set title 'Module Print/PDF generation' plot 'deltas.dat' using 2:xtic(1) title 'Old', '' u 3 t 'Unified: w/ EPS', '' u 4 t 'Unified: w/o EPS'
Take a look. For posterity, then do:
set term svg size 800 600 set output 'module-print.svg' plot 'deltas.dat' using 2:xtic(1) title 'Old', '' u 3 t 'Unified: w/ EPS', '' u 4 t 'Unified: w/o EPS' set output set term xwt
Load the svg up in inkscape for printing and png export. svg here
Noticed the large cost to EPS retrieval on some modules. Chuck generated a list of # of images per module, so calculate EPS cost and plot it:
head -n 40 modpop | cat -n | awk '{print $2,$1}' | sort | join - image-counts-srt.csv | sort -n -k2n | awk '{print $1,$3}' > image-counts-40.dat
join deltas.dat image-counts-40.dat | awk '{print $1,$3-$2,($3-$2)/$4}' >eps-cost.dat
Back in gnuplot:
set ylabel 'speculative EPS retrieval cost (s)' plot 'eps-cost.dat' using 2:xtic(1) t 'total EPS retrieval time', '' u 3 with line t 'time per image' lc 3
Take a look. For posterity, then do:
set term svg size 800 600 set output 'eps-cost.svg' replot set outout set term wxt
Again with the inkscape. svg here
