Skip to content

Rhaptos Software Development

Personal tools
You are here: Home » Developer Blog » Ross's Bits » Performance tests of module PDF generation

Performance tests of module PDF generation Performance tests of module PDF generation

Document Actions
Submitted by reedstrm. on 2008-07-02 12:53.
Chuck and I ran a few 'quick' tests of module PDF performance, using an amazon EC2 instance. Here's what we determined.

New module print code was recently rolled out. Unfortunately, there are performance issues. Determining exactly what those issues are was the point of these tests. The new code represents significantly improved print functionality, but it was known to be slower. How much slower, we didn't really know.

Discussion of the kinds of changes in the new code lead to a question about the support for "implicit override EPS retrieval" i.e. the feature that uploading an image named "foobar.eps" beside one named "foobar.png" into the module would silently use the eps for the print job in place of the png.

So, the test plan was to take the top 50-100 most popular modules, and generate the PDFs, using the commandline make syntax. A tarball of the entire working directory is here

First get the list of modules, as so (zopectl debug):

  stats = app.plone.content.portal_hitcount.getDailyAverages(True)
  f=open('modpop','w')
  f.write('\n'.join([s[0] for s in stats if s[0].startswith('m')]))
  f.close()

That builds this file: modpop

Then, collect build times via wget. We ended up using the following simple script, redirecting stdout to a file:

  #!/bin/sh
  for m in $(head -40 modpop); do
    wget -O $m.pdf "http://localhost:8080/content/$m/latest/?format=pdf"
  done 

This takes a bit of post processing. Each wget output file (named foo.dat) is run through:

  grep '[0-2][0-9]:[0-5][0-9]:[0-5][0-9]' noeps-1.dat >noeps-times.dat

Which is then processed with time_date.py:

  time_delta.py noeps-times.dat >noeps-delta.dat

Some of those are repeats, and averageable. To do that, I use join and awk:

  join noeps-delta.dat RPLT-0.12.1-2-delta.dat | awk '{print $1,($2+$3)/2)' > RPLT-0.12-avg-delta.dat

Combine 'em all for plotting:

  join RPLT-0.11-avg-delta.dat RPLT-0.12-avg-delta.dat | join - witheps-delta.dat >deltas.dat

plot via gnuplot:

  set style histogram clustered gap 1
  set xtics rotate by 270
  set style fill  solid 1.00 border -1
  set key left
  set ylabel 'PDF generation/retrieval time (s)'
  set title 'Module Print/PDF generation'
  plot 'deltas.dat' using 2:xtic(1) title 'Old', '' u 3 t 'Unified: w/ EPS', '' u 4 t 'Unified: w/o EPS'

Take a look. For posterity, then do:

  set term svg size 800 600
  set output 'module-print.svg'
  plot 'deltas.dat' using 2:xtic(1) title 'Old', '' u 3 t 'Unified: w/ EPS', '' u 4 t 'Unified: w/o EPS'
  set output
  set term xwt

Load the svg up in inkscape for printing and png export. svg here

module print times

Noticed the large cost to EPS retrieval on some modules. Chuck generated a list of # of images per module, so calculate EPS cost and plot it:

  head -n 40 modpop | cat -n | awk '{print $2,$1}' | sort | join - image-counts-srt.csv  | sort -n -k2n | awk '{print $1,$3}' > image-counts-40.dat
  join deltas.dat image-counts-40.dat | awk '{print $1,$3-$2,($3-$2)/$4}' >eps-cost.dat

Back in gnuplot:

  set ylabel 'speculative EPS retrieval cost (s)'
  plot 'eps-cost.dat' using 2:xtic(1) t 'total EPS retrieval time', '' u 3 with line t 'time per image' lc 3

Take a look. For posterity, then do:

  set term svg size 800 600
  set output 'eps-cost.svg'
  replot
  set outout
  set term wxt

Again with the inkscape. svg here

EPS cost per image

Re: Performance tests of module PDF generation

Posted by kef at 2008-07-15 11:50
How can Unified w EPS take less time than Unified w/o EPS?