Harvesting Imported Word Files
Submitted by bnwest on 2006-11-15 14:15.
Mission - Harvest imported Word doc files so that we can profile how the Word importer is used. Track both the successes and the failures of the Word importer. Harvesting can be turned on or off dynamically on the server and is off by default. Build a test bed of real-world imported Word docs for regression testing.
Design objectives as discussed with Ross:
- all imported Word docs will be saved to a central location
- Word docs are saved in either the GOOD or the BAD directory therein
- we will create the saved file name from the user name and the original file name
- we will add log messages that include the user name, the original Word file name, the conversion status, the saved file name, etc.
- a user importing the same file again may overwrite a previously saved copy (policy decision)
- failure to write the saved doc will trigger the old tempfile logic. The side effect is that some imported Word files will be lost if the file system is full (policy decision)
- space management for the central location (policy decision) is TBD
- GOOD doc files will be harvested to build a test bed for the importer
- BAD doc files will be analyzed and authors potentially contacted "proactively"
- the configure attribute in the oo_to_cnxml object/class will have two entries for the GOOD and BAD harvest directories.
- initial values for the two entries are null => by default harvesting is turned off.
- the Zope/Plone admin control panel can change these two values, which turns harvesting on or off
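
The objectives above can be sketched roughly as follows. This is only an illustration, not the actual CNXMLTransforms code: the function names (harvest_name, harvest), the parameters good_dir and bad_dir, the file name pattern, and the temp-file fallback standing in for "the old tempfile logic" are all assumptions made for the example.

```python
import logging
import os
import re
import tempfile

log = logging.getLogger("harvest")  # hypothetical logger name


def _safe(text):
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", text)


def harvest_name(user_name, original_name):
    """Build the saved file name from the user name and the original file name."""
    # Browsers may send a full client-side path; keep only the last component.
    base = re.split(r"[\\/]", original_name)[-1]
    return "%s_%s" % (_safe(user_name), _safe(base))


def harvest(data, user_name, original_name, converted_ok,
            good_dir=None, bad_dir=None):
    """Save an imported Word file and return the path that was written.

    good_dir and bad_dir stand in for the two configuration entries.
    A null (None or empty) entry means harvesting is off for that outcome.
    """
    status = "GOOD" if converted_ok else "BAD"
    target_dir = good_dir if converted_ok else bad_dir
    if target_dir:
        path = os.path.join(target_dir, harvest_name(user_name, original_name))
        try:
            # Opening with "wb" overwrites a previously saved copy.
            with open(path, "wb") as f:
                f.write(data)
            log.info("user=%s file=%s status=%s saved=%s",
                     user_name, original_name, status, path)
            return path
        except (IOError, OSError) as err:
            log.warning("user=%s file=%s status=%s harvest failed: %s",
                        user_name, original_name, status, err)
    # Harvesting is off or the write failed: fall back to a temporary file
    # (assumed here to be what the old tempfile logic did). The caller is
    # responsible for deleting it, so the Word file is not kept.
    fd, path = tempfile.mkstemp(suffix=".doc")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path
```

With both directories left as None, every call takes the temp-file branch, which matches the "off by default" behavior. Setting either directory to an existing path turns harvesting on for that outcome without a restart, as long as the caller reads the current values on each import.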

Instance variables are set at install time and NOT at restart time. This required that an upgrade script be created, to be run as part of the roll-out.
See: CNXMLTransforms/Upgrades/up031to04.zctl