Rhaptos Software Development

Storage Dispatch Notes from Brent

Design considerations for Storage Dispatch recently emailed by Brent.

OK, it's a little late, but here are some notes on where I was going
with the storage-dispatch branch.  Unfortunately, I haven't been able
to find anything extant beyond a tiny blurb in the OMDoc plans
(http://rhaptos.org/Members/brentmh/plans/omdoc/) so I'm going to have
to go by memory and what's in subversion.

Goals
--------
- A cleaner way of handling different storage implementations (Death
to objectId.startswith('m'))
- More extensibility for new data types (OMDoc, BRIT stuff, etc.)
- Better compatibility with other Zope versioning solutions

The first two were the primary goals.  The last one kind of snuck in
(scope creep!) because I felt it would be a good thing to do and this
seemed to be a reasonable time to do it.  In retrospect it probably
should have waited.

Design Summary
----------------------
The idea is to have multiple storage backends that would each
implement a different mechanism (ZODB vs. CVS+Postgres vs. whatever).
When an object is first placed in the repository, its portal_type is
used to select a storage backend and the mapping between objectId and
storage backend is stored.  Subsequent retrievals can be done via
objectId without knowing the portal_type.
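
As a rough illustration of that flow, here is a minimal sketch in plain
Python (no Zope).  DictStorage, ToyRepository and all of their method
names are made up for the example and are not taken from the branch;
the only thing it tries to show is "pick the backend by portal_type on
first store, look it up by objectId afterwards".

```python
class DictStorage:
    """Toy in-memory backend; a stand-in for ZODB, CVS+Postgres, etc."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def store(self, object_id, obj):
        self._objects[object_id] = obj

    def retrieve(self, object_id):
        return self._objects[object_id]


class ToyRepository:
    """Hypothetical repository: names here are assumptions, not the real API."""

    def __init__(self):
        self._storage_for_type = {}  # portal_type -> backend
        self._storage_for_id = {}    # objectId -> backend

    def register(self, portal_type, storage):
        self._storage_for_type[portal_type] = storage

    def add(self, object_id, portal_type, obj):
        # First placement: portal_type selects the backend, and the
        # objectId -> backend mapping is remembered.
        storage = self._storage_for_type[portal_type]
        self._storage_for_id[object_id] = storage
        storage.store(object_id, obj)

    def get(self, object_id):
        # Later retrievals need only the objectId.
        return self._storage_for_id[object_id].retrieve(object_id)
```

Note that get() never inspects the objectId itself, which is the point
of the "Death to objectId.startswith('m')" goal.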

Repository Design Notes
-------------------------------
With the mechanics of the different storages pushed into the backends,
the Repository itself has 3 jobs:
 * Manage the various storage implementations
 * Manage the mapping between objectId and backend
 * Dispatch storage/retrieval operations to the correct backend(s)

StorageManager
----------------------
The IStorageManager interface describes the API for managing the
various storage implementations.  You can find IStorageManager in the
IVersionStorage.py file - I'm not sure why I didn't give it its own
interface file; it probably should have one.  Anyway, the API is
fairly straightforward.  It has methods for registering, getting, and
removing storages and setting/getting the storage for a particular
portal_type.  It also allows for designating one of the registered
storages as the default.
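
To make the shape of that API concrete, here is a small sketch in plain
Python.  The method names below are invented for illustration (check
IVersionStorage.py for the real ones), and the fallback to the default
storage for an unmapped portal_type is my assumption about how the
default would be used.

```python
class ToyStorageManager:
    """Hypothetical stand-in for an IStorageManager implementation."""

    def __init__(self):
        self._storages = {}      # storage id -> storage object
        self._type_map = {}      # portal_type -> storage id
        self._default_id = None  # id of the default storage, if any

    def register_storage(self, storage_id, storage):
        self._storages[storage_id] = storage

    def get_storage(self, storage_id):
        return self._storages[storage_id]

    def remove_storage(self, storage_id):
        del self._storages[storage_id]
        # Drop any portal_type mappings that pointed at the removed storage.
        self._type_map = {
            ptype: sid for ptype, sid in self._type_map.items()
            if sid != storage_id
        }
        if self._default_id == storage_id:
            self._default_id = None

    def set_default_storage(self, storage_id):
        if storage_id not in self._storages:
            raise KeyError(storage_id)
        self._default_id = storage_id

    def set_storage_for_type(self, portal_type, storage_id):
        if storage_id not in self._storages:
            raise KeyError(storage_id)
        self._type_map[portal_type] = storage_id

    def get_storage_for_type(self, portal_type):
        # Assumption: unmapped portal_types fall back to the default.
        storage_id = self._type_map.get(portal_type, self._default_id)
        if storage_id is None:
            raise LookupError("no storage for %r" % portal_type)
        return self._storages[storage_id]
```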

I used portal_type as the key for mapping storages because it fits in
well with the CMF view of the world where everything from actions to
workflows is triggered off portal_type.  I actually think that was one
of the best innovations of the CMF: the fact that many behaviors were
decoupled from the python class.  It made extending **much** easier
than in straight Zope since you didn't have to worry about subclassing
1000 parent classes to get all of the desired functionality.  And site
designers can add functionality from one Product to content from
another without knowing a line of python.  But I digress.

The implementation of IStorageManager is in (surprise!)
StorageManager.py.   I separated it out from Repository.py because
there's really no reason the storage manager has to be the same object
as the repository; its functionality could really be done by a
separate portal tool.  But for now Repository does subclass
StorageManager.

An implementation note: I used a BTreeFolder2 to implement
StorageManager.   This could have been a BTree object but I like using
containment for two reasons:  First, the functions for
getting/storing/removing, etc. have already been implemented in the
context of the Zope folder model, so why reinvent them?  Second, I
wanted to leave open the possibility that the storage backends
themselves could be visible/customizable in the ZMI.  I intended to
write a ZMI interface page for the StorageManager functions but didn't
get around to it.

Mapping objectId to backend
-------------------------------------
One of the things that current Zope/Plone versioning solutions tend to
lack is the ability to retrieve an object using only an identifier.
Clearly this is something we need, and as I stated above I didn't want
to rely on the fragile objectId parsing we'd been doing.  One option
would have been to store no extra information and to simply query each
backend in turn to determine which one held the object.  But our
experience has been that module/course retrieval is the most common
operation in our system, and I didn't want to slow it down with
potentially arbitrarily long lookups.  At first I was going to use a
simple dictionary/BTree to map objectId to storage.  But using
containment and storing a "stub" child object for each objectId had
several advantages:

- For ZODB based solutions (RhaptosCollection) the stub object can
double as the VersionFolder where the revisions are stored.  This cuts
down on required storage space since we're not storing data for each
objectId twice (once for the storage mapping and once for the actual
data).
- For non-ZODB solutions (RhaptosModuleStorage) the stub object
simplifies traversal since we don't have to create "virtual"
non-Persistent objects every time an object is retrieved.  This should
have the side-effect of slightly speeding up module retrieval although
the improvement may be too small to be measurable.
- As in the case of StorageManager above, using the existing
BTreeFolder2 containment functions means less code to implement.

There should probably be an interface file for this functionality,
although there's really only one method: _getStorageForObjectId().
Since each backend can have custom stub objects for its own purposes,
the backends must be in charge of creating the stubs.  This is
currently done in the createVersionFolder() method (stubs morphed out
of what's currently called "version folders") on the storage backend.
The call to _setObject() in createVersionFolder should probably be
moved back into Repository and out of the storages though.
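
A sketch of the stub idea in plain Python may help.  A dict stands in
for the BTreeFolder2 containment, the stub classes and add_stub() are
hypothetical, and the createVersionFolder() signature is a guess.  It
follows the suggestion above in that the backend creates the stub while
the repository is the one that stores it.

```python
class Stub:
    """Minimal stub: only remembers which storage owns the objectId."""

    def __init__(self, object_id, storage_id):
        self.id = object_id
        self.storage_id = storage_id


class VersionFolderStub(Stub):
    """Stub for a ZODB-style backend: doubles as the revision container."""

    def __init__(self, object_id, storage_id):
        super().__init__(object_id, storage_id)
        self.revisions = {}  # version string -> stored revision


class ExternalStorage:
    """Backend whose data lives outside the ZODB; a bare stub is enough."""
    id = "external"

    def createVersionFolder(self, object_id):  # signature is assumed
        return Stub(object_id, self.id)


class ZODBStyleStorage:
    """Backend that keeps revisions inside its own stub objects."""
    id = "zodb"

    def createVersionFolder(self, object_id):  # signature is assumed
        return VersionFolderStub(object_id, self.id)


class ToyRepository:
    def __init__(self, storages):
        self._storages = {s.id: s for s in storages}
        self._stubs = {}  # objectId -> stub; stands in for containment

    def add_stub(self, object_id, storage_id):
        # The backend decides what kind of stub it needs...
        stub = self._storages[storage_id].createVersionFolder(object_id)
        # ...but the repository does the equivalent of _setObject().
        self._stubs[object_id] = stub
        return stub

    def _getStorageForObjectId(self, object_id):
        # One lookup by id; no backend is queried and no id is parsed.
        return self._storages[self._stubs[object_id].storage_id]
```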

Backend Dispatch
-----------------------
This one's pretty simple.  Most methods on the Repository just have to
determine which storage to use and then call the corresponding method
there.  Some methods like countRhaptosObjects() or searchRepository()
are slightly more complicated than that because they perform
operations that span multiple backends.
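
The difference between the two kinds of methods can be sketched like
this.  It is plain Python with toy backends; countRhaptosObjects() and
searchRepository() are the names used above, but their signatures and
return values here are assumptions, and every other name is invented.

```python
class ToyBackend:
    """Toy backend holding objectId -> title; purely illustrative."""

    def __init__(self, titles):
        self._titles = dict(titles)

    def object_ids(self):
        return list(self._titles)

    def get_title(self, object_id):
        return self._titles[object_id]

    def count(self):
        return len(self._titles)

    def search(self, text):
        needle = text.lower()
        return [oid for oid, title in self._titles.items()
                if needle in title.lower()]


class ToyRepository:
    def __init__(self, backends):
        self._backends = list(backends)
        # Precomputed objectId -> backend index (see the previous section).
        self._backend_for_id = {}
        for backend in self._backends:
            for oid in backend.object_ids():
                self._backend_for_id[oid] = backend

    def get_title(self, object_id):
        # Simple case: find the one backend and forward the call.
        return self._backend_for_id[object_id].get_title(object_id)

    def countRhaptosObjects(self):
        # Spanning case: ask every backend and combine the answers.
        return sum(backend.count() for backend in self._backends)

    def searchRepository(self, text):
        # Spanning case: merge the per-backend results.
        hits = []
        for backend in self._backends:
            hits.extend(backend.search(text))
        return sorted(hits)
```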

Backend Storage Design Notes
----------------------------------------
Unfortunately the design of the backends themselves was not in quite
as finished a state when I left.  They should mostly work, but there
are some rough edges.  This is partially due to the fact that I decided to
start migrating them towards the ZopeVersionControl (ZVC) interface
while doing the refactoring.  ZVC is pretty much the de facto standard
Zope API for doing versioning so I think it's a worthwhile goal to use
their interface, but perhaps that would have best been left for a
second phase.  Ultimately I think it would be good to allow for using
the actual ZVC Product (perhaps thinly wrapped) as a storage backend.

The IVersionStorage interface defines the API for backend storage
objects.  It takes over much of the functionality that used to be in
the old VersionFolder objects (the new "stub" objects that take the
place of the VersionFolders are much simpler) as well as borrowing
methods from the ZVC interface.  I did not get around to implementing
all of the ZVC methods and some of the ones that are there have
workarounds and hacks that are necessary to cope with the mismatch
between the new API and our old way of storing version information.
For example, the getVersionInfo() call has to do certain strange
things to massage the data into the expected format.  Some of this
could be solved by storing versioning attributes like version,
revised, submitter, submitlog on __version_storage__ attributes but
that would require changing much more code.  I'm sorry I can't be of
more help on that.  If you are in the code and come across a
particular question, let me know and I'll try to provide an
explanation.

My original plan was to have the ZVC applyVersionControl() take the
place of createVersionFolder().  But I later decided to hold off on
that and use applyVersionControl() to fix another long-standing bug:
objectIds are not assigned until the first publication, which makes it
impossible to cross-link unpublished modules.  The
plan was to have applyVersionControl() call generateId() to create the
new objectId and store that on the object but not create the stub or
store the object until the first time checkinResource() was called.
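
Since that plan was never finished, the following is only a sketch of
the intended ordering, in plain Python.  The three method names come
from the paragraph above; their signatures, the id format, and the
Content and ToyStorage classes are assumptions made for the example.

```python
import itertools


class Content:
    """Toy content object; objectId stays None until version control."""

    def __init__(self, title):
        self.title = title
        self.objectId = None


class ToyStorage:
    def __init__(self, prefix="obj"):
        self._prefix = prefix                # id format is made up
        self._counter = itertools.count(1)
        self.stubs = {}                      # objectId -> list of revisions

    def generateId(self):
        return "%s%d" % (self._prefix, next(self._counter))

    def applyVersionControl(self, obj):
        # Assign an id right away so unpublished objects can be linked,
        # but create no stub and store nothing yet.
        if obj.objectId is None:
            obj.objectId = self.generateId()
        return obj.objectId

    def checkinResource(self, obj):
        if obj.objectId is None:
            raise ValueError("object is not under version control")
        # The stub is created lazily on the first checkin.
        revisions = self.stubs.setdefault(obj.objectId, [])
        revisions.append(obj.title)
        return len(revisions)  # revision count after this checkin
```

With this ordering an unpublished object has a stable objectId that
other objects can link to, while the repository holds nothing for it
until the first checkin.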

Final Notes
--------------
For cleanliness some storage backends (like RhaptosModuleStorage)
should be pretty much independent of anything else in the repository.
RhaptosCollections, however, by their very nature are aware of other
objects being stored, so their backend needs certain knowledge about
modules.  For example, before deleting a module you have to make sure
that it isn't present in any courses.  Currently this check is done
inside RhaptosModuleStorage, but perhaps it should be done by the
course backend instead since courses already know about modules but
not vice versa.  I would recommend an event-like system where a course
could "subscribe" to the module deletion event and throw some error
indicating that this module is not deletable.
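
One way such a subscription could look, sketched in plain Python.  None
of these classes or event names exist in the code; a real version would
presumably use whatever event machinery Zope/CMF provides rather than
this hand-rolled bus.

```python
class ModuleInUseError(Exception):
    """Raised by a subscriber to veto a module deletion."""


class EventBus:
    """Tiny synchronous publish/subscribe helper (hypothetical)."""

    def __init__(self):
        self._subscribers = {}  # event name -> list of handlers

    def subscribe(self, event_name, handler):
        self._subscribers.setdefault(event_name, []).append(handler)

    def notify(self, event_name, payload):
        for handler in self._subscribers.get(event_name, []):
            handler(payload)


class ToyModuleStorage:
    """Knows nothing about courses; it only announces deletions."""

    def __init__(self, bus):
        self._bus = bus
        self.modules = set()

    def delete(self, module_id):
        # Any subscriber may raise here, which aborts the deletion.
        self._bus.notify("before_module_delete", module_id)
        self.modules.discard(module_id)


class ToyCourseStorage:
    """Knows about modules, so it is the one that objects."""

    def __init__(self, bus):
        self.courses = {}  # course id -> list of module ids
        bus.subscribe("before_module_delete", self._veto_if_in_use)

    def _veto_if_in_use(self, module_id):
        users = sorted(cid for cid, mods in self.courses.items()
                       if module_id in mods)
        if users:
            raise ModuleInUseError(
                "%s is used by %s" % (module_id, ", ".join(users)))
```

The module storage stays content independent: it never imports or
references the course storage, and the dependency points only from
courses to modules.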

The difficulty there is that I'd like the "generic" backend shipped
with RhaptosRepository to be completely content independent.  Perhaps
that means we would need a RhaptosCollectionStorage.

My original goal was to have OMDocPackage objects use the
RhaptosModuleStorage backend.  This may not be possible, however, due
to the additional processing requirements for OMDoc that take place at
the time of storage (generating .tmpl and .incl files, etc.)

Well, that about wraps it up.  If any questions come up, feel free to
drop me a line.
Created by jccooper
Last modified 2006-06-15 11:15