Supporting Multiple Languages
Multiple Language Support
Supporting multiple languages in Connexions breaks down into two areas: the content itself (modules and courses) and the website. For content, there are separate issues for author-provided multiple-language content and for derived-work translations, not to mention multiple languages in a single module. The website also breaks down into two parts: the interface features (buttons and labels) and static content, essentially documentation. That gives four things to worry about:
- Website interface
- Website documentation
- Multilanguage content
- Translations of content
A bit about each:
- Website interface
- The first of these is already in progress: if you switch your browser default language to something other than English, you'll see some of our labels and controls appear in translation. You do not 'Log in' to the Spanish page, you 'Entrar' it.
But you still see 'Content': we haven't translated all of our code, but we inherit some interface elements from Plone, which has lots of translations. Chuck Bearden is leading the effort, with the help of some outside groups, to complete and update our site translation. See his blog. Additional tools for making it clear what languages are available come with the PloneLanguageTool. Details of the user interface and URI format for alternate languages are under discussion with Manpreet Kaur, our UI expert.
- Website documentation
- This will use the LinguaPlone product, and may need to wait on our migration to Plone 2.5 (currently in beta). Translation of some of these documents will also be provided by our collaborators.
- Multilanguage content
- By this, I mean content that is written and maintained by its author or authors in parallel, multiple languages.
The plan here is to provide multiple language versions of a module from a single URL, honoring the browser preferences as well as any site-language selections made under the first point, above. I've started prototyping this, and have been pleasantly surprised at how little code it takes to do this. The model here is providing multiple, language-specific cnxml files inside each multilanguage module. Thanks go to one of our authors, Davide Rocchesso, for this suggestion. Selecting what content to serve ends up being only six lines of code. Getting the appropriate metadata will be a few more lines, and a slight model change for our database. In essence, the objectid and revision are no longer sufficient to uniquely specify a module: you need the language as well. This takes exactly zero changes to our base schema, and only slight changes to the stored procedures.
Ah yes, the schema. Yup, we have a property for language on all of our content. The dirty little secret is that up until now, we hadn't implemented the UI to allow authors to fill it in properly. So, while we have content in multiple languages in Connexions, finding it by language has not been easy. The code for this is currently in the testing queue.
Once we backfill the appropriate language tags, we need to think hard about how language can affect discovery of content. Max has also prototyped a Browse by Language that will be the next bit of this implemented. Clearly, something needs to happen with the search interfaces, as well.
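As a rough illustration of the language selection described above, here is a minimal Python sketch of choosing which language version of a module to serve. It is not the prototype code: the function names, the `index_<lang>.cnxml` file naming, and the fallback to English are all assumptions made for the example.

```python
def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(available, accept_header, site_choice=None, default="en"):
    """Pick one language from `available` (a non-empty set of lowercase tags).

    An explicit site-language selection wins over the browser preferences.
    """
    candidates = [site_choice.lower()] if site_choice else []
    candidates += parse_accept_language(accept_header)
    for tag in candidates:
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "it-it" falls back to "it"
        if primary in available:
            return primary
    return default if default in available else sorted(available)[0]


# Hypothetical layout: one cnxml file per language inside the module.
files = {"en": "index_en.cnxml", "it": "index_it.cnxml"}
language = choose_language(set(files), "it-IT,it;q=0.9,en;q=0.5")
print(files[language])
```

Because the chosen language becomes part of what identifies the content, a lookup for metadata would be keyed on (objectid, revision, language) rather than on (objectid, revision) alone.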
- Translations of content
- Last but not least, what about translating existing content? Brent Hendricks, System Architect Emeritus for Connexions, covered a design for that some time ago.
I think most of that proposal can be broken into a sequence of features that can then be implemented fairly simply. The availability of multilingual content, as described above, simplifies some cases. Both implementations will eventually benefit from some further rework of our roles system, adding a new role of translator. One complication: we'll need a way to fold a translation that starts as an external, third-party derived-content style translation into its parent module, converting it to multilingual content. This will lead to a further need to be able to "supersede" or "obsolete" a module. We must be careful about our implicit (explicit?) promise of URL lifespan, though: we never want to have content disappear.
Languages sure do complicate life, but I think it's a good thing, in the long run.

We need to be careful about areas such as the "Spotlight" box, which, when viewed w/ Spanish as my principal browser language, is translated as "About" (well, "Acerca de", which means "About").
Also, there are certain bits and pieces of module text that are output via unibrowser.xsl, such as "Example:" or "Figure 3", for example. An idea I had for dealing with those was to make those words into variables, and feed new variable values over the English ones if the module wasn't in English. The stuff that outputs the "Related material" and "Choose a style" text is also in XSL, so I imagine the same thing would need to be done there.
I wonder if it would be possible to make a script that would take values from a .po file and turn them into an XSL file full of a language's variable values (for pieces of text such as those exemplified above). That way the content translation could take place in the same manner as the site translation, and wouldn't require translators to learn how to edit XSLT.
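A script along those lines could be fairly small. The following Python sketch is only an illustration under several assumptions: the msgids are taken to be identifier-like keys that are valid XML names (for example `label-example`, a made-up name), only simple `msgid`/`msgstr` pairs are handled (no plurals, contexts, or escape sequences), and the output uses plain `xsl:variable` elements.

```python
import re
from xml.sax.saxutils import escape, quoteattr

_FIELD = re.compile(r'^(msgid|msgstr)\s+"(.*)"$')
_CONTINUATION = re.compile(r'^"(.*)"$')


def parse_po(text):
    """Return {msgid: msgstr} for the translated entries in .po text."""
    entries = {}
    fields = {"msgid": None, "msgstr": None}
    current = None

    def flush():
        # Skips the header entry (empty msgid) and untranslated strings.
        if fields["msgid"] and fields["msgstr"]:
            entries[fields["msgid"]] = fields["msgstr"]

    for line in text.splitlines():
        line = line.strip()
        match = _FIELD.match(line)
        if match:
            keyword, value = match.groups()
            if keyword == "msgid":
                flush()  # a new msgid closes the previous entry
                fields["msgstr"] = None
            current = keyword
            fields[current] = value
        elif current and _CONTINUATION.match(line):
            # A quoted line on its own continues the previous string.
            fields[current] += _CONTINUATION.match(line).group(1)
        else:
            current = None  # blank line, comment, or unsupported keyword
    flush()
    return entries


def po_to_xsl(entries):
    """Render the entries as an XSL file of variables, one per msgid."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<xsl:stylesheet version="1.0"'
        ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
    ]
    for name in sorted(entries):
        lines.append("  <xsl:variable name=%s>%s</xsl:variable>"
                     % (quoteattr(name), escape(entries[name])))
    lines.append("</xsl:stylesheet>")
    return "\n".join(lines) + "\n"


sample_po = '''
#: unibrowser.xsl
msgid "label-example"
msgstr "Ejemplo"

msgid "label-figure"
msgstr "Figura"
'''
print(po_to_xsl(parse_po(sample_po)))
```

Translators would then only ever touch the .po file; the generated XSL could be regenerated whenever the translations change.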
Playing the free software theme "steal when you can", we can take a look at how Norm Walsh did a similar task for the DocBook XML. Stylesheets are at: /usr/share/xml/docbook/stylesheet/nwalsh/common on any of the Debian machines. His system uses gentext.xsl and l10n.xsl, with individual <langcode>.xsl files providing translations. Fairly complex.
Another approach might be to create parallel unibrowser/<lang>.xsl files, and select which one to render with based on the content's language attribute. We'd need one per supported language, plus a fallback to English for anything else.
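The selection step for that approach might be sketched in Python as follows. The function name, the directory layout, and the `.xsl` file extension are assumptions for the example, not existing Connexions code.

```python
import os


def stylesheet_for(language, stylesheet_dir="unibrowser", fallback="en"):
    """Return the path of the stylesheet to render a module with.

    `language` is the module's language attribute, e.g. "es" or "es-MX".
    Falls back to the English stylesheet when no match exists.
    """
    primary = (language or "").lower().split("-")[0]
    # Only accept plain alphabetic codes, so the value cannot escape the directory.
    if primary.isalpha():
        candidate = os.path.join(stylesheet_dir, primary + ".xsl")
        if os.path.isfile(candidate):
            return candidate
    return os.path.join(stylesheet_dir, fallback + ".xsl")
```

With only `en.xsl` and `es.xsl` present, a module tagged `es-MX` would render with `es.xsl`, while one tagged `fr` (or with no language at all) would render with `en.xsl`.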