MOSS ships with a set of document converters (DCs) and two DC services that convert a document’s content from one format to another. Before you can use DC functionality within your Site Collection, you need to enable it at the farm level, set up the (internal) DC load balancer service, and activate DC use for the Site Collection from within Central Administration.

I worked with the document converters on a recent project (well, almost four months old now) and have a few things to share about the Word to Web Page conversion functionality in MOSS. On the whole I found DCs to be a fast way to publish content, but the experience left a few things to be desired. BTW the customer was satisfied with the results: they had a large batch of Word manuals that they were able to WCMise!

Lessons Learnt

  • Document Converters are a reasonable way to convert large bodies of pre-written offline content in Word format into web-based publishing pages. Page layouts and styles can be controlled to a certain degree.
  • For document converters to work, documents need to be uploaded into document libraries that have conversion enabled. The documents can then be converted and published into separate sites and even separate site collections.
  • Any practical use of Document Converters, especially when converting large volumes of documents, warrants a custom-built utility that can call conversions automatically. Without one, publishing manually (multiple clicks per document) is not much faster than copying and pasting content from Word documents into page content fields, and in some cases it is even worse.
  • Document Converters for Word only work with the new Office format, i.e. Word 2007, and cannot convert older formats, period.
  • The Word to Page DC can mysteriously (this is undocumented) pick up metadata if the same content type is used to store both the documents and the pages (highly unlikely, but possible if changes are made to the content type at the item level).
  • Document Converters do not support image or bullet point conversion. Instead you get an empty image placeholder, and fake big black dots (some Unicode symbol) in place of bullet points. Do not expect full fidelity between Word mark-up and HTML mark-up, and be ready to clean up the unusual Office style tags after conversion.
  • One thing that we discovered the hard way, and which I think rhymes with a hug, is that Document Converted Publishing Pages are linked back to the original source. That in itself is a good thing (although I did not find a way to unlink them via the web UI), and it works well when you change the source document and (explicitly) refresh the target page. But to my surprise, for some unknown reason the damn connections are hard coded rather than being relative to the site collection. So if you change host names or restore your site collection in a new domain, you end up with ‘Unknown Errors’ for all converted pages.
  • A good approach is to have metadata such as Page Layouts and target Sites associated with the source document library, so that the custom utility can pick it up and pass it as parameters to the converters. A utility to clean unwanted local style elements is also a must, as the style functionality within MOSS does not work, or at least I have been unable to make it work as advertised.
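
To give a feel for the kind of clean-up utility mentioned in the last point, here is a rough sketch in Python. It is not the utility we used on the project, and it makes several assumptions about what the converted page content looks like: that the leftovers are inline style attributes, class names starting with "Mso", Office namespace tags such as <o:p>, and paragraphs that begin with a bullet glyph. All function and pattern names are made up for illustration; check them against your own converted pages before relying on any of this.

```python
import re

# Assumed shapes of the Office leftovers; adjust to your converted pages.
TAG = re.compile(r'<[^>]+>')  # naive: breaks if an attribute value contains '>'
OFFICE_TAG = re.compile(r'</?o:[^>]*>', re.IGNORECASE)
STYLE_ATTR = re.compile(r'''\s+style\s*=\s*(?:"[^"]*"|'[^']*')''', re.IGNORECASE)
MSO_CLASS = re.compile(r'''\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*')''', re.IGNORECASE)
# Paragraphs that start with a bullet-like glyph (assumed: U+2022, U+00B7, U+25CF).
BULLET_PARA = re.compile(r'<p>\s*[\u2022\u00b7\u25cf]\s*(.*?)</p>',
                         re.IGNORECASE | re.DOTALL)
LI_RUN = re.compile(r'<li>.*?</li>(?:\s*<li>.*?</li>)*', re.DOTALL)


def strip_local_styles(html):
    """Remove <o:...> tags, inline style attributes and Mso* class attributes."""
    html = OFFICE_TAG.sub('', html)

    def clean_tag(match):
        # Only touch text inside a tag, never the page copy itself.
        tag = STYLE_ATTR.sub('', match.group(0))
        return MSO_CLASS.sub('', tag)

    return TAG.sub(clean_tag, html)


def bullets_to_list(html):
    """Turn fake-bullet paragraphs into <li> items and wrap each run in <ul>.

    Assumes the content has no genuine <li> elements already, which matches
    the behaviour described above (bullets are not converted to real lists).
    """
    html = BULLET_PARA.sub(lambda m: '<li>' + m.group(1).strip() + '</li>', html)
    return LI_RUN.sub(lambda m: '<ul>' + m.group(0) + '</ul>', html)


def clean_converted_html(html):
    """Run both clean-up passes; styles first so bullet paragraphs are bare <p>."""
    return bullets_to_list(strip_local_styles(html))
```

The order matters: the style pass runs first so that bullet paragraphs are reduced to a plain <p> before the bullet pass looks for them. Regular expressions are a blunt tool for HTML, so for anything beyond simple page content a proper HTML parser would be the safer choice.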