Indic-Computing Documentation Infrastructure

2 Design Goals

The major design goals for our documentation system are:

Design Goals

Support for a variety of distribution formats, so that our documentation is viewable on a wide variety of platforms. Further, our system should be adaptable to newer distribution formats as and when they become popular.
Our documentation should be capable of being processed on systems without native support for the languages being described. There are a few reasons for this design requirement:
- We are attempting to describe the linguistic properties of a number of Indian languages, some of which may not have associated character set standards in place.
- Further, support for even those Indian languages that have standardization in place is today weak (or non-existent) on many computer systems. We cannot rely on support for a given language being natively part of the system on which the documentation is being processed.
- Finally, in the case of a few Indian languages, the standards themselves are under dispute or have not been clearly specified, leading to implementation-dependent variations in text-processing behavior.
Our documentation system should be built over portable tools, allowing it to be used on a wide variety of systems.
Our documentation system should be designed for longevity.
- wherever possible, we attempt to use open-source tools so that our delivery capability is not jeopardized by the ups and downs in the commercial world.
- wherever possible, we attempt to use technologies that are open standards, in preference to proprietary technologies.
The documentation system should support collaborative development of documentation across the Internet. This allows us to tap into the power of the open-source development model.
Our documentation system should be clone-able, so that other linguistic groups can reuse our infrastructure without having to re-invent the wheel.
Finally, the infrastructure should be relatively complete; it should be possible to do everything required within the framework of the documentation infrastructure.

2.1 Implementation choices

The current design meets these design goals reasonably well, but not perfectly. Our design goals led to the following implementation choices.

Documentation delivery using HTML, Postscript™ and PDF.
HTML
The main advantage of HTML is its near-ubiquitous usage; it is the one medium with near-universal availability. Further, HTML is an open standard managed by the World Wide Web Consortium. The downside of using HTML, however, is that Indian language scripts are difficult to represent portably. Each of the available options has drawbacks:
- Using document specific fonts pushes the burden of font management onto the user.
- Representing Indian language phrases with embedded graphics works, but is clunky. Pages with many embedded images may not render and print properly in some browsers.
- Creating HTML encoded in a suitable Unicode encoding like UTF-8 will not suffice for those Indian languages which are not represented in the Unicode character set.
We have currently selected the option of rendering Indian language scripts into embedded graphics images.
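As a rough illustration of the embedded-graphics option, the sketch below builds the HTML `<img>` element that would stand in for one transliterated phrase. It is not the project's actual tool-chain: the function names, the `images` directory, the `.png` extension and the hash-based file naming are all assumptions made for the example, and the step that actually renders the script into an image is left out.

```python
import hashlib
from html import escape


def image_name(encoding: str, text: str) -> str:
    """Derive a stable file name for the rendered image of one phrase.

    Hypothetical naming scheme: hashing the encoding and the transliterated
    text means the same phrase always maps to the same image file.
    """
    digest = hashlib.sha1(f"{encoding}:{text}".encode("utf-8")).hexdigest()
    return f"indic-{digest[:12]}.png"


def img_tag(encoding: str, text: str, image_dir: str = "images") -> str:
    """Build the HTML <img> element that stands in for the phrase."""
    src = f"{image_dir}/{image_name(encoding, text)}"
    # The transliterated source doubles as alt text, so browsers that
    # cannot show the image still display something readable.
    return f'<img src="{escape(src)}" alt="{escape(text)}">'


print(img_tag("ml-itrans", "avaNN"))
```

Keeping the transliteration in the `alt` attribute is one way to soften the rendering and printing problems noted above, since the page stays readable even when an image is missing.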
PDF, Postscript™

PDF and Postscript are proprietary page description technologies from Adobe™ Inc. These technologies were selected for their universal availability and their ability to handle indic scripts.

Both PDF and Postscript allow documents to contain embedded fonts. Further, their font technologies are sophisticated enough to be able to handle indic script rendering.

Note: The current tool-chain does not handle PDF and Postscript very well. In particular, embedding multiple fonts into a generated PDF or Postscript has not been attempted. Fixing this is a TODO item.
The use of DocBook, an SGML DTD, with Indic-Computing specific extensions.
- SGML documents are represented as "plain text" and are editable by standard text editors on virtually every platform.
- Using generalized markup allows us to separate the final visual representation of the text from the logical structure of the document. For example, we can mark up Indian language phrases using custom tags like <indicphrase> as in Figure 1.
  Figure 1. Usage of <indicphrase>
```
    ... a <foreignphrase>chillaksharam</foreignphrase> used in
    the word <indicphrase encoding="ml-itrans">avaNN</indicphrase>
```
  We can then transform the content of <indicphrase> into an indic script fragment in a manner appropriate for the display technology being used.
- Further, using structured markup allows us to verify (some) aspects of document structure before we proceed to publishing them. This feature catches a surprising number of common errors in documents.
- SGML allows easy (only a "simple matter of programming") transformation of documents to other formats.
- SGML documents are long-lived; the SGML specification is itself an open standard (see Section 5) and has seen extensive deployment in the last two decades. SGML and its related standard XML are likely to be around for a long time.
- The DocBook DTD was chosen because it was designed to describe documentation about computing related subjects. Prominent users of the DocBook DTD in the open-source world are the FreeBSD Project, the GNOME Project and the XFree86 Project. Further, DocBook is extensible; we have added a few new tags (see Section 3.2.1) that capture our intended semantics more precisely.
  
  The TEI DTD is another open DTD that was considered for our use. While this DTD was designed with linguistics in mind, its tool-chain support is not as good (as of this writing) as that of DocBook.
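To give a feel for the kind of structural error that can be caught before publishing, here is a minimal sketch that checks only whether start and end tags nest properly. It is far weaker than real validation against the DocBook DTD, which an SGML parser performs. The function name `check_balance` is hypothetical, and the sketch assumes that every element is closed explicitly and that attribute values never contain `<` or `>`.

```python
import re

# Matches a start tag or an end tag; comments and declarations
# (which begin with "<!") are deliberately not matched.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*>")


def check_balance(markup: str) -> list[str]:
    """Return a list of problems; an empty list means the tags nest properly."""
    problems = []
    stack = []  # names of the elements that are currently open
    for match in TAG.finditer(markup):
        closing, name = match.group(1), match.group(2).lower()
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        elif stack:
            problems.append(f"</{name}> does not match open <{stack[-1]}>")
        else:
            problems.append(f"</{name}> has no opening tag")
    # Anything still on the stack was opened but never closed.
    problems.extend(f"<{name}> never closed" for name in stack)
    return problems


print(check_balance("<para>some <emphasis>text</emphasis></para>"))
print(check_balance("<para>some <emphasis>text</para>"))
```

The first call reports no problems, while the second flags the end tag that does not match the innermost open element.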
A tool-chain derived from that of the FreeBSD Documentation Project.

The FreeBSD Documentation Project has invested a considerable amount of effort into creating a tool-chain that can robustly process SGML. Using their work as a base offered us:
- quick start up time,
- a completely open-source tool-chain that runs on a popular open-source Unix platform,
- the ability to track and reuse future enhancements implemented by the FreeBSD Documentation Project.
Access to the documentation infrastructure via SourceForge™

SourceForge is a large collaborative development system used by many open-source projects. We manage our documentation infrastructure using SourceForge's source control facilities. This gives us the following advantages:
- open access to everyone
- a familiar source control interface, namely CVS
Note: Thanks to the SourceForge.Net team for generously providing the necessary infrastructure!
