
10.2 Problems with the standard

The Unicode Standard's treatment of Indic scripts has drawn considerable criticism from Indian linguists and researchers.

10.2.1 Disputed Characters in the Standard

The standard defines code point U+0904 DEVANAGARI LETTER SHORT A. However, some linguists dispute that this character exists in the Devanagari script (see [1]).
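You can check what the standard assigns at this code point by querying the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module. That module reflects whichever Unicode version ships with the interpreter, which is likely newer than this document. The sketch prints the names of U+0904 and, for comparison, U+0905. The helper `describe` is hypothetical and exists only for this illustration.

```python
import unicodedata

# U+0904: the code point whose place in the Devanagari script is disputed.
SHORT_A = "\u0904"
# U+0905: the ordinary DEVANAGARI LETTER A, shown for comparison.
LETTER_A = "\u0905"

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for ch in (SHORT_A, LETTER_A):
    print(describe(ch))
```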

10.2.2 Missing Characters

Marathi, which is written in the Devanagari script, uses a grapheme formed by combining DEVANAGARI LETTER A with a CANDRA mark. The Unicode Standard has no character for this grapheme in the Devanagari block, although the related character U+090D DEVANAGARI LETTER CANDRA E is present (see [1]).
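In practice, a missing character means the grapheme must be spelled as a sequence of existing code points. The sketch below assumes that the "CANDRA mark" is U+0945 DEVANAGARI VOWEL SIGN CANDRA E and that the grapheme is therefore written as U+0905 followed by U+0945. The text above does not name that code point, so treat the choice as an assumption. The sketch contrasts this two-code-point sequence with the single code point U+090D. It also shows that normalization does not fuse the sequence, so code that works in code points can split it.

```python
import unicodedata

# Assumption: with no single code point available, the Marathi grapheme is
# spelled as LETTER A followed by the candra vowel sign (U+0945).
A_WITH_CANDRA = "\u0905\u0945"  # DEVANAGARI LETTER A + VOWEL SIGN CANDRA E
CANDRA_E = "\u090D"             # DEVANAGARI LETTER CANDRA E, one code point

for label, text in (("A + candra", A_WITH_CANDRA), ("CANDRA E", CANDRA_E)):
    names = [unicodedata.name(ch) for ch in text]
    print(f"{label}: {len(text)} code point(s): {names}")

# NFC leaves the two-code-point sequence unchanged (prints True).
print(unicodedata.normalize("NFC", A_WITH_CANDRA) == A_WITH_CANDRA)

# Slicing by code point separates the letter from its candra mark.
first_only = A_WITH_CANDRA[:1]
print(unicodedata.name(first_only))  # only DEVANAGARI LETTER A survives
```

Code that truncates, reverses or counts such text therefore has to work on grapheme clusters rather than on individual code points.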

10.2.3 Inconsistent Semantics

The published policy of the Unicode Consortium forbids using U+200D ZERO WIDTH JOINER (ZWJ) to encode semantic differences. The ZWJ was originally intended only to request a joined or ligated rendering of adjacent characters. The meaning of a sequence of Unicode characters was therefore meant to be the same whether or not the sequence contained a ZWJ.

However, this policy was not followed for the Devanagari script. For Devanagari, ZWJ was defined to select display variants of conjunct consonants. Encoding display variants in the character stream was a major departure from the display-independent nature of the Unicode Standard.
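If ZWJ only affected display, as the original policy intended, software could safely ignore it when comparing text. The sketch below takes the conjunct KA + VIRAMA + SSA as an assumed example. Under the Devanagari rendering rules as we understand them, a ZWJ after the virama requests a half-form KA followed by SSA instead of the full conjunct glyph. Under a display-only reading, both spellings are the same text. The helper `strip_zwj` is hypothetical and is not a standard library function.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER
KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"

# KA + VIRAMA + SSA: normally rendered as the full conjunct.
conjunct = KA + VIRAMA + SSA
# Same letters with ZWJ after the virama: requests half-form KA + SSA.
half_form = KA + VIRAMA + ZWJ + SSA

def strip_zwj(text: str) -> str:
    """Drop every ZWJ so that display-only variants compare equal."""
    return text.replace(ZWJ, "")

print(unicodedata.category(ZWJ))                             # Cf (format character)
print(conjunct == half_form)                                 # False: code points differ
print(unicodedata.normalize("NFC", half_form) == half_form)  # True: NFC keeps the ZWJ
print(strip_zwj(half_form) == conjunct)                      # True: display-independent match
```

Normalization alone does not equate the two spellings, so any display-independent comparison has to remove ZWJ explicitly. That is only safe while ZWJ never carries meaning.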

Later, for the Indic scripts alone, the Consortium defined the ZWJ character as (sometimes) carrying a semantic distinction.

As a result, in Indic scripts, two sequences of Unicode code points that differ only in the presence of ZWJ may represent two different words in some cases. In other cases, they represent alternate display forms of the same word. This inconsistency makes Indic text difficult to process. For example, see [1] for the complications it causes when implementing a Marathi spell checker.
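The difficulty shows up in even one lookup step of a spell checker. The sketch below uses a hypothetical checker, not the one described in the cited work. The function names and the three-way result are assumptions made for illustration. The lexicon holds placeholder letter sequences, not real Marathi dictionary entries. When a query matches a lexicon entry only after removing ZWJ, the checker cannot tell from the code points whether the query is a display variant or a different word.

```python
from typing import Iterable

ZWJ = "\u200d"  # ZERO WIDTH JOINER

def strip_zwj(text: str) -> str:
    """Remove every ZWJ from text."""
    return text.replace(ZWJ, "")

def check(word: str, lexicon: Iterable[str]) -> str:
    """Classify word against a lexicon (hypothetical checker logic)."""
    entries = set(lexicon)
    if word in entries:
        return "known"
    # Same letters as a known entry, differing only in ZWJ: either a display
    # variant of that entry or a distinct word; code points cannot decide.
    if strip_zwj(word) in {strip_zwj(e) for e in entries}:
        return "ambiguous"
    return "unknown"

# Placeholder letter sequences, not real dictionary entries.
KSSA = "\u0915\u094d\u0937"            # KA + VIRAMA + SSA
KSSA_ZWJ = "\u0915\u094d\u200d\u0937"  # same letters with ZWJ after VIRAMA
LEXICON = [KSSA]

for query in (KSSA, KSSA_ZWJ, "\u0915"):
    print(check(query, LEXICON))
```

Accepting, flagging or further analysing the "ambiguous" case is a policy decision that needs language-specific knowledge beyond the code point sequence. That is exactly the inconsistency described above.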

This and other project documentation can be downloaded from http://indic-computing.sourceforge.net/documentation.html.


Copyright © 2001--2009 The Indic-Computing Project.
Contact: jkoshy