Update: 2005-03-29 11:57 AM -0500

TIL

The Unicode Standard, Version 4.0
Devanagari Script

Chapter 9

Unicode Consortium, http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf

Downloaded and rewritten in HTML by U Kyaw Tun, M.S. (I.P.S.T., U.S.A.) on behalf of TIL Computing and Language Center, Yangon, MYANMAR, for research on Burmese language. This page contains additions by UKT (U Kyaw Tun). Not for sale.

These pages are in Arial Unicode MS font. Please remember that not all Unicode fonts are alike. Myanmar characters are in GIF picture format, so you do not need any Myanmar font to read them. Myanmar spellings are included in both Myanmar script and Romabama. Romabama spellings are within { }, and words within < > are regular English words.

Top | TIL home page
Linguistics - index

South Asian scripts

Devanagari
Encoding principles
Principles of script | Rendering Devanagari characters |
 Consonant Letters |  Independent vowel letters | Dependent vowel signs|
 Virama | Consonant conjuncts | Explicit virama |
 Explicit half-consonants |
Rendering |
Combining marks |
Digits |
Punctuation and symbols

South Asian scripts

The scripts of South Asia (representing the orthographic syllable) share so many common features that a side-by-side comparison of a few will often reveal structural similarities even in the modern letterforms. With minor historical exceptions, they are written from left to right. They are all abugidas (also called syllabic alphabet and alpha-syllabary) in which most symbols or akshara {ak~hka.ra} stand for a consonant {byi:~} plus an inherent vowel (usually the sound /a/ : in Myanmar: {a.} -- neither {a}, nor {a:}). Word-initial vowels in many of these scripts have distinct symbols, and word-internal vowels are usually written by juxtaposing a vowel-sign in the vicinity of the affected consonant-akshara {byi:~ ak~kha.ra}. Absence of the inherent vowel, when that occurs, is frequently marked with a special sign. In the Unicode Standard, this sign is denoted by the Sanskrit word virāma (Burmese-Myanmar: {a-that}). In some languages another designation is preferred. In Hindi, for example, the word hal refers to the character itself, and halant refers to the consonant-letter that has its inherent vowel suppressed; in Tamil, the word puḷḷi is used. The virama sign ( {tan°hkun} -- meaning: flag.) nominally serves to suppress the inherent vowel of the consonant to which it is applied; it is a combining character, with its shape varying from script to script.

Most of the scripts of South Asia, from north of the Himalayas to Sri Lanka in the south, from Pakistan in the west to the easternmost islands of Indonesia, are derived from the ancient Brahmi script {brah~mi ak~hka.ra}. The oldest lengthy inscriptions of India, the edicts of Ashoka from the third century BCE, were written in two scripts, Kharoshthi and Brahmi. These are both ultimately of Semitic origin, probably deriving from Aramaic, which was an important administrative language of the Middle East at that time. Kharoshthi, written from right to left, was supplanted by Brahmi and its derivatives. The descendants of Brahmi spread with myriad changes throughout the subcontinent and outlying islands. There are said to be some 200 different scripts deriving from it. By the eleventh century, the modern script known as Devanagari was in ascendancy in India proper as the major script of Sanskrit literature. This northern branch includes such modern scripts as Bengali, Gurmukhi, and Tibetan; the southern branch includes scripts such as Malayalam and Tamil.

The major official scripts of India proper, including Devanagari, are all encoded according to a common plan, so that comparable characters are in the same order and relative location. This structural arrangement, which facilitates transliteration to some degree, is based on the Indian national standard (ISCII) encoding for these scripts, and makes use of a virama {a-that}. Sinhala has a virama-based model, but is not structurally mapped to ISCII. Tibetan stands apart, using a subjoined consonant model for conjoined consonants, reflecting its somewhat different structure and usage. The Limbu script makes use of an explicit encoding of syllable-final consonants.

Many of the character names in this group of scripts represent the same sounds, and naming conventions are similar across the range.

9.1 Devanagari

See:
• Code chart: U0900 Devanagari
• MS Windows XP Character Map U0901 to U0970

UKT: Burmese has many characters corresponding to those of Devanagari. The similarity is evident in at least five characters:
• virama ् (Myanmar {athat} ),
• visarga ः (Myanmar {wus.sa.} ),
• anusvara ं (Myanmar {thé:thé:tin} ),
• danda । (Myanmar {poad'hti:}), and
• double danda ॥ (Myanmar {poad'ma.} ).

The Devanagari script is used for writing classical Sanskrit and its modern historical derivative, Hindi. Extensions to the Sanskrit repertoire are used to write other related languages of India (such as Marathi) and of Nepal (Nepali). In addition, the Devanagari script is used to write the following languages: Awadhi, Bagheli, Bhatneri, Bhili, Bihari, Braj Bhasha, Chhattisgarhi, Garhwali, Gondi (Betul, Chhindwara, and Mandla dialects), Harauti, Ho, Jaipuri, Kachchhi, Kanauji, Konkani, Kului, Kumaoni, Kurku, Kurukh, Marwari, Mundari, Newari, Palpa, and Santali.

All other Indic scripts, as well as the Sinhala script of Sri Lanka, the Tibetan script, and the Southeast Asian scripts, are historically connected with the Devanagari script as descendants of the ancient Brahmi script. The entire family of scripts shares a large number of structural features.

The principles of the Indic scripts are covered in some detail in this introduction to the Devanagari script. The remaining introductions to the Indic scripts are abbreviated but highlight any differences from Devanagari where appropriate.

Standards. The Devanagari block of the Unicode Standard is based on ISCII-1988 (Indian Script Code for Information Interchange). The ISCII standard of 1988 differs from and is an update of earlier ISCII standards issued in 1983 and 1986.

The Unicode Standard encodes Devanagari characters in the same relative positions as those coded in positions A0-F4 (hexadecimal) in the ISCII-1988 standard. The same character code layout is followed for eight other Indic scripts in the Unicode Standard: Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam. This parallel code layout emphasizes the structural similarities of the Brahmi scripts and follows the stated intention of the Indian coding standards to enable one-to-one mappings between analogous coding positions in different scripts in the family. Sinhala, Tibetan, Thai, Lao, Khmer, Myanmar, and other scripts depart to a greater extent from the Devanagari structural pattern, so the Unicode Standard does not attempt to provide any direct mappings for these scripts to the Devanagari order.

In November 1991, at the time The Unicode Standard, Version 1.0, was published, the Bureau of Indian Standards published a new version of ISCII in Indian Standard (IS) 13194:1991. This new version partially modified the layout and repertoire of the ISCII-1988 standard. Because of these events, the Unicode Standard does not precisely follow the layout of the current version of ISCII. Nevertheless, the Unicode Standard remains a superset of the ISCII-1991 repertoire except for a number of new Vedic extension characters defined in IS 13194:1991 Annex G, Extended Character Set for Vedic. Modern, non-Vedic texts encoded with ISCII-1991 may be automatically converted to Unicode code points and back to their original encoding without loss of information.

Encoding Principles

UKT:  The medium of communication between two normal human-beings is sound, which involves the vibration of air molecules between the speaker and the hearer. This is what we mean by "language". Throughout human history, until recently, attempts have been made to represent the sound by making visual marks on suitable materials such as palm leaves and stone-tablets. These visual marks do not constitute the language but only its representation, and are known as the "script". Burmese and English are languages whereas Myanmar and Roman are scripts.
   The fundamental unit of language is the syllable: for English, it is the English syllable and for Burmese the Burmese syllable.
   The English syllable is represented by the alphabet-script known as the Roman script (the smallest unit being the "letter"), whereas the Burmese syllable is represented by an abugida-script, the smallest unit of which is the akshara.
   The fundamental unit of the Roman alphabet, the "letter", is of two kinds: the consonant-letter, and the vowel-letter. The English syllable is represented by one or more vowel-letters, or by the vowel-letter(s) modified by consonant-letters. Then comes the most important difference between the two scripts: the fundamental unit of the Myanmar-script is the orthographic-syllable or the akshara representing a syllable of the Burmese-speech.
   The English-Roman syllable has structure CVC (consonant-letter, vowel-letter, consonant-letter -- please avoid using "consonant-vowel-consonant"). However, the effective unit of Burmese-Myanmar is the orthographic syllable, consisting of a consonant-letter and vowel-letter (CV) core and, optionally, one or more preceding consonantal-letters, with a canonical structure of (((C)C)C)V.

The writing systems that employ Devanagari and other Indic scripts constitute abugidas -- a cross between syllabic writing systems and alphabetic writing systems. The effective unit of these writing systems is the orthographic syllable {ak~hka.ra}, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with a canonical structure of (((C)C)C)V. The orthographic syllable need not correspond exactly with a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation.

The orthographic syllable is built up of alphabetic pieces, the actual letters of the Devanagari script. These pieces consist of three distinct character types:
 • consonant-letters,
 • vowel-letters (dubbed by Unicode 4 as "independent vowels"), and
 • vowel-signs ("dependent vowel signs" of Unicode 4).
In a text sequence, these characters are stored in logical (phonetic) order.
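
The logical storage order can be inspected directly. The following is a minimal sketch, not part of the standard, using Python's standard unicodedata module; the sample word (Hindi "हिंदी") and the helper name describe are chosen here for illustration only.

```python
import unicodedata

# The word "हिंदी" spelled out code point by code point, in storage order.
word = "\u0939\u093F\u0902\u0926\u0940"


def describe(text):
    """Return (code point, character name) pairs in storage (logical) order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for code, name in describe(word):
    print(code, name)
```

Although VOWEL SIGN I is drawn to the left of HA, it is listed after HA, because storage follows the phonetic order of consonant then vowel.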

Principles of the Script

Rendering Devanagari Characters

Devanagari characters, like characters from many other scripts, can combine or change shape depending on their context. A character's appearance is affected by its ordering with respect to other characters, the font used to render the character, and the application or system environment. These variables can cause the appearance of Devanagari characters to differ from their nominal glyphs (used in the code charts).

Additionally, a few Devanagari characters cause a change in the order of the displayed characters. This reordering is not commonly seen in non-Indic scripts and occurs independently of any bidirectional character reordering that might be required.

Consonant Letters

Each consonant letter represents a single consonantal sound but also has the peculiarity of having an inherent vowel, generally the short vowel /a/ in Devanagari and the other Indic scripts. Thus U0915 क DEVANAGARI LETTER KA (and {ka.} MYANMAR LETTER KA) represents not just /k/ but /ka/. In the presence of a vowel-sign, however, the inherent vowel /a/ associated with a consonant letter is overridden by the vowel-sign.

Consonant letters may also be rendered as half-forms (e.g. ), which are presentation forms used to depict the initial consonant in consonant clusters. These half-forms do not have an inherent vowel. Their rendered forms in Devanagari often resemble the full consonant but are missing the vertical stem, which marks a syllabic core. (The stem glyph is graphically and historically related to the sign denoting the inherent /a/ vowel.)

Some Devanagari consonant letters have alternative presentation forms whose choice depends upon neighboring consonants. This variability is especially notable for U0930 र DEVANAGARI LETTER RA, which has numerous different forms, both as the initial element and as the final element of a consonant cluster. Only the nominal forms, rather than the contextual alternatives, are depicted in the code chart.

UKT: Forms of Myanmar {ra.}: ; (e.g. in ) ; in {ra.ris} for en-characters, and in {ra.ris} for em-characters

The traditional Sanskrit/Devanagari alphabetic encoding order for consonants follows articulatory phonetic principles, starting with velar consonants and moving forward to bilabial consonants, followed by liquids and then fricatives. ISCII and the Unicode Standard both observe this traditional order.

     
  UKT: Sanskrit is learnt by some Myanmar Buddhist monks. However, Pali is learnt to a greater extent by both monks and lay-persons alike. Pali and Sanskrit used in Myanmar are in Myanmar script. The following is the traditional way of presenting the 33 Myanmar consonants (Burmese: {byi:~}; Pali: vyañjana). The groupings (in Pali) are from An Elementary Pali Course, by Ven. Narada Thera, http://www.buddhanet.net/pdf_file/ele_pali.pdf
 
 

Dental-labial [v] is written in Burmese-Myanmar and M-Pali as [w] or {wa.}. Burmese-Myanmar and M-Pali {th.} is absent in E-Pali , where this character is written with [s]. {a.} /a/ represents the inherent vowel present in consonants.

Independent Vowel Letters

The independent vowels in Devanagari are letters that stand on their own. The writing system treats independent vowels as orthographic CV syllables in which the consonant is null. The independent vowel letters are used to write syllables that start with a vowel.

  UKT: There are 12 vowels in Burmese-Myanmar.  
   

Dependent Vowel Signs (Matras)

The dependent vowels serve as the common manner of writing non-inherent vowels and are generally referred to as vowel signs, or as matras in Sanskrit. The dependent vowels do not stand alone; rather, they are visibly depicted in combination with a base letterform. A single consonant, or a consonant cluster, may have a dependent vowel applied to it to indicate the vowel quality of the syllable, when it is different from the inherent vowel. Explicit appearance of a dependent vowel in a syllable overrides the inherent vowel of a single consonant letter.

UKT: {ma.tra}, {mat~ta}, {ma-tra} -- the length of time to pronounce a short vowel, a mora or prosodial unit (a long vowel contains two moras and a prosodial vowel three) -- U Hoke Sein, The Universal Burmese, English, Pali Dictionary, 1st ed., 1978, p. 578

The greatest variation among different Indic scripts is found in the way that the dependent vowels are applied to base letterforms. Devanagari has a collection of non-spacing dependent vowel signs that may appear above or below a consonant letter, as well as spacing dependent vowel signs that may occur to the right or to the left of a consonant letter or consonant cluster. Other Indic scripts generally have one or more of these forms, but what is a non-spacing mark in one script may be a spacing mark in another. Also, some of the Indic scripts have single dependent vowels that are indicated by two or more glyph components -- and those glyph components may surround a consonant letter both to the left and right or may occur both above and below it.

The Devanagari script has only one character denoting a left-side dependent vowel sign: U093F ि DEVANAGARI VOWEL SIGN I. Other Indic scripts either have no such vowel signs (Telugu and Kannada) or include as many as three of these signs (Bengali, Tamil, and Malayalam).

A one-to-one correspondence exists between the independent vowels and the dependent vowel signs. Independent vowels are sometimes represented by a sequence consisting of the independent form of the vowel /a/ followed by a dependent vowel sign. Figure 9-1 illustrates this relationship (see the notation formally described in the Rules for Rendering later in this section).


Image captured from 150% size of PDF page.

 

UKT: The following shows how Myanmar syllables are formed from consonants and vowel-signs.

 
  {a.} + {i.} = {i.}
  {ka.} + {i.} = {ki.}
  {a.} + {u.} = {u.}
  {ka.} + {u.} = {ku.}

The combination of the independent form of the default vowel /a/ (in the Devanagari script, U0905 अ DEVANAGARI LETTER A) with a dependent vowel sign may be viewed as an alternative spelling of the phonetic information normally represented by an isolated independent vowel form. However, these two representations should not be considered equivalent for the purposes of rendering. Higher-level text processes may choose to consider these alternative spellings equivalent in terms of information content, but such an equivalence is not stipulated by this standard.
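
That the two spellings are not canonically equivalent can be checked with Unicode normalization. The sketch below assumes Python's standard unicodedata module; the helper name canonically_equal is introduced here for illustration and is not defined by the standard.

```python
import unicodedata

INDEPENDENT_I = "\u0907"        # DEVANAGARI LETTER I
A_PLUS_SIGN_I = "\u0905\u093F"  # DEVANAGARI LETTER A + DEVANAGARI VOWEL SIGN I


def canonically_equal(a, b):
    """Compare two strings after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# U+0907 has no canonical decomposition, so normalization leaves the two
# spellings distinct.
print(canonically_equal(INDEPENDENT_I, A_PLUS_SIGN_I))
```

A higher-level process that wants to treat the two spellings as the same information would therefore need its own mapping; normalization alone does not provide one.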

Virama (Halant)

Devanagari employs a sign known in Sanskrit as the virama or vowel omission sign {a.thut}. In Hindi it is called hal or halant, and that term is used in referring to the virama or to a consonant with its vowel suppressed by the virama; the terms are used interchangeably in this section.

The virama sign, U094D ् DEVANAGARI SIGN VIRAMA, {a.thut tan°hkun} , nominally serves to cancel (or kill) the inherent vowel of the consonant to which it is applied. When a consonant has lost its inherent vowel by the application of virama, it is known as a dead consonant (e.g. ); in contrast, a live consonant (e.g. ) is one that retains its inherent vowel or is written with an explicit dependent vowel sign. In the Unicode Standard, a dead consonant is defined as a sequence consisting of a consonant letter followed by a virama. The default rendering for a dead consonant is to position the virama as a combining mark bound to the consonant letterform.

UKT: The virama sign is known in Bama as {a.thut} and is represented with a {tan°hkwun} (literal meaning: a flag) which has the appearance .

For example, if Cn denotes the nominal form of consonant C, and Cd denotes the dead consonant form, then a dead consonant is encoded as shown in Figure 9-2.


Image captured from 150% size of PDF page.
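
As a small illustration of this encoding, the following sketch builds a dead consonant and inspects the virama's character properties with Python's standard unicodedata module. The choice of KA and the helper name make_dead are assumptions made for the example.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def make_dead(consonant):
    """Encode a dead consonant: the consonant letter followed by a virama."""
    return consonant + VIRAMA


dead_ka = make_dead(KA)
print([unicodedata.name(ch) for ch in dead_ka])

# The virama is a nonspacing combining mark (general category Mn).
print(unicodedata.category(VIRAMA))
```

The dead consonant is thus always two code points long in memory, whether or not the virama ends up visible when rendered.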

Consonant Conjuncts

UKT: Known in Burmese-Myanmar as {byi:~ twè:} -- Myanmar sa-loan paung that-poan kyam, 1986, p.htta . There are more conjuncts in Sanskrit than in Myanmar.
   There are two kinds of consonant conjuncts: {paahT-hsin.} and {kin:si:}

The Indic scripts are noted for a large number of consonant conjunct forms that serve as orthographic abbreviations (ligatures) of two or more adjacent letterforms. This abbreviation takes place only in the context of a consonant cluster. An orthographic consonant cluster is defined as a sequence of characters that represents one or more dead consonants (denoted Cd ) followed by a normal, live consonant letter (denoted Cl ).

UKT: There are two types of conjuncts in Myanmar: one type being used as onset and the other present in the coda. It is sufficient at this point to say that a "killed" consonant is never found in the onset.

{k-ka.} can be written as a vertical conjunct with the virama hidden as: {k~ka.}. This vertical conjunct is never found in the onset of a syllable because it can not be pronounced.  {k~ka.} is an example of {paahT-hsin.}

{tak-ka.} is a bisyllable which can be pronounced. It can be written with as {tak~ka.}. It then becomes a bisyllable and is part of the word for <university>.

• Another form of consonant-conjunct is found when the killed consonant is r1c5 {nga.} (Latin small letter Eng, U014B, the IPA /ŋ/, present in English <sing>). This "conjunct" is {kin:si:}. An example is {ing:ga.leit} meaning <English>. It should be noted that a {kin:si:} can be un-ligatured without loss of meaning. Thus: {ing:ga.leit} is perfectly legitimate. Romabama does not differentiate between the ligatured and un-ligatured forms.

• Only four conjuncts are allowed as onsets in Myanmar: {ya.pin.} shape: ; {ra.ris.} or ; {wa.hswè:} ; and {ha.hto:} or formed from {ya.}, {ra.}, {wa.} and {ha.} respectively. -- They are known as "medials".

• More onset-conjuncts can be formed from the above four. e.g. {ya.pin.wa.hswè:} . Note that the preferred term is not {wa.hswè:ya.pin.}.

Under normal circumstances, a consonant cluster is depicted with a conjunct glyph if such a glyph is available in the current font(s). In the absence of a conjunct glyph, the one or more dead consonants that form part of the cluster are depicted using half-form glyphs. In the absence of half-form glyphs, the dead consonants are depicted using the nominal consonant forms combined with visible virama signs (see Figure 9-3).


Image captured from 150% size of PDF page.

UKT: The following is a reproduction of Figure 9-3, showing the sequences of Unicode characters typed in:
(1) U0917+U094D+U0927 = ग्ध                   
   
(2) U0915+U094D+U0915 = क्क equivalent of
(3) U0915+U094D+U0937 = क्ष                     
(4) U0930+U094D+U0915 = र्क                     

A number of types of conjunct formations appear in these examples: (1) a half-form of GA in its combination with the full form of DHA; (2) a vertical conjunct K.KA; and (3) a fully ligated conjunct K.SSA, in which the components are no longer distinct. Note that in example (4) in Figure 9-3, the dead consonant RAd is depicted with the nonspacing combining mark RAsup (repha).

A well-designed Indic script font may contain hundreds of conjunct glyphs, but they are not encoded as Unicode characters because they are the result of ligation of distinct letters. Indic script rendering software must be able to map appropriate combinations of characters in context to the appropriate conjunct glyphs in fonts.
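
The fallback order described above (conjunct glyph, then half-form, then visible virama) can be sketched as a lookup. This is only an illustration under stated assumptions: the two font tables and their glyph names are hypothetical, and a real rendering engine works from the font's own substitution tables rather than Python dictionaries.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Hypothetical font tables; the glyph names are invented for illustration.
CONJUNCT_GLYPHS = {"\u0915\u094D\u0937": "K.SSA"}  # KA + VIRAMA + SSA
HALF_FORM_GLYPHS = {"\u0917": "GA.half"}           # half-form of GA


def nominal(ch):
    """Label standing in for the nominal glyph of a character."""
    return f"U+{ord(ch):04X}"


def shape_cluster(cluster):
    """Map a consonant + VIRAMA + consonant cluster to a list of glyph labels."""
    if len(cluster) != 3 or cluster[1] != VIRAMA:
        raise ValueError("expected consonant + virama + consonant")
    first, second = cluster[0], cluster[2]
    if cluster in CONJUNCT_GLYPHS:       # 1. a conjunct glyph is available
        return [CONJUNCT_GLYPHS[cluster]]
    if first in HALF_FORM_GLYPHS:        # 2. otherwise use a half-form
        return [HALF_FORM_GLYPHS[first], nominal(second)]
    # 3. last resort: nominal forms with a visible virama
    return [nominal(first), nominal(VIRAMA), nominal(second)]


print(shape_cluster("\u0915\u094D\u0937"))  # conjunct available
print(shape_cluster("\u0917\u094D\u0927"))  # half-form of GA + DHA
print(shape_cluster("\u091F\u094D\u0920"))  # neither: visible virama
```

The encoded text is identical in all three cases; only the glyphs chosen by the (hypothetical) font differ.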

Explicit Virama (Halant)

Normally a virama character serves to create dead consonants that are, in turn, combined with subsequent consonants to form conjuncts. This behavior usually results in a virama sign not being depicted visually. Occasionally, this default behavior is not desired when a dead consonant should be excluded from conjunct formation, in which case the virama sign is visibly rendered. To accomplish this goal, the Unicode Standard adopts the convention of placing the character U200C ‌  ZERO WIDTH NON-JOINER immediately after the encoded dead consonant that is to be excluded from conjunct formation. In this case, the virama sign is always depicted as appropriate for the consonant to which it is attached.

UKT: Characters such as U200C ZERO WIDTH NON-JOINER are presented in the code charts. ZWNJ stands for ZERO WIDTH NON-JOINER. See "U2000 General Punctuation", Unicode Consortium, Copyright 1991-2003 Unicode, Inc. All Rights Reserved. http://www.unicode.org/charts/

For example, in Figure 9-4, the use of ZERO WIDTH NON-JOINER prevents the default formation of the conjunct form क ्ष  (K.SSAn). (UKT: The previous character is obtained by inputting U0915 U094D U0937)


Image captured from 150% size of PDF page.
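
In terms of stored code points, the explicit-virama convention amounts to the following. This is a minimal sketch; the function name with_explicit_virama is introduced here for illustration.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def with_explicit_virama(first, second):
    """Encode first + VIRAMA + ZWNJ + second so that no conjunct is formed."""
    return first + VIRAMA + ZWNJ + second


default_form = "\u0915" + VIRAMA + "\u0937"              # may render as K.SSA
explicit_form = with_explicit_virama("\u0915", "\u0937")  # KA with visible virama, then SSA

print([f"U+{ord(ch):04X}" for ch in explicit_form])
```

The two strings differ only by the inserted U+200C, so a process that compares or searches such text may want to decide explicitly whether that difference matters.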

Explicit Half-Consonants

When a dead consonant participates in forming a conjunct, the dead consonant form is often absorbed into the conjunct form, such that it is no longer distinctly visible. In other contexts, the dead consonant may remain visible as a half-consonant form. In general, a half-consonant form is distinguished from the nominal consonant form by the loss of its inherent vowel stem, a vertical stem appearing to the right side of the consonant form. In other cases, the vertical stem remains but some part of its right-side geometry is missing.

In certain cases, it is desirable to prevent a dead consonant from assuming full conjunct formation yet still not appear with an explicit virama. In these cases, the half-form of the consonant is used. To explicitly encode a half-consonant form, the Unicode Standard adopts the convention of placing the character U200D ZERO WIDTH JOINER   immediately after the encoded dead consonant. The ZERO WIDTH JOINER denotes a nonvisible letter that presents linking or cursive joining behavior on either side (that is, to the previous or following letter). Therefore, in the present context, the ZERO WIDTH JOINER may be considered to present a context to which a preceding dead consonant may join so as to create the half-form of the consonant.

For example, if Ch denotes the half-form glyph of consonant C, then a half-consonant form is encoded as shown in Figure 9-5.


Image captured from 150% size of PDF page.

• In the absence of the ZERO WIDTH JOINER, this sequence would normally produce the full conjunct form  क्ष  (K.SSAn). This encoding of half-consonant forms also applies in the absence of a base letterform. That is, this technique may also be used to encode independent half-forms, as shown in Figure 9-6.


Image captured from 150% size of PDF page.

Consonant Forms. In summary, each consonant may be encoded such that it denotes a live consonant, a dead consonant that may be absorbed into a conjunct, or the half-form of a dead consonant (see Figure 9-7).


Image captured from 150% size of PDF page.
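
The encodings summarized above can be told apart by looking at what follows the consonant. The sketch below is a simplification written for this page: it also reports the explicit-virama case from the earlier section, and it ignores a nukta that may sit between the consonant and the virama. The function name consonant_form is hypothetical.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def consonant_form(text, i):
    """Classify the consonant at index i by the characters that follow it."""
    following = text[i + 1:i + 2]  # empty string at end of text
    after = text[i + 2:i + 3]
    if following != VIRAMA:
        return "live"
    if after == ZWJ:
        return "half-form"
    if after == ZWNJ:
        return "explicit virama"
    return "dead"


KA, SSA = "\u0915", "\u0937"
print(consonant_form(KA, 0))                       # live
print(consonant_form(KA + VIRAMA + SSA, 0))        # dead
print(consonant_form(KA + VIRAMA + ZWJ + SSA, 0))  # half-form
```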

Rendering

Rules for Rendering. This section provides more formal and detailed rules for minimal rendering of Devanagari as part of a plain text sequence. It describes the mapping between Unicode characters and the glyphs in a Devanagari font. It also describes the combining and ordering of those glyphs.

These rules provide minimal requirements for legibly rendering interchanged Devanagari text. As with any script, a more complex procedure can add rendering characteristics, depending on the font and application.

It is important to emphasize that in a font that is capable of rendering Devanagari, the number of glyphs is greater than the number of Devanagari characters.

Notation. In the next set of rules, the following notation applies:

  Cn        Nominal glyph form of consonant C as it appears in the code charts.
  Cl        A live consonant, depicted identically to Cn.
  Cd        Glyph depicting the dead consonant form of consonant C.
  Ch        Glyph depicting the half-consonant form of consonant C.
  Ln        Nominal glyph form of a conjunct ligature consisting of two or more component consonants. A conjunct ligature composed of two consonants X and Y is also denoted X.Yn.
  RAsup     A nonspacing combining mark glyph form of U0930 र DEVANAGARI LETTER RA positioned above or attached to the upper part of a base glyph form. This form is also known as repha.
  RAsub     A nonspacing combining mark glyph form of U0930 र DEVANAGARI LETTER RA positioned below or attached to the lower part of a base glyph form.
  Vvs       Glyph depicting the dependent vowel sign form of a vowel V.
  VIRAMAn   The nominal glyph form of the nonspacing combining mark depicting U094D ् DEVANAGARI SIGN VIRAMA. A virama character is not always depicted; when it is depicted, it adopts this nonspacing mark form.

Dead Consonant Rule. The following rule logically precedes the application of any other rule to form a dead consonant. Once formed, a dead consonant may be subject to other rules described next.

Rule 1. When a consonant Cn precedes a VIRAMAn , it is considered to be a dead consonant Cd . A consonant Cn that does not precede VIRAMAn is considered to be a live consonant Cl.


Image captured from 150% size of PDF page.

Consonant RA Rules. The character U0930 र DEVANAGARI LETTER RA takes one of a number of visual forms depending on its context in a consonant cluster. By default, this letter is depicted with its nominal glyph form (as shown in the U0900 Devanagari code chart). In some contexts, it is depicted using one of two nonspacing glyph forms that combine with a base letterform.

Rule 2. If the dead consonant RAd precedes a consonant, then it is replaced by the superscript nonspacing mark RAsup , which is positioned so that it applies to the logically subsequent element in the memory representation.


Image captured from 150% size of PDF page.
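
A rough way to find where Rule 2 would apply in stored text is sketched below. It is a simplification under stated assumptions: consonants are taken to be the range U0915 to U0939 only, a RA that itself follows a virama is skipped (that case belongs to the RAsub rules), and the function names are hypothetical.

```python
RA = "\u0930"      # DEVANAGARI LETTER RA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def is_consonant(ch):
    """Simplified test: the basic consonant range KA..HA (U+0915..U+0939)."""
    return "\u0915" <= ch <= "\u0939"


def repha_positions(text):
    """Return indexes of RA letters that form a dead RA before a consonant."""
    positions = []
    for i, ch in enumerate(text):
        if ch != RA or text[i + 1:i + 2] != VIRAMA:
            continue
        if i > 0 and text[i - 1] == VIRAMA:
            continue  # RA is the second element of a cluster, not a repha
        nxt = text[i + 2:i + 3]
        if nxt and is_consonant(nxt):
            positions.append(i)
    return positions


print(repha_positions("\u0930\u094D\u0915"))  # RA + VIRAMA + KA
print(repha_positions("\u0915\u094D\u0930"))  # KA + VIRAMA + RA
```

Because the test requires a consonant directly after the virama, a dead RA followed by ZERO WIDTH JOINER is not reported.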

Rule 3. If the superscript mark RAsup is to be applied to a dead consonant and that dead consonant is combined with another consonant to form a conjunct ligature, then the mark is positioned so that it applies to the conjunct ligature form as a whole.


Image captured from 150% size of PDF page.

Rule 4. If the superscript mark RAsup is to be applied to a dead consonant that is subsequently replaced by its half-consonant form, then the mark is positioned so that it applies to the form that serves as the base of the consonant cluster.


Image captured from 150% size of PDF page.

Rule 5. In conformance with the ISCII standard, the half-consonant form RRAh is represented as eyelash-RA. This form of RA is commonly used in writing Marathi and Newari.


Image captured from 150% size of PDF page.

Rule 5a. For compatibility with The Unicode Standard, Version 2.0, if the dead consonant RAd precedes ZERO WIDTH JOINER, then the half-consonant form RAh, depicted as eyelash-RA, is used instead of RAsup .


Image captured from 150% size of PDF page.

Rule 6. Except for the dead consonant RAd, when a dead consonant Cd precedes the live consonant RAl , then Cd is replaced with its nominal form Cn , and RA is replaced by the subscript nonspacing mark RAsub, which is positioned so that it applies to Cn.


Image captured from 150% size of PDF page.

Rule 7. For certain consonants, the mark RAsub may graphically combine with the consonant to form a conjunct ligature form. These combinations, such as the one shown here, are further addressed by the ligature rules described shortly.


Image captured from 150% size of PDF page.

Rule 8. If a dead consonant (other than RAd ) precedes RAd, then the substitution of RA for RAsub is performed as described above; however, the VIRAMA that formed RAd remains so as to form a dead consonant conjunct form.


Image captured from 150% size of PDF page.

A dead consonant conjunct form that contains an absorbed RAd may subsequently combine to form a multipart conjunct form.


Image captured from 150% size of PDF page.

Modifier Mark Rules. In addition to vowel signs, three other types of combining marks may be applied to a component of an orthographic syllable or to the syllable as a whole: nukta, bindus, and svaras.

Rule 9. The nukta sign, which modifies a consonant form, is placed immediately after the consonant in the memory representation and is attached to that consonant in rendering. If the consonant represents a dead consonant, then NUKTA should precede VIRAMA in the memory representation.


Image captured from 150% size of PDF page.
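
The ordering in Rule 9 agrees with Unicode canonical ordering, which can be observed with Python's standard unicodedata module. This sketch goes slightly beyond the text of the rule: it relies on the canonical combining classes of the two marks (7 for nukta, 9 for virama), which are character properties rather than part of the rendering rules.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
NUKTA = "\u093C"   # DEVANAGARI SIGN NUKTA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

recommended = KA + NUKTA + VIRAMA  # nukta before virama, as in Rule 9
swapped = KA + VIRAMA + NUKTA      # the opposite order

# Canonical combining classes of the two marks.
print(unicodedata.combining(NUKTA), unicodedata.combining(VIRAMA))

# Normalization reorders the swapped sequence into the recommended one.
print(unicodedata.normalize("NFC", swapped) == recommended)
```

Text that has passed through normalization will therefore already carry the nukta before the virama.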

Rule 10. The other modifying marks, bindus and svaras, apply to the orthographic syllable as a whole and should follow (in the memory representation) all other characters that constitute the syllable. In particular, the bindus should follow any vowel signs, and the svaras should come last. The relative placement of these marks is horizontal rather than vertical; the horizontal rendering order may vary according to typographic concerns.


Image captured from 150% size of PDF page.

Ligature Rules. Subsequent to the application of the rules just described, a set of rules governing ligature formation apply. The precise application of these rules depends on the availability of glyphs in the current font(s) being used to display the text.

Rule 11. If a dead consonant immediately precedes another dead consonant or a live consonant, then the first dead consonant may join the subsequent element to form a two-part conjunct ligature form.


Image captured from 150% size of PDF page.
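In the memory representation, the input to Rule 11 is simply a run of consonants separated by VIRAMA. The Python sketch below splits such a run into its dead consonants and the final element; the function name is hypothetical, and the sketch assumes the cluster contains only consonant letters and VIRAMA. Whether a conjunct ligature is actually displayed still depends on the font.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def split_cluster(cluster: str) -> tuple[list[str], str]:
    """Split a consonant cluster into (dead consonants, final element).

    The final element is "" when the cluster itself ends in VIRAMA,
    that is, when every consonant in it is dead.
    """
    parts = cluster.split(VIRAMA)
    dead = [consonant + VIRAMA for consonant in parts[:-1]]
    return dead, parts[-1]
```

For the sequence KA, VIRAMA, SSA the result is one dead consonant (KA plus VIRAMA) followed by the live consonant SSA, which is the pair that Rule 11 may join into a two-part conjunct ligature.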

Rule 12. A conjunct ligature form can itself behave as a dead consonant and enter into further, more complex ligatures.


Image captured from 150% size of PDF page.

A conjunct ligature form can also produce a half-form.


Image captured from 150% size of PDF page.


Rule 13. If a nominal consonant or conjunct ligature form precedes RAsub as a result of the application of rule R6, then the consonant or ligature form may join with RAsub to form a multipart conjunct ligature (see rule R6 for more information).


Image captured from 150% size of PDF page.

Rule 14. In some cases, other combining marks will combine with a base consonant, either attaching at a nonstandard location or changing shape. In minimal rendering there are only two cases, RAl with Uvs or UUvs.


Image captured from 150% size of PDF page.

Memory Representation and Rendering Order. The order for storage of plain text in Devanagari and all other Indic scripts generally follows phonetic order; that is, a CV syllable with a dependent vowel is always encoded as a consonant letter C followed by a vowel sign V in the memory representation. This order is employed by the ISCII standard and corresponds to both the phonetic and the keying order of textual data (see Figure 9-8).


Image captured from 150% size of PDF page.

Because Devanagari and other Indic scripts have some dependent vowels that must be depicted to the left side of their consonant letter, the software that renders the Indic scripts must be able to reorder elements in mapping from the logical (character) store to the presentational (glyph) rendering. For example, if Cn denotes the nominal form of consonant C, and Vvs denotes a left-side dependent vowel sign form of vowel V, then a reordering of glyphs with respect to encoded characters occurs as just shown.

Rule 15. When the dependent vowel Ivs is used to override the inherent vowel of a syllable, it is always written to the extreme left of the orthographic syllable. If the orthographic syllable contains a consonant cluster, then this vowel is always depicted to the left of that cluster. For example:


Image captured from 150% size of PDF page.
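The reordering described in Rule 15 can be sketched at the character level as follows. This is a simplified Python illustration with a hypothetical function name: it assumes the input is exactly one orthographic syllable, and it moves characters, whereas a real rendering engine reorders glyphs after conjunct formation.

```python
VOWEL_SIGN_I = "\u093F"  # DEVANAGARI VOWEL SIGN I, drawn to the left


def visual_order(syllable: str) -> str:
    """Return the syllable with vowel sign I moved to its extreme left."""
    if VOWEL_SIGN_I not in syllable:
        return syllable  # nothing to reorder
    i = syllable.index(VOWEL_SIGN_I)
    # Everything before the sign (a consonant or a whole cluster) now
    # follows it; any marks after the sign keep their position.
    return VOWEL_SIGN_I + syllable[:i] + syllable[i + 1:]
```

The stored sequence SA, VIRAMA, THA, vowel sign I therefore comes out with the vowel sign first, ahead of the whole cluster, while the memory representation itself stays in phonetic order.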

Sample Half-Forms. Table 9-1 shows examples of half-consonant forms that are commonly used with the Devanagari script. These forms are glyphs, not characters. They may be encoded explicitly using ZERO WIDTH JOINER as shown; in normal conjunct formation, they may be used spontaneously to depict a dead consonant in combination with subsequent consonant forms.



Image captured from 150% size of PDF page.

Sample Ligatures. Table 9-2 shows examples of conjunct ligature forms that are commonly used with the Devanagari script. These forms are glyphs, not characters. Not every writing system that employs this script uses all of these forms; in particular, many of these forms are used only in writing Sanskrit texts. Furthermore, individual fonts may provide fewer or more ligature forms than are depicted here.

Image captured from 150% size of PDF page. This table appeared across pages 229 and 230 in the original pdf file.

Sample Half-Ligature Forms. In addition to half-form glyphs of individual consonants, half-forms are used to depict conjunct ligature forms. A sample of such forms is shown in Table 9-3. These forms are glyphs, not characters. They may be encoded explicitly using ZERO WIDTH JOINER as shown; in normal conjunct formation, they may be used spontaneously to depict a conjunct ligature in combination with subsequent consonant forms.


Image captured from 150% size of PDF page.
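The explicit encoding of half-forms with ZERO WIDTH JOINER can be illustrated with a short Python sketch. The function name is hypothetical; the sketch only builds the character sequence and makes no claim about which glyph a particular font will display.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def explicit_half_form(*consonants: str) -> str:
    """Encode an explicitly requested half-form.

    One consonant gives a half-consonant (C + VIRAMA + ZWJ); several
    consonants give a half-ligature (C1 + VIRAMA + C2 + VIRAMA + ZWJ).
    """
    if not consonants:
        raise ValueError("at least one consonant is required")
    return VIRAMA.join(consonants) + VIRAMA + ZWJ
```

Calling `explicit_half_form("\u0915")` produces KA, VIRAMA, ZWJ, and `explicit_half_form("\u0915", "\u0937")` produces KA, VIRAMA, SSA, VIRAMA, ZWJ.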

Language-Specific Allographs. In Marathi and some South Indian orthographies, variant glyphs are preferred for U+0932 ल DEVANAGARI LETTER LA and U+0936 श DEVANAGARI LETTER SHA, as shown in Figure 9-9. Marathi also makes use of the eyelash form of the letter RA, as discussed previously in rule R5.


Image captured from 150% size of PDF page.


Combining Marks

Devanagari and other Indic scripts have a number of combining marks that could be considered diacritic. One class of these marks, known as bindus, is represented by U+0901 ँ DEVANAGARI SIGN CANDRABINDU and U+0902 ं DEVANAGARI SIGN ANUSVARA. These marks indicate nasalization or final nasal closure of a syllable. U+093C ़ DEVANAGARI SIGN NUKTA is a true diacritic. It is used to extend the basic set of consonant letters by modifying them (with a subscript dot in Devanagari) to create new letters. U+0951 ॑ .. U+0954 ॔ are a set of combining marks used in transcription of Sanskrit texts.

UKT: Anusvara corresponds to Myanmar {thé:thé:tin} (literal meaning: small one on top), nukta corresponds to {auk-ka.hmyin.} (literal meaning: raised from below).
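The properties of these combining marks can be inspected with Python's standard `unicodedata` module. The sketch below is an illustration only; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, int]:
    """Return (code point, character name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


# The two bindus and the nukta discussed above.
for mark in ("\u0901", "\u0902", "\u093C"):
    print(describe(mark))
```

All three marks have the general category Mn (nonspacing mark). Only the nukta has a nonzero canonical combining class, which is what keeps it ahead of VIRAMA under normalization.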


Digits

Each Indic script has a distinct set of digits appropriate to that script. These digits may or may not be used in ordinary text in that script. European digits have displaced the Indic script forms in modern usage in many of the scripts. Some Indic scripts - notably Tamil - lack a distinct digit for zero.


Punctuation and Symbols

U+0964 । DEVANAGARI DANDA is similar to a full stop. Corresponding forms occur in many other Indic scripts. U+0965 ॥ DEVANAGARI DOUBLE DANDA marks the end of a verse in traditional texts. U+0970 ॰ DEVANAGARI ABBREVIATION SIGN appears after letters or combinations.

UKT: Danda and double danda correspond to Myanmar {poad~hprat} (colloquially {poad hti:}) and {poad~ma.}.

Many modern languages written in the Devanagari script intersperse punctuation derived from the Latin script. Thus U+002C , COMMA and U+002E . FULL STOP are freely used in writing Hindi, and the danda is usually restricted to more traditional texts.


Encoding Structure. The Unicode Standard organizes the nine principal Indic scripts in blocks of 128 encoding points each. The first six columns in each script are isomorphic with the ISCII-1988 encoding, except that the last 11 positions (U+0955 .. U+095F in Devanagari, for example), which are unassigned or undefined in ISCII-1988, are used in the Unicode encoding.

The seventh column in each of these scripts, along with the last 11 positions in the sixth column, represent additional character assignments in the Unicode Standard that are matched across all nine scripts. For example, positions U+xx66 ... U+xx6F and U+xxE6 ... U+xxEF code the Indic script digits for each script.

The eighth column for each script is reserved for script-specific additions that do not correspond from one Indic script to the next.
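Because the digits sit at the same offset in every block, a digit's code point can be computed from the block's starting address. The Python sketch below illustrates this for three of the nine scripts; the function name and the dictionary are hypothetical, and the block starting addresses are supplied here for illustration.

```python
import unicodedata

# Starting code point of each 128-position block (three scripts shown).
BLOCK_BASE = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gujarati": 0x0A80,
}

DIGIT_OFFSET = 0x66  # digits occupy offsets 0x66..0x6F within a block


def script_digit(script: str, value: int) -> str:
    """Return the digit character for `value` (0-9) in the given script."""
    if not 0 <= value <= 9:
        raise ValueError("value must be in the range 0..9")
    return chr(BLOCK_BASE[script] + DIGIT_OFFSET + value)


# Round trip: the Unicode character database agrees on the numeric value.
print(unicodedata.digit(script_digit("Bengali", 7)))
```

Blocks that start at U+xx00 place their digits at U+xx66 .. U+xx6F, and blocks that start at U+xx80 place them at U+xxE6 .. U+xxEF, which is why the text gives both ranges.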


Other Languages. Sindhi makes use of U+0974 DEVANAGARI LETTER SHORT YA. Several implosive consonants in Sindhi are realized as combinations with nukta and U+0952 ॒ DEVANAGARI STRESS SIGN ANUDATTA. Konkani makes use of additional sounds that can be made with combinations such as U+091A च DEVANAGARI LETTER CA plus U+093C ़ DEVANAGARI SIGN NUKTA and U+091F ट DEVANAGARI LETTER TTA plus U+0949 ॉ DEVANAGARI VOWEL SIGN CANDRA O.
