Update: 2005-04-05 02:02 PM -0400

TIL

Representing Myanmar in Unicode

Details and Examples

Martin Hosken fn01 and Maung Tuntunlwin fn02 , www.unicode.org/notes/tn11/myanmar_uni.pdf

Original pdf file rendered into html and edited by U Kyaw Tun, M.S. (I.P.S.T., U.S.A.). Not for sale. Prepared for students of TIL Computing and Language Center, Yangon, MYANMAR.

UKT: I have followed the original format as much as possible without changing the text. In place of Myanmar characters, I have used gif glyphs which I have formed, so that you do not have to have any Myanmar fonts. However if you would like to respond to me in Myanmar script, you should use WinInnwa font. If it is not readily available please contact me: jtun@sympatico.ca .

Contents
Introduction
Basic Myanmar
U+1031 -e vowel
Medials and Syllable Chaining
Devoweliser

Top | TIL home page
Linguistics - index
Representing Myanmar in Unicode | Page 1/8

Introduction

One of the first reactions people often have when seeing the Myanmar script block in the Unicode standard is to say that “it doesn’t work!” After all, there seem to be many characters missing. Where are all the medials? Unfortunately, people often give up at that point and do not bother to investigate further.

UKT: The meaning of the linguistic term "medial" from http://faculty.washington.edu/zhandel/dissB-text.pdf :
Definition of “medial” -- "The use of the term “medial” in historical Chinese phonology has its origin in traditional analyses of Mandarin phonology. “Medial” is a translation of the Chinese term jièyīn ‘intermediary sound’, so called because of its position between the initial consonant (or syllable onset) and the main vowel. In modern Mandarin the medial element may be one of three on-glides (or semi-vowels) j, w, ɥ . For Old and Middle Chinese, however, the term “medial” may not be so simply defined, in part because the syllable structure is not as simple as in modern Mandarin, and in part because some aspects of that structure remain controversial."

The problem is that people often approach the Unicode standard with a glyph model in mind. This is particularly true for Myanmar where existing fonts follow a glyph model very closely. But Unicode follows a linguistic model whereby the stored text represents the underlying characters rather than the glyphs involved. Thus, there are no separate codes for medials since a medial is simply a consonant following a primary consonant that has been devowelised. fn03

This paper aims to show that the Unicode specification for Myanmar does in fact ‘work’. In so doing, it will attempt to address some of the loose edges in the specification with regard to some of the more obscure areas of the Myanmar orthography for which there is no clear direction in the existing Unicode standard. fn04

Even with the glyph model, there are issues with implementation. Using a linguistic model introduces different implementation issues and these will be discussed. But it must be borne in mind that while the Unicode standard endeavours to be implementable, it does not claim that complexity of implementation was a dominant factor in resolving encoding issues. This means that we are not primarily concerned with keying order or whether the encoding makes rendering easy, so long as keying and rendering are ultimately possible.

Another fundamental principle of the Unicode standard is that once something is encoded it will not be removed or changed. fn05  This is important, because otherwise a later version of the standard could break what is currently legal data. Requiring existing data to be updated to conform to a new version of the standard is not an option, given the immense problems it would cause for the computing industry. Therefore, the existing specification of the Myanmar script will stand. Only if it can be shown that the Unicode standard cannot successfully store Myanmar text will any change to the existing standard be considered.

It is hoped, therefore, that this paper will provide useful information for those wishing to implement Myanmar script using Unicode.

Basic Myanmar

The basic consonants and vowels are relatively obvious in how they are encoded. Thus:

  {sa}   U1005   U102C   <letter>

Here we show the Myanmar word, the underlying Unicode codes that would be stored to represent this and an English gloss of the word. As this example shows, characters are stored in the order in which they are read.

  {hka}   U1001   U102C   <to shake>
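
The stored order of the two examples above can be inspected with a short Python sketch. This is an illustration added here, not part of the original paper; the helper name `describe` is hypothetical, and it relies only on Python's standard `unicodedata` module to look up character names.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# The two words from the examples above, stored in reading order:
# consonant first, then the vowel sign.
sa = "\u1005\u102C"    # {sa}, <letter>
hka = "\u1001\u102C"   # {hka}, <to shake>

for word in (sa, hka):
    for code, name in describe(word):
        print(code, name)
```

Both words end in the same code, U+102C, whichever form of the vowel sign a font finally displays.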

UKT: The rendering of this word can be illustrated with the corresponding syllable in Devanagari:

U0916 (Devanagari letter kha) + U093E (Devanagari vowel sign Aa) → खा

The vowel sign here is known in Myanmar as {maukcha.}, corresponding to Devanagari vowel-sign Aa U093E (the name given in the Windows XP character map). {maukcha.} is used with en-glyphs, whereas its equivalent {weikcha.} is used with em-glyphs, e.g.:

{ka.} + {weikcha.} →  {ka}

corresponding to:

U0915 + U093E → का

In this example, we highlight the code of interest. Notice how {maukcha.} has the same code as {weikcha.} (U+102C MYANMAR VOWEL SIGN AA) and that it is up to the rendering system to decide which form of the character is to be displayed. The same goes for diacritics. There is only one code for a particular character and it is up to the rendering system to ensure that the diacritic is appropriately placed.

  U100A   U102F   U102D   <brown>

  U1011   U102F  U1036  U1038  <to tie>

Here the choice between the two forms of U+102F MYANMAR VOWEL SIGN U is decided by the rendering system.

UKT: The vowel sign U+102F, corresponding to Devanagari U0941, usually has only one form when used alone with a consonantal-akshara. However, when it is used in conjunction with another vowel-sign it changes form. You can see this happening in the examples above.

U+1031 –e vowel

UKT: The authors fail to differentiate between a vowel (or vowel-letter) and a vowel-sign. The vowel in question is the Myanmar vowel-letter {tha.ra. ak~hka.ra} and it corresponds to Bengali vowel-letter E U098F. As in all other Brahmi-derived scripts (more properly Asoka script), Myanmar vowel-letters have their respective vowel-signs. Please note that vowel-signs have no sound of their own and are not vowels.
   Bengali vowel-letter E এ U098F has its corresponding vowel-sign ে U09C7, which is rendered to the left of the consonantal-akshara.
   However, Devanagari vowel-sign E े U0947 is rendered above the consonantal-akshara.
   Myanmar vowel-letter E is similar to Bengali vowel-letter E, and it is rendered to the left of the consonantal-akshara.

We will see later why the vowels are stored in this relative order. But for now it is important to note that the Unicode standard states that vowels are stored after the consonant, according to how they are read, regardless of where they are rendered. This introduces one of the complexities of implementing Myanmar script:

   U1014   U1031    <the sun>

   U1014   U1031   U102C   <plentiful>

UKT: The corresponding Indic syllables, in stored order:
Bengali: U09A8 + U09C7 → নে
Devanagari: U0928 + U0947 → ने

The vowel is rendered in front of the consonant, even though it is read (and so stored) after it. Notice that this says nothing about the relative order for typing, but it does mean that anyone implementing the Myanmar script needs to take special care of this character. In general people are used to typing this vowel in front of the consonant and want to keep doing so, and so implementors need to address issues of keyboarding as well as rendering.
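
One way an input method might address the keyboarding issue is to accept the vowel in the order people type it and reorder it before storing. The following Python sketch is an illustration added here and is not taken from the paper. The helper names `is_consonant` and `visual_to_logical` are hypothetical, the consonant range U+1000 to U+1021 is an assumption made for the sketch, and a real keyboard would need to handle more cases.

```python
E_VOWEL = "\u1031"   # MYANMAR VOWEL SIGN E
VIRAMA = "\u1039"    # MYANMAR SIGN VIRAMA

def is_consonant(ch):
    # Assumption for this sketch: treat U+1000..U+1021 as consonants.
    return "\u1000" <= ch <= "\u1021"

def visual_to_logical(typed):
    """Move a U+1031 typed before a consonant cluster to after it."""
    out = []
    i, n = 0, len(typed)
    while i < n:
        ch = typed[i]
        if ch == E_VOWEL and i + 1 < n and is_consonant(typed[i + 1]):
            # Collect the consonant plus any virama-joined consonants.
            j = i + 1
            cluster = [typed[j]]
            j += 1
            while j + 1 < n and typed[j] == VIRAMA and is_consonant(typed[j + 1]):
                cluster.extend([typed[j], typed[j + 1]])
                j += 2
            out.extend(cluster)
            out.append(E_VOWEL)   # the vowel is stored after the cluster
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Typed in visual order: vowel first, then the consonant.
stored = visual_to_logical("\u1031\u1014")
print([f"U+{ord(c):04X}" for c in stored])
```

The stored result matches the sequence given above for <the sun>: the consonant U+1014 followed by U+1031.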

Medials and Syllable Chaining

So much for what is clearly visible on the Unicode chart. What about all those glyphs that are not there? How are words that include medials or involve syllable chaining stored?

   U1015   U1010   U1039   U1010   U102C   <hinge>

   U1016   U1039   U101A   U102C   U1038   <fever>

  U1000   U1039   U101B   U1031   U1038   <grime>
          UKT: wrong meaning, substitute <copper> or <brass>

   U1019   U1039   U101D   U1031   U1038   <give birth>

  U1019   U1039   U101F   U102F   <regard important>
          UKT: wrong meaning, substitute  <principle> as in "in principle"

UKT: {kka.} is a consonant-conjunct and it cannot be pronounced. To give it sound it must be preceded by an akshara (i.e. one with inherent vowel a.)
The following illustration is from Devanagari:

U0915 + U094D (virama) + U0915 → क्क

According to Mahdu Pandit, क्क cannot be pronounced. However, if a suitable akshara (e.g. U0924 [ta]) is added it can be pronounced: तक्क  (syllable with no meaning). A corresponding situation would be:

  {ka.} + virama + {ka.} → {kka.}, which cannot be pronounced.

In Myanmar, if {kka.} is preceded by a consonantal-akshara such as {ta.}, we can form a syllable which is part of the word meaning <university>.

Conjuncts are known in Sanskrit as: Samyuktakshar

In the linguistic model, a medial is formed by devowelising the inherent vowel of the preceding consonant. Likewise, for a syllable chained letter, the inherent vowel at the end of the previous syllable is devowelised. In Unicode this devowelising process is marked using the virama code (U+1039 MYANMAR SIGN VIRAMA, {athut}).

Thus we store a consonant followed by the virama and then follow it with the consonant of interest.
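
The consonant, virama, consonant pattern can be located in stored text with a few lines of Python. This sketch is an illustration added here and follows the model described in this paper. The helper names `is_consonant` and `virama_pairs` are hypothetical, and the consonant range U+1000 to U+1021 is an assumption made for the sketch.

```python
VIRAMA = "\u1039"   # MYANMAR SIGN VIRAMA

def is_consonant(ch):
    # Assumption for this sketch: treat U+1000..U+1021 as consonants.
    return "\u1000" <= ch <= "\u1021"

def virama_pairs(text):
    """Return (devowelised consonant, following consonant) pairs."""
    pairs = []
    for i in range(1, len(text) - 1):
        if (text[i] == VIRAMA
                and is_consonant(text[i - 1])
                and is_consonant(text[i + 1])):
            pairs.append((text[i - 1], text[i + 1]))
    return pairs

# Stored sequences from the examples in this section.
hinge = "\u1015\u1010\u1039\u1010\u102C"   # <hinge>, syllable chaining
fever = "\u1016\u1039\u101A\u102C\u1038"   # <fever>, medial

for word in (hinge, fever):
    for first, second in virama_pairs(word):
        print(f"U+{ord(first):04X} U+1039 U+{ord(second):04X}")
```

Whether a pair is displayed as a medial or as a stacked (chained) letter is left to the rendering system; the stored text uses the same virama code in both cases.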

Devoweliser

There are two ways of representing the devowelising process. The first is by creating a medial or syllable chained form, using U+1039 to mark the devowelising. The second is to use the visible virama character ({athut}) in conjunction with a base consonant. But if U+1039 is being used to mark medials and syllable chaining, how is the visible character to be represented? The Unicode standard gives the answer. The sequence U+1039

UKT: to be continued.

 

Footnotes

fn01 SIL International and Payap University, Chiang Mai, THAILAND

fn02 Myanmar World Distribution

fn03 Also known as a consonant combination symbol.

fn04 This paper is based on the Unicode standard as it stands at version 4.0.

fn05 The only option open to the Unicode Consortium to fix encoding problems is to encode a new character with the right properties and to deprecate the use of the old character.
