intro-voc3.htm
by U Kyaw Tun, M.S. (I.P.S.T., U.S.A.). Not for sale. Prepared for students of TIL Computing and Language Center, Yangon, MYANMAR.
UKT: Based on
• Properties of Consonants and Vowels, Kevin Russell, Linguistics Department, University of Manitoba, Winnipeg, Manitoba, R3T 5V5, CANADA http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/notes.htm. 071221
• Online Phonetics Course (UNIL), Department of Linguistics, University of Lausanne, Switzerland.
(This source was downloaded in 2000 or a few years later, and instead of the original links, you can still get to them from: http://www.unil.ch/ling/page30184_fr.html -- UKT: 070823)
How is sound produced?
Sound waves -- listen to <)) 300 Hz etc. (online: <)) 300 Hz)
Spectrum diagrams
Source-filter model of speech production
Harmonics
Resonance
Resonance in a half-open tube
Source and filter
Same vowel at different pitches
Formants
Canadian-English vowels
IPA vowel diagram and cardinal vowels
Relating formants to articulation
Passages worthy of note:
UKT notes
• British-English vowels
• lax and tense vowels
• pitch and frequency
• Source-Filter Theory
Temporary index of figures (in order of insertion) (Does not include figures in
my notes.):
• Fig 8.5 -- Fig.5.01
• animations are not given numbers
• Fig.8.2 Sine wave -- Fig.5.02
• Fig. Sine waves compared -- Fig.5.03
• Adding 2 sine waves -- Fig.5.04
• Wave form and Spectrum -- Fig.5.05
• Resonance in half-open tube -- Fig.5.06
• Odd quarters law -- Fig.5.07
• Frequency response -- Fig.5.08
• Glottal wave -- Fig.5.09
• Schwa spectrum -- Fig.5.10
• Same vowel at different pitches /a/ -- Fig.5.11
• Same vowel at different pitches /i/ -- Fig.5.12
• Fig 7.9 Formants -- Fig.5.13
• Spectrum of free vowels -- Fig.5.14
• Spectrum of checked vowels -- Fig.5.15
• EGG electrodes -- Fig.5.16
• Laryngograph processor -- Fig.5.17
• Fig 15 Audio and EGG signal -- Fig.5.18
• German and Russian monophthongal vowels -- Fig.5.19
• F2/F1 vowel space -- Fig. 5.20
• Bangla vowels -- Fig.5.21
This chapter is about sound waves, and you might say, "Why should we be bothered with sound waves? We are speaking about the Burmese-Myanmar language, aren't we?" Yes, we are. But we will get bogged down when we describe consonants in phonetic terms according to where they are articulated (POA or point of articulation), how they are articulated (‘manner’) and whether the vocal folds are vibrating (‘voicing’). This three-way description of consonants is often known as the VPM (voice-place-manner) description, and it can be used to describe consonants in any human language. There are other features of articulation that may also be relevant in particular cases (for example aspiration -- which is looked upon differently in Burmese-Myanmar, as {ha.hto:} formation, and in English), but in general, the VPM description is sufficient to characterise the different consonants of a language.
We will run into more misunderstandings when we look into the pronunciation of vowels. There are two languages involved in our discussion: Burmese-Myanmar and English-Latin. Then we will have to include Hindi-Devanagari because of our interest in comparing Pali-Myanmar to Pali-Latin and Pali-Devanagari. Burmese, English and Hindi are living languages, and all are changing across time and across geographical space. We become confused by written scripts -- the abugida and the alphabet -- and we forget that they are meant to "record" human speech. We tend to say "That was the Pali sound in the old days" or speak of "the ancient pronunciations", forgetting that none of us has ever heard with our own ears what we are talking about. We forget that machines to record sounds and pronunciations were not invented until recently. We forget that the Burmese-Myanmar word for 'Grammar' came from the Pali-Myanmar word {þûd~da.} meaning "sound". So, unless we start from sound, there will never be an end to our misinterpretations and misunderstandings. But then, people like my good friend U Tun Tint of the MLC would say something like, "The positions of the places in the mouth where these sounds are produced are so definitely described that we know what Pali sounds are like." That is the position of many linguists and phoneticians who have not realized that the place where the vowels are produced is deep down in the throat, where direct observation was not possible until about 40 years ago, and that much of our knowledge has come from the field of surgery of the larynx and its rehabilitation. The results of the recent research, described under the umbrella term 'Voice Quality' (VQ), are being continually applied to computer voice recognition and voice production systems.
Though this chapter is based mostly on the work of Kevin Russell, University of Manitoba, which I downloaded more than 5 years ago and which I have accessed quite often, I have included material from many other sources, such as Lesley Jeffries' Discovering Language: The Structure of Modern English, Palgrave Macmillan, 2006. (See the TIL mini-library included in the CD version, which will be made available to my collaborators for research purposes only.) I have also inserted my commentaries and interpretations.
Sound energy is transmitted through air as the medium.
Of course, sound can also be transmitted through water, and
even along metal wire and cotton thread, as every
schoolboy of my age knows.
I am thinking of two tin cans with a stretched string
in between, which we used as a "telephone".
Here, we are talking about sound being transmitted through
air, which becomes rarefied and compressed as the sound energy
is carried across.
The simplest laboratory instrument we can use to produce sound is the tuning fork. The pix on the right, Fig.8.5 (the numbering in the original paper), shows the rarefaction and compression pockets of air molecules as sound produced by a tuning fork is carried across. The following animation is from http://www.glenbrook.k12.il.us/GBSSCI/PHYS/mmedia/waves/harm4.html.
Remember, the air molecules do not travel. It is the
"pattern" that is traveling.
Rarefaction and compression sections (produced in a 'slinky')
travelling from left to right in what is known as a longitudinal wave.
Sound waves are longitudinal waves.
For mathematical analysis, we use another figure or graph, Fig.8.2, as is shown on the left. The rarefaction and compression pattern quantified by air-pressure is shown by the Y-axis, and time by the X-axis.
This simplest kind of pressure wave is called a sine wave.
Interesting things to measure for a sine wave (Fig. 8.2):
1. amplitude (or loudness, size of pressure differences)
usually measured in decibels (dB)
2. frequency (or pitch)
usually measured in cycles per second, or Hertz (Hz)
(Note: wavelength is inversely proportional to frequency, being the speed of sound
divided by the frequency, and is not usually given. Sound is generally described in terms
of frequency and not wavelength.)
The word 'pitch' must always be used with caution. It is a perceived quality, whereas the frequency is a measurable quantity. See pitch and frequency in my notes.
In a particular medium (such as air), all sound waves travel at the same "speed" or velocity, which in air is about 330 to 340 meters per second depending on the temperature.
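The relation between frequency, period and wavelength can be checked with a few lines of Python. This is only a sketch: the speed of 340 m/s is an assumed room-temperature value, and the function names are made up for the illustration.

```python
# Assumed speed of sound in air (m/s); the exact value depends on temperature.
SPEED_OF_SOUND = 340.0

def wavelength_m(frequency_hz, speed=SPEED_OF_SOUND):
    """Wavelength in metres: speed of sound divided by frequency."""
    return speed / frequency_hz

def period_s(frequency_hz):
    """Duration of one cycle in seconds: the reciprocal of frequency."""
    return 1.0 / frequency_hz

for f in (300, 500, 2000):
    print(f"{f} Hz: wavelength {wavelength_m(f):.3f} m, "
          f"period {period_s(f) * 1000:.2f} ms")
```

A higher frequency gives a shorter wavelength and a shorter period, which is why one of the three quantities is enough to describe a simple wave in a given medium.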
Fig. 8.2 is for a particular sound with a particular quality.
If you were to change either the amplitude or the time taken for each cycle (that is, the frequency),
you would get other sounds with other qualities. See figures Sine waves
compared a and b.
When we say "sound being carried across the air medium", what we are actually talking about is a form of energy (call it sound energy) being transmitted across space. Because sound is energy, various sound waves can be combined to produce other sounds with other qualities. The result is that the simple sine waves combine into more complex waves. In the figure on the left, we show the result of adding waves 1 and 2 (red) to produce a resultant wave (blue). Adding sound waves is easy. But given the blue resultant wave, can we split it up into the original red waves? It is not easy, and it calls for a "complex" mathematical treatment.
First, let's listen to various sound waves from Kevin Russell's website, from which I have downloaded the following on 071221. Go online and click on the sound buttons given.
They can be added together:

to produce a complex wave:
<)) 300Hz + 500Hz
This is important because: Any complex wave can be treated as a combination of simple sine waves.
We usually don't care about the actual complex wave itself. We're only interested in the frequencies and amplitudes of the simple waves that it's made up of. Two more examples:
<))
300 Hz and 2000 Hz added (fake [ i ])
<))
900 Hz and 1100 Hz added (fake [ a ])
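Adding simple waves can be tried out numerically. The sketch below builds the 300 Hz and 500 Hz sine waves as lists of samples and adds them point by point. The sampling rate of 8000 samples per second and the helper names are assumptions made for the illustration, not part of Russell's material.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an arbitrary choice for this sketch)

def sine_wave(freq_hz, amplitude, n_samples, sample_rate=SAMPLE_RATE):
    """Pressure values of a simple sine wave, one value per sample."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def add_waves(*waves):
    """Add waves sample by sample to obtain the complex wave."""
    return [sum(samples) for samples in zip(*waves)]

n = 80  # 80 samples at 8000 samples per second = 10 ms
wave_300 = sine_wave(300, 1.0, n)
wave_500 = sine_wave(500, 1.0, n)
complex_wave = add_waves(wave_300, wave_500)

print([round(x, 2) for x in complex_wave[:8]])
```

Because 300 Hz and 500 Hz are both whole multiples of 100 Hz, the summed wave repeats every 10 ms, so these 80 samples cover exactly one cycle of the complex wave.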
Physical scientists like me would like to split up the human voice, in the form of sound waves, into simpler waves with quantities which we can measure. Then we would be able to describe the sound quality in terms of measurable quantities instead of talking about POAs and manners. And then we could forget about the phoneticians and their subjects and their L1s (the very first language a person is exposed to as a newborn), and their vowel diagrams. However, in actual practice the matter is not as simple as we would like it to be.
UKT: My curiosity to know more than what Kevin Russell has given has led me to: http://www.sjsu.edu/faculty/fry/123/acoustics.pdf 071222
Complex waves can be split up into the simpler waves
that make them up. Let's suppose we are to split up
a complex wave (black), shown on the right, which has been
formed from 3 simple waves: the red, the blue and the green.
And let's say we have managed, in one way or another, using
mathematics, to split it into its components. Now we can describe
the 3 simple waves in terms of their amplitudes and frequencies. From their
amplitudes and frequencies, we can draw a spectrum, where the Y-axis
is the amplitude in decibels, and the X-axis is the frequency in Hertz (Hz).
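The "mathematics" for splitting a complex wave is usually some form of Fourier analysis. The following is a minimal sketch using a naive discrete Fourier transform written out by hand; the three component waves (300, 500 and 900 Hz), the sampling rate and the function names are all assumptions chosen for the illustration and are not the waves in the figure.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
N_SAMPLES = 800      # 0.1 s of signal, giving a 10 Hz frequency resolution

def make_complex_wave(components, n_samples=N_SAMPLES, sample_rate=SAMPLE_RATE):
    """Add simple sine waves given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in components)
        for t in range(n_samples)
    ]

def spectrum(samples, sample_rate=SAMPLE_RATE):
    """Naive discrete Fourier transform: frequency in Hz -> amplitude."""
    n = len(samples)
    result = {}
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        result[k * sample_rate / n] = 2 * abs(coeff) / n
    return result

wave = make_complex_wave([(300, 1.0), (500, 0.5), (900, 0.25)])
# Keep only the frequencies that carry a noticeable amplitude.
peaks = {f: round(a, 3) for f, a in spectrum(wave).items() if a > 0.01}

for f, a in sorted(peaks.items()):
    print(f"{f:6.0f} Hz  amplitude {a:.2f}  ({20 * math.log10(a):6.1f} dB)")
```

The decibel figures are taken relative to an amplitude of 1.0, so halving the amplitude lowers the level by about 6 dB. Real analysis software uses the much faster FFT algorithm, but the result is the same kind of spectrum.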
This kind of spectrum diagram is especially convenient for sound waves from a musical instrument, such as a flute. We have reproduced here a flute playing the middle C note.
[{UKT: Don't ask me what a "middle C note" is. I simply don't know. You will have to ask a music teacher. Since the Burmese musical scale is different from the Western scale, make sure that your music teacher knows the Western scale.}]
The wave is made up of many, many simple waves as shown in the spectrum on the left. I hope you can hear what it sounds like by going online and clicking on A flute playing middle C: <)) (online: <)) latest check 071221). The online sound link is to Kevin Russell's website, and unless you are online, you will not hear it.
(Sounds and plots for the musical instruments come from Geoffrey Sandell's SHARC Timbre database at Loyola University Chicago.)
Don't confuse these spectrum diagrams
with spectrograms (which we'll cover later). Perhaps we are more familiar with
the light spectrum, because we are used to seeing rainbows. The coloured diagram
shows how white light is split by a glass prism into its respective colours. One
major difference between sound waves and light waves is that sound
waves are longitudinal waves, whereas light waves are transverse waves.
The mathematical treatment of the waves is the same. The set of frequencies in
a light wave (as separated by a prism) is called its spectrum.
The situation is similar with sound. The complex wave for
the [ i ] vowel sound will be made up of one set of frequencies,
which is different from the set of frequencies for the vowel
[a].
We need a way to separate a complex sound wave out into its component frequencies (and their amplitudes) so that we can see what makes vowels different.
Sound [of the language] is produced in the larynx.
That is where the pitch and volume
are manipulated. The strength of expiration
from the lungs also contributes to loudness, and is necessary
for the vocal folds to produce speech.
-- Wikipedia
http://en.wikipedia.org/wiki/Larynx 070909
The complex waves produced during voiced periods
of speech depend on two things:
1. the waves produced by the vocal fold vibrations (the source), and
2. the way those waves are modified by the higher parts of the vocal tract
(the filter).
An important feature of the source is its harmonics. One of the most important ideas in understanding the filter is resonance.
We shall describe the Source-Filter Theory in the following steps:
• Harmonics
• Resonance
• Resonance in a half-open tube
• Source and filter
Consider again the waves produced by the flute. The lines in these spectra look suspiciously evenly-spaced. This is a typical property of naturally occurring waves.
Now, consider a tightly stretched guitar string. Pluck it in the middle. The middle portion can vibrate up and down. Remember that its ends are tied, so the travelling wave travels to one end (say the right end), is reflected from that end to the other end (the left end), and is then reflected again. The string can vibrate in more than one way. Here we have shown only the first three. In the first, we see only half of a wave (1/2, which we will call the 1st harmonic); in the second, a full wave (2/2, the 2nd harmonic); and in the third, one and a half waves (3/2, the 3rd harmonic). In fact, the string can vibrate in an infinite number of ways (n/2). The points where the string does not vibrate are the nodes. Between the nodes are the anti-nodes. This kind of wave is known as a standing wave.
Naturally occurring waves in a vibrating string involve all n-kinds of vibration simultaneously. The lowest frequency of the wave is its fundamental frequency, the rest have higher frequencies.
In naturally occurring vibrations, there is a harmonic at each multiple of the fundamental frequency -- theoretically all the way up to infinity, though the harmonics decrease in amplitude as the frequency rises.
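The arithmetic of harmonics is simple enough to sketch in code. In the example below, the string length of 0.65 m and the fundamental of 110 Hz are assumed values for a guitar-like string, chosen only for illustration.

```python
def harmonic_frequencies(fundamental_hz, count):
    """The nth harmonic lies at n times the fundamental frequency."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def string_mode_wavelengths(length_m, count):
    """A string tied at both ends holds n half-waves in its nth mode,
    so the wavelength of that mode is 2 * length / n."""
    return [2 * length_m / n for n in range(1, count + 1)]

print(harmonic_frequencies(110, 5))       # evenly spaced lines in the spectrum
print(string_mode_wavelengths(0.65, 3))   # wavelengths of the first three modes
```

The even spacing of the harmonic frequencies is what makes the lines in the flute spectrum look so regular.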
The following animations are from: http://www.glenbrook.k12.il.us/GBSSCI/PHYS/mmedia/waves/harm4.html.
On the left is an animation of exaggerated motion
of the fourth harmonic of a standing wave.
The formation of the standing wave can be thought of as a wave traveling from the left end to the right end, and then reflected back to the left end.
The upper-right animation shows a red crest traveling from the left end of the tied string to the right end. At the right end, it is reflected, becoming a blue trough, and the trough travels back to the left end. On the bottom right is a pix showing the formation of a node in the middle. (This is my creation using two of the upper-right animations.)
The above are animations of the harmonics of a standing wave:
1st, 2nd and 3rd -- all from the same source.
The spectrum of the wave produced by the guitar string would look like the diagram given on the right:
The intensity of the higher harmonics is greatly reduced, and they may be ignored in a discussion.
The wave produced by the vibration of the vocal folds (cords) also has this kind of structure. It is often called the glottal wave. The fundamental frequency (the frequency of the lowest simple wave) is perceived as the pitch.
Objects have frequencies that they prefer to vibrate at. If you try to vibrate an object at a different frequency, the vibrations will be damped and eventually die out. If you try to vibrate it at its preferred frequency, the vibrations will be reinforced and the object will resonate.
Some examples of resonance:
• a standing wave on a skipping rope.
• the note you get when you blow into a half-full glass bottle
• the vibrations in the sounding board of a violin
• a swing swinging higher when you push it just right, or you "pump" just right while you're sitting in it
As children learning physics, we were often asked when we came to 'resonance': "Why do soldiers break step in marching over a bridge?" I am sure you know the answer. If you don't, here it is:
"To avoid stressing the bridge excessively... if they march in step, there's a chance that their steps will coincide with the resonant frequency of the bridge and cause possibly dangerous amplified shaking of the whole structure."
The most famous case of a bridge collapse, that of the Tacoma Narrows bridge (nicknamed Galloping Gertie) in 1940, has been recorded on film. It is popularly attributed to resonance caused by the wind, although engineers now generally describe the cause as aeroelastic flutter rather than simple resonance. Various videos are available online. Use the search string "Tacoma Narrows bridge" to find one. (The most recent one I have watched was on 080311.)
A tube that vibrates at one end and is open at the other (e.g., a clarinet, the vocal tract) also has preferred frequencies.
You can get a standing wave in a half-open tube if the area of high-pressure reaches the open end at exactly the same time the closed end returns to normal pressure.
When this happens, the "reflected" waves travelling back from the open end will exactly coincide with the waves travelling forward from the closed end and they will reinforce each other. The tube will resonate. (At a non-preferred frequency the backward-moving waves will sometimes reinforce, sometimes cancel out, the forward-moving waves, and you won't get a standing wave.)
The preferred frequencies for a half-open tube will be
all those frequencies (call them X) such that: the length
of the tube is 1/4 the wavelength of X, or the length of the tube is 3/4 the
wavelength of X, or the length of the tube is 5/4 the wavelength of X, and so
on. (This is often called the "odd-quarters law".) This means the second
resonating frequency will be three times the first,
the next will be five times the first, and so on.
For a half-open tube that is 17 cm
long (a typical length for an adult male's vocal tract), the preferred
frequencies are 500 Hz, 1500 Hz, 2500 Hz, 3500 Hz, and so on.
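These figures follow directly from the odd-quarters law. If the tube length is (2n - 1)/4 of the wavelength, and wavelength is speed divided by frequency, then the nth preferred frequency is (2n - 1) times speed / (4 times length). A small sketch, assuming a speed of sound of 340 m/s (the function name is made up for the illustration):

```python
def half_open_tube_resonances(length_m, count, speed=340.0):
    """Preferred frequencies of a tube closed at one end and open at the
    other: (2n - 1) * speed / (4 * length) for n = 1, 2, 3, ..."""
    return [(2 * n - 1) * speed / (4 * length_m) for n in range(1, count + 1)]

# A 17 cm tube, the typical adult male vocal tract length quoted in the text.
for f in half_open_tube_resonances(0.17, 4):
    print(f"{f:.0f} Hz")
```

A shorter tube gives higher resonances: with the same assumed speed, a 14 cm tube would have its first resonance at about 607 Hz.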
We often diagram the frequency response curve of a tube. This shows for each frequency how much a tube would resonate if you gave it vibrations at that frequency. The frequency response curve for a 17 cm long vocal tract held in neutral position (i.e., the position for schwa) looks like:
The frequency response curve shows how the vocal tract in neutral position would respond if you gave it various frequencies:
See
• Source-Filter Theory of speech production in my notes.
You might be asking what exactly the source is, and what the filters are.
The source is the air-stream coming out of the glottis, which produces the human voice. It can be laminar flow or turbulent flow. For simplicity's sake, we treat it as carrying vibrations, which we have been describing by their frequencies.
The filters are to be found in the oral and nasal tracts. They modify the sound signal.
Here we will take the vowel schwa /ə/ to describe the model.
The frequency response curve shows how the vocal tract in neutral position would respond if you gave it various frequencies (remember it is the air coming out of the glottis).
The spectrum of the glottal wave (source) shows
what frequencies you're actually giving it:
Putting these together gives you the spectrum of the wave that comes out of the mouth (filter - tongue in neutral position) for a schwa /ə/:
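In numbers, "putting these together" means that each harmonic of the source is boosted or left alone by the filter. On a decibel scale this is a simple addition. The sketch below is a toy model only: the roll-off of the glottal harmonics (-12 dB per octave), the bell-shaped bumps, their width and their height are all assumed values, and only the formant frequencies of 500, 1500 and 2500 Hz come from the text.

```python
import math

FORMANTS_HZ = (500, 1500, 2500)   # schwa resonances of a 17 cm tube
BANDWIDTH_HZ = 100                # assumed width of each bump (hypothetical)

def source_level_db(freq_hz, f0_hz, rolloff_db_per_octave=-12.0):
    """Level of a glottal harmonic relative to the fundamental (assumed roll-off)."""
    return rolloff_db_per_octave * math.log2(freq_hz / f0_hz)

def filter_gain_db(freq_hz, formants=FORMANTS_HZ, bandwidth=BANDWIDTH_HZ,
                   peak_db=20.0):
    """Toy frequency response: a bell-shaped bump of peak_db at each formant."""
    return max(peak_db * math.exp(-((freq_hz - f) / bandwidth) ** 2)
               for f in formants)

def output_spectrum(f0_hz, max_hz=3000):
    """Harmonic frequency -> output level in dB (source level plus filter gain)."""
    spectrum = {}
    n = 1
    while n * f0_hz <= max_hz:
        f = n * f0_hz
        spectrum[f] = source_level_db(f, f0_hz) + filter_gain_db(f)
        n += 1
    return spectrum

schwa = output_spectrum(100)   # a voice with a 100 Hz fundamental
for f in (400, 500, 600, 1400, 1500, 1600):
    print(f"{f} Hz: {schwa[f]:6.1f} dB")
```

The harmonics that fall on a formant stand out above their neighbours, even though the source alone gets steadily weaker as the frequency rises.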
The two aspects of the source/filter
model are independent of each other.
• You can speak different vowels with the same pitch.
(The harmonics will remain the same distance apart,
but the bumps will be in different places.)
• You can speak the same vowel with different pitches.
(The bumps and the overall shape of the spectrum remain the same,
but the harmonics will be spaced differently, as shown below.)
The "schwa spectrum" is the one your ear will hear,
when someone else is "singing" /ə/.
We will take two examples: one from Hyperphysics,
http://hyperphysics.phy-astr.gsu.edu/Hbase/music/vowel2.html#c1 080103, for
the American vowel <a> as in <father> (US <a> is similar to {a}, whereas British <a> is similar to {au}); and the Canadian <i> from Kevin Russell,
http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/diffpich.htm
080103
The vowel /a/
To explain how the ear can recognize a vowel sound as the same vowel, even though it is sounded at different pitches, the idea of vocal formants is invoked. This is data from Benade showing that an "Ah" vowel {a} involves a similar envelope of harmonics when sounded at different frequencies.
Stemple, et al., report a mean fundamental frequency for male voices of 106 Hz with a range from 77 Hz to 482 Hz. For female voices the mean was 193 Hz with a range from 137 Hz to 634 Hz. These averages were based on the production of a sustained vowel /a/ .
The vowel /i/
Now, let's take the most prominent vowel, the front vowel / i / corresponding to
{ i } (tone #2). The source is the same glottal wave but at different frequencies.
The 7 diagrams on the right are computer-generated
spectrum diagrams of the vowel /i/ that you will hear.
Each shows the vowel / i / ({i}) sung at successively higher pitches.
Note how the distance between the harmonics increases as the pitch does, but the preferred resonating frequencies stay the same. The pitch has no effect on the preferred resonating frequencies of the vowel, though we can see that if the pitch gets high enough and the harmonics far enough apart, it can become very difficult to tell where the "bumps" are.
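The effect of pitch on the spacing of the harmonics can be sketched numerically. The formant values of 280 Hz and 2230 Hz are the F1 and F2 quoted for Canadian [i] further down this page; the list of pitches and the function names are assumptions made for the illustration.

```python
FORMANTS_I_HZ = (280, 2230)   # F1 and F2 of [i], from the values quoted in the text

def harmonics(f0_hz, max_hz=3000):
    """All harmonics of a voice with fundamental f0_hz up to max_hz."""
    return [n * f0_hz for n in range(1, int(max_hz // f0_hz) + 1)]

def nearest_harmonic(formant_hz, f0_hz):
    """The harmonic that lies closest to a given formant."""
    return min(harmonics(f0_hz), key=lambda h: abs(h - formant_hz))

for f0 in (100, 200, 400, 800):
    nearest = [nearest_harmonic(formant, f0) for formant in FORMANTS_I_HZ]
    print(f"pitch {f0} Hz: {len(harmonics(f0))} harmonics below 3000 Hz, "
          f"nearest to F1 and F2: {nearest}")
```

At a pitch of 100 Hz there is always a harmonic within 50 Hz of any formant, so the bumps are well sampled. At 800 Hz only three harmonics fall below 3000 Hz and none of them is close to F1, which is why the bumps become hard to locate.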
UKT: No matter how the pitch changes, you can still recognize it as the same vowel /i/ or { i }.
For the purposes of distinguishing vowels from each other, we are more interested in the frequency response curves (indicating the preferred resonating frequencies of the vocal tract) rather than in the raw spectrum of the wave.
Each of the preferred resonating frequencies of the vocal tract (each bump in the frequency response curve) is known as a formant. They are usually referred to as F1, F2, F3, etc. For example, the formants for a typical adult male saying a schwa:
F1, first formant -- 500 Hz
F2, second formant -- 1500 Hz
F3, third formant -- 2500 Hz
...
By changing the vocal tract away from a perfect tube, you can change the frequencies that it prefers to vibrate at. That is, by moving around your tongue body and your lips, you can change the position of the formants.
Formants can be used to differentiate vowels such as {o} and {au:}. These
two vowels are of interest to my friend U Tun Tint and me, because the Burmese-Myanmar {au:} has
been transcribed by MLC as /[o]/ after the fashion of Pali-Myanmar {AU:} transcribed in International Pali. MLC on the other hand transcribes {o} as /[ou]/.
When I told him that in Romabama, the transliteration for
is {o}, he said "that's how a man on the street {lam:pau-ka. lu}
would do it." And he is right! It is usual for male Burmese friends
of the same age to address each other using the prefix {ko}
(such as how I address him -- {ko htwan: ting.}). If I were to write
to him in English, I would address him as Ko Tun Tint.
The explanation for how this confusion came about lies in the way the English
vowels [o] and [ɑ] are generally pronounced. The first three
formants for [o] and [ɑ] are quite similar,
and when we pronounce {au:} or {AU:}, foreigners might hear it as [o]. But to us, they sound like
/ɑ/, and hence the Romabama transcription is {au:}.

Dictionaries usually tell us how to pronounce a word,
a vowel or a consonant. For instance, DJPD16, on the inside
of the first cover, states for the British accent: "e
as in 'pet'; æ as in 'pat'". For the American accent,
it gives the same statements. Obviously, the authors
have in mind that the reader would be either a British
or an American person who knows what RP (Received Pronunciation)
and/or the Standard American accent are.
Though I am bilingual, I am neither British-born nor American-born,
and I am at a loss as to what the dictionary means. I know how
to pronounce 'pet' and 'pat' in English as it is spoken in Myanmar.
Shall we call it Burmese-English? We pronounce the words in the same way:
the result is that I don't know how to differentiate the two.
I am what one might call "phoneme deaf": parallel
to a 'colour-blind' person who cannot differentiate
the 'red-yellow-green' of the traffic light. When the top light
lights up, even though it appears a kind of gray,
he "knows" the colour is what others call
'red' and he has to stop. When the bottom light lights up,
he "knows" the colour is 'green', and he can go.
So my recourse is to rely on vowel diagrams and consonant charts,
and use my knowledge of Burmese to pronounce the English
words. Luckily, Burmese-Myanmar is based on phonemic principles,
and I can use it as a phonetic language even though
I did not know what IPA was. So, let me give you once again
the vowel charts: on the right, the vowel quadrilateral
of Daniel Jones, and on the left, the vowel rectangle
of the American tradition.
Before we proceed, I would like to remind you, what I have found so far on the equivalence of Burmese to English vowels:
• {a} = /a/, /æ/ and /ə/
• {i} = /i/
• {u} = /u/
• {au} = /ɔ/ and /ɑ/.
Please note that Romabama {o} is /o/ -- not the [o] of Pali-Latin (International Pali). What I have found should be checked against the values of F2/F1 when these become available.
In the rectangular vowel diagram, I have marked out the
Lax and Tense vowels. To be
in conformity with Romabama requirements, I have replaced the term "lax" with
"checked", and "tense" with "free".
The checked vowels (e.g. /ɪ/ in <bit> /bɪt/ --
proposed transliteration {bít}, rhyming with {hkít} -- meaning: "times, era" MEDict064) are inside the red rectangle,
and the free vowels (e.g. /i/ in <beat> /biːt/ {bi:t}) outside it. You might also
note that the "close" of the quadrilateral is
the "high" of the rectangle. And German linguists
call the tense-lax distinction fortis and lenis.
See what the DJPD16 Info-panel 38 has to say. No wonder I was confused! For use
in Romabama, neither pair (lax-tense) nor (fortis-lenis) makes sense. Instead, I
prefer to use checked vowels (vowels followed by {a.that} consonants), and free
vowels. Since checked vowels are the equivalents of Russell's lax vowels,
and free vowels the equivalents of tense vowels,
I have changed Russell's terms to Romabama's.
DJPD16-310 Info-panel 38: It is mainly American phonologists who use the terms lax and tense in describing English vowels; the short vowels /ɪ e æ ʌ ɒ ʊ ə/ are classed as lax, while what are referred to in our description of BBC pronunciation as the long vowels and the diphthongs are tense. The terms can also be used of consonants as equivalent to FORTIS (tense) and LENIS (lax), though this is not commonly done in present-day descriptions.
[ i ] in <beat> /biːt/ -- DJPD16-052; approximating {bi:t}
[ɪ] in <bit> /bɪt/ -- DJPD16-060; approximating {bít} (from the spelling of {hkít} - MEDict064)
Note: Spellings like {hkít} /kʰɪt/ (meaning "age, time, period") and {þít~ta} (meaning "box") are becoming rare. These should be revived because the substitute for {hkít}, {hkic} /kʰɪc/, would give another meaning. There is also the possibility of mis-spelling
in Romabama as {hkis}, which would imply the /s/ sound at the syllable end. To prevent it, you might be tempted to use a double killed consonant in the coda, which is against Burmese-Myanmar phonotactics.
{hkist} could also give another untenable implication.
Spectrum of free (tense) vowels (right):
http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/formants.htm
080103
You will notice that I have included /æ/ in the "free vowels". This is not exactly right because it is usually followed by a consonant, and is therefore a checked vowel. The reason why I am doing it is because, I could get Canadian /a/.
Each of the figures on the upper-right shows a computer-generated spectrum and response curve for a particular utterance of a Canadian English vowel by an adult male. The jagged lines show the harmonics. The curved line is the computer's guess, based on the harmonics in the spectrum, as to what the frequency response curve of the vocal tract must have been. The frequencies of the first two formants (as guessed by the computer) have been given for each vowel.
(If you have noticed that the Frequency axis for /o/ is 0 to 5000 Hz, please be assured that it was 0-5000 in the original given by Kevin Russell.)
For comparing vowels across languages, it is now accepted practice to take the F2/F1 of a
minimum of three vowels: [a i u]. The values of F2/F1 taken from the figure on
the right are:
• [æ] = 1550/860
• [i] = 2230/280
• [u] = 1260/330
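Once vowels are described by numbers, a measured pair of formants can be compared with these reference values. The sketch below picks the reference vowel whose (F1, F2) point lies nearest to a measurement. It uses plain straight-line distance in Hz, which is a simplification assumed here for brevity (work on vowel perception often converts to a perceptual scale first), and the function name and the test measurements are made up for the illustration.

```python
import math

# (F1, F2) in Hz, read from the F2/F1 values listed above.
REFERENCE_VOWELS = {
    "æ": (860, 1550),
    "i": (280, 2230),
    "u": (330, 1260),
}

def nearest_vowel(f1_hz, f2_hz, references=REFERENCE_VOWELS):
    """Return the reference vowel closest to the measured (F1, F2) point."""
    return min(references,
               key=lambda v: math.dist((f1_hz, f2_hz), references[v]))

print(nearest_vowel(300, 2100))   # a hypothetical measurement near [i]
print(nearest_vowel(350, 1200))   # a hypothetical measurement near [u]
```

With only three reference points every measurement is forced onto one of them, so this is a way of seeing how the F2/F1 space works rather than a usable vowel recognizer.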
Spectrum of checked (lax) vowels - left:
(checked vowels are those that are followed by consonants or
{a.that}-consonants. It is unfortunate that Russell does not
mention the consonant following his "lax" vowels.)
Kevin Russell gives a total of 11 spectra: 6 free vowels, and 4 checked vowels.
When you compare the vowels (in terms of F2/F1), the vowel /ʌ/ (1310/680) seems to be out of place when I include it in the checked vowels. The vowel /u/ (1260/330) behaves very well in Burmese-Myanmar when it is not followed by a killed consonant. However, when it is followed by a killed consonant, it changes into /ʊ/ F-400-110, and /ʌ/ F-680-1310, and the pronunciation of the word changes:
<put> /pʊt/ -- DJPD16-436 compare with {pwat}
<but> /bʌt/ -- DJPD16-075 {bat}
We note that the killed {ta.} is of row 4 of the akshara. With others:
<piss> /pɪs/ -- DJPD16-413 compare with {pis}
<pack> /pæk/ -- DJPD16-392 compare with {pak}
in EGG (electroglottography) and Voice Quality 4.
Labelling of voice quality in
http://www.ims.uni-stuttgart.de/phonetik/EGG/frmst1.htm .
You will hear some samples in wav. format .
EGG, a noninvasive method of investigating laryngeal
behaviour, conveys essential information about glottal activity. This study
provides an objective, computer-supported method of EGG signal description which
can be used for the automatic determination of voice quality for normal and
pathological speakers and in the determination of laryngeal settings used for
linguistic purposes.
Voice quality can be judged from:
• the degree of hoarseness (G ) or (H ), amount of noise in the
produced sound
• the grade of roughness (R ), in relation to the irregular fluctuation
of the fundamental frequency
• grade of breathiness (B ), the fraction of the non-modulated
turbulence noise in the produced sound
• asthenicity (A ), the overall weakness of voice
• "strained quality" (tenseness of voice,
overall muscular tension) (S )
Using the above subjective qualities in a scale of 0 to 3,
phoneticians have devised 2 systems of classification: GRBAS or RBH. GRBAS is
widely used in the US and Japan, whilst RBH is used in Europe. Listen to a voice
graded to R3B2H3 <))
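A label such as R3B2H3 is just a string of letter-grade pairs, so it is easy to unpack by program. The following sketch assumes that every quality is written as one capital letter followed by its grade, and that the letters are limited to those of the GRBAS and RBH systems named above. The function name is hypothetical.

```python
import re

SCALE_MIN, SCALE_MAX = 0, 3      # the 0 to 3 scale described in the text
KNOWN_LETTERS = "GRBASH"         # letters used by GRBAS and RBH

def parse_voice_rating(label):
    """Split a label such as 'R3B2H3' into {'R': 3, 'B': 2, 'H': 3}."""
    pairs = re.findall(r"([A-Z])(\d+)", label)
    rebuilt = "".join(letter + digits for letter, digits in pairs)
    if not pairs or rebuilt != label:
        raise ValueError(f"not a rating label: {label!r}")
    rating = {}
    for letter, digits in pairs:
        grade = int(digits)
        if letter not in KNOWN_LETTERS or not SCALE_MIN <= grade <= SCALE_MAX:
            raise ValueError(f"unexpected entry {letter}{digits} in {label!r}")
        rating[letter] = grade
    return rating

print(parse_voice_rating("R3B2H3"))
```

The same function accepts a five-part GRBAS label such as G2R1B0A0S1 and rejects grades outside the 0 to 3 range.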
UKT: The following is from II Electroglottography in http://www.ims.uni-stuttgart.de/phonetik/EGG/frmst2.htm
Electroglottography (EGG) is a technique used to
register laryngeal behavior indirectly by measuring the change in electrical
impedance across the throat during speaking. The method was first
developed by Fabre (1957) and influential contributions are credited to Fourcin
(1971 with Abberton) and Frokjaer-Jensen (1968 with Thorvaldsen). Commercially
available devices are produced by Laryngograph Ltd., Synchrovoice and F-J Electronics.

Pix right: The Laryngograph Processor
A portable electro-laryngograph, microphone pre-amplifier, and speech or
Laryngograph based fundamental frequency ("pitch") extractor
www.laryngograph.com/pdfdocs/lxprocfsheetusb.pdf . The unit consists of:
1. A single pair of gold plated, guard-ring electrodes and
three differently sized neck bands.
For work with either voice or swallowing
the electrodes are lightly held on the speaker’s neck, either side of the
thyroid cartilage. They enable the Processor to detect the small, relatively
rapid variations in the conductance of the tissue separating them, produced by
changes in the nature and area of vocal fold and other tissue contact. (Three
different sizes of electrodes are optionally available, for special
applications.)
2. A miniature high quality electret microphone responding to
the speech pressure waveform.
3. A power supply/battery charger.
The amplitude of the signal changes because of
permanently varying vocal fold contacts. It depends on:
• the configuration and placement of the electrodes
• the electrical contact between the electrodes and the skin
• the position of the larynx and the vocal folds within the throat
• the structure of the thyroid cartilage
• the amount and proportion of muscular, glandular
and fatty tissue around the larynx
• the distance between the electrodes.
At the beginning of our forays into phonetics and linguistics, the more my wife and I (we were both chemists by training) looked at the IPA vowel diagram with the cardinal vowels, the more curious we became about the experimental procedures that must have been carried out to place a person's vowels (i.e. vowels uttered by a particular human subject, male or female) in the diagram. How was it done by Daniel Jones? Were there experiments carried out in the field of acoustic phonetics? And would I be able to understand the mathematics involved? The physical scientist in me wouldn't let me rest until I had at least a cursory look into what I wanted to know. Browsing the internet, I came across An IPA vowel diagram approach to analysing L1 effects on vowel production and perception, by O. I. Dioubina & H. R. Pfitzinger, Univ. of Munich, 200. www.phonetik.uni-muenchen.de/~hpt/pub/DioubinaPfitzinger_ICSLP02.pdf . 071231
The IPA vowel diagram represents an abstract space,
which in its layout and proportions is derived from the one
which had been used in the cardinal vowel system of Daniel Jones.
It is a trapezium, right angles at top and bottom back and
ratio 2:3:4 (base:back:top). This is the most simplified version
of the figure developed by Jones through a number of stages,
in which articulatory accuracy was progressively sacrificed
for practical convenience in drawing the diagram.
The vowels are plotted on the diagram with reference to certain fixed points. Daniel Jones proposed a series of 8 (primary) cardinal vowels spaced around the outside of the possible vowel area and designed to act as fixed reference for phoneticians. The space within the diagram represents a continuum of possible vowel qualities which have to be identified by their relationships to the cardinal vowels. According to Daniel Jones a scale of these 8 cardinal vowels forms a convenient basis for describing the vowels of any language.
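As a rough illustration of the 2:3:4 proportions quoted above, the following Python sketch lays out the four corners of the trapezium. The coordinate system, the placement of the slanted front edge on the left, and the corner labels are assumptions made here for illustration only; they are not taken from Dioubina & Pfitzinger.

```python
import math

# Side lengths in the quoted ratio base : back : top = 2 : 3 : 4.
BASE, BACK, TOP = 2.0, 3.0, 4.0

def trapezium_corners():
    """Return the four corners as (x, y) pairs.

    Assumed layout: the back edge is vertical at x = TOP, so that it meets
    the top and the base at right angles; the front edge slants on the left.
    """
    return {
        "top front": (0.0, BACK),
        "top back": (TOP, BACK),
        "bottom back": (TOP, 0.0),
        "bottom front": (TOP - BASE, 0.0),
    }

corners = trapezium_corners()
front_x0, front_y0 = corners["top front"]
front_x1, front_y1 = corners["bottom front"]
# Length of the slanted front edge, which the 2:3:4 ratio leaves implicit.
front_edge = math.hypot(front_x1 - front_x0, front_y1 - front_y0)
print(corners)
print(round(front_edge, 3))
```

The only derived quantity is the length of the front edge, which follows from the other three sides once the two right angles at the back are fixed.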
UKT: At present, it is accepted that only a minimum of three vowels [a, i, u] (or in the case of English [æ, i, u]), known as a vowel triangle, are needed for cross-language comparison. For German (of the sample), imagine a triangle being drawn across three filled dots: /i/, /a/ and /u/. For Russian, imagine another triangle being drawn across open dots for the same vowels. (Incidentally, I could not find the one for /a/.) Since the two triangles are different, we can see why a German would not be able to sound like a Russian or vice versa. Remember there is no perfect way to say a vowel (such as /i/): the IPA pronunciation by an American phonetician such as Ladefoged is as 'good' as the one you say. I have come to this conclusion after listening to the "IPA pronunciations" given by different people (American, British, Canadian, Dutch, etc.). The aim of the L2 (second language) learner should be to speak English that can be understood by other English speakers. The aim should not be to speak like the so-called native speaker, whatever the word "native" may mean.
The two languages I am interested in, are Burmese and English. I am sure English must have been studied using subjects from different English-speaking countries, but I am doubtful much has been done with Burmese-Myanmar subjects.
The description of vowel qualities with the help of the vowel diagram
requires a phonetician to be able to position them as certain points
on the diagram. The three basic dimensions, height, backness and rounding,
together with the values of cardinal vowels are involved in making
a decision on the position of the vowel quality within
the space of the diagram. Where a vowel is positioned would be bound to be
influenced by the L1 of the investigating phoneticians. Because of this, I doubt the
descriptions of the Western phoneticians on the qualities of the Burmese-Myanmar
vowels, especially when they insist that Burmese-Myanmar has diphthongs.
Kevin Russell gives the Canadian vowels from a study of formants (values, indicated by the symbol F, observed from a study of sound waves and spectra on a subject, most probably himself).
The figure on the left is the result of "flipping" the figure given by Kevin Russell. I've redrawn it adding the blue lines. Click on the figure to see the original graph. The measurement of the formants, and the construction of the F2/F1 plot, tell us the tongue positions of a particular speaker on a particular occasion without having to rely on the judgments of phoneticians. Remember, the same person would pronounce his or her vowels slightly differently from time to time (depending on whether he or she is suffering from a cold, etc.). This you must remember when you are speaking about the sounds of the vowels in particular, and languages in general. Yet you, as a member of a particular linguistic group, would be able to identify another member of the same group from the way he or she speaks. This ability to identify another person as the same kind or not is important for the survival of the human species.
The nearest to Burmese-Myanmar vowels I could find (on the internet, so far, 080103) are the Bangla vowels presented on the right. The reader should note that the F scales of the Bangla vowels are the reverse of Russell's. The nearest Burmese-Myanmar equivalents of the Bangla vowels are:
{a.} = অ ;
{a} = আ ;
{é} = এ ;
{i.} = ই ;
{u.} = উ ;
{au:} = ও
(I am waiting for input from my peers.)
You will notice that the vowel-quadrilateral can be bounded by a rectangle with F1 as the Y-axis (from 900 to 300), and F2 as the X-axis (from 2500 to 500).
Comparing the vowel quadrilateral and F2/F1 diagram shows that:
• F1 is influenced by tongue body height
• F2 is influenced by tongue body front-ness/back-ness.
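The bounding rectangle described above can be sketched in Python as a mapping from a pair of formant values to a position on the chart. The axis limits are the ones quoted in the text; the function name, the unit-rectangle convention and the sample measurements are assumptions introduced here for illustration.

```python
# Axis limits quoted in the text (Hz). Both axes run from high to low.
F1_TOP, F1_BOTTOM = 300.0, 900.0    # F1: 300 Hz at the top, 900 Hz at the bottom
F2_LEFT, F2_RIGHT = 2500.0, 500.0   # F2: 2500 Hz at the left, 500 Hz at the right

def chart_position(f1_hz, f2_hz):
    """Map (F1, F2) to (x, y) in a unit rectangle.

    x = 0 is the left edge (front vowels), x = 1 the right edge (back vowels);
    y = 1 is the top edge (high vowels), y = 0 the bottom edge (low vowels).
    Values outside the quoted limits fall slightly outside the rectangle.
    """
    x = (F2_LEFT - f2_hz) / (F2_LEFT - F2_RIGHT)
    y = (F1_BOTTOM - f1_hz) / (F1_BOTTOM - F1_TOP)
    return x, y

# Hypothetical measurements, chosen only to show the mapping.
samples = [("high front", 300, 2500), ("low central", 900, 1500), ("high back", 300, 500)]
for label, f1, f2 in samples:
    print(label, chart_position(f1, f2))
```

Because both scales decrease along their axes, a low F1 lands near the top of the chart and a high F2 near the left, which is what makes the plot resemble the vowel quadrilateral.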
It was expected that measurement of F1 and F2 would be sufficient to describe a vowel. However, for finer details, especially when we are taking Burmese and English together, it is found that F3 has to be taken into consideration. Now, let's find out what F1, F2 and F3 are, and we will see how they are used. If you are in a hurry, jump to the section Relating formants to articulation (in the same file). However, if you are a newcomer, I would recommend that you read the following sections first.
UKT: Materials in this section should be read together with Vowel Perception and Production, by B.S. Rosner & J.B. Pickering, Oxford Psychological series 23, Oxford Science Publications, Oxford University Press, published 1997. Available (from Google bookreview) in TIL library in the CD version, for research purposes only.
The positions for the first two formants of a vowel aren't random. Let's look more closely at the formants we saw for Canadian English vowels: (UKT: the values below are from Russell: http://www.umanitoba.ca/faculties/arts/linguistics/russell/138/sec4/form2.htm 080103)
The values of F2/F1 are the same as those on the figures given in the
previous section:
• [æ] = 1550/860
• [i] = 2230/280
• [u] = 1260/330
However, they are different from those given by Wikipedia
(below). The reason is that each of us pronounces our vowels slightly differently. But
we all do it within a small range, so people in one linguistic group know
exactly what their neighbours are saying. If you travel from country to country as I
have done (Australia, Britain, Canada, and US where I met "native-English
speakers") you will find that you need a couple of days to tune in to the way the
"locals" speak. It has been observed that people of the same L1 (say natives of
the Indian subcontinent), when they speak English as their L2, speak it in the
same way. So an Indian speaking English is perfectly understood by another
Indian, but not easily by an American or a Britisher.
UKT: The following values are from: http://en.wikipedia.org/wiki/Formant download 070908
After measuring the F2/F1 of individuals' vowels, and "averaging" them,
we can place each vowel on a graph, where the horizontal dimension
represents the frequency of the first formant (F1) and
the vertical dimension represents the frequency of
the second formant (F2). See figure on right.
UKT: I've redrawn the graph adding the blue lines. Though Kevin Russell did not indicate the formants for a typical adult male saying a schwa, I've entered them in red.
What we get is just an image similar to our familiar vowel chart!
(Upper right: click on the figure to see the downloaded original picture.)
If we change the axes of the graph so that the horizontal dimension
shows (decreasing) F2 and the vertical dimension
shows (decreasing) F1, we get something almost exactly
like our vowel chart. See figure on the left.
The figure on the left is the result of "flipping" the figure above. I've relabeled the blue lines, and the data points. The figure on the lower-right is the fig. downloaded from the source on 070907.
This means that a listener can essentially "hear"
the position of the speaker's tongue body.
• F1 is influenced by tongue body height
• F2 is influenced by tongue body frontness/backness
(An even more accurate indicator of frontness/backness
than F2 is the difference between the first two
formants, i.e., F2 - F1.)
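The two relations just listed can be checked against the three Canadian English values quoted from Russell earlier in this section. The sketch below is only an illustration; the dictionary layout and function names are assumptions, not part of Russell's notes.

```python
# (F1, F2) in Hz for three Canadian English vowels, as quoted from Russell.
# The text gives them as F2/F1, e.g. [i] = 2230/280.
FORMANTS = {"i": (280, 2230), "æ": (860, 1550), "u": (330, 1260)}

def by_height(formants):
    """Vowels from highest to lowest tongue body, i.e. by increasing F1."""
    return sorted(formants, key=lambda v: formants[v][0])

def by_frontness(formants):
    """Vowels from front to back, i.e. by decreasing F2."""
    return sorted(formants, key=lambda v: formants[v][1], reverse=True)

print(by_height(FORMANTS))     # ['i', 'u', 'æ']
print(by_frontness(FORMANTS))  # ['i', 'æ', 'u']
```

Sorting by F1 puts the two high vowels ahead of the low vowel, and sorting by F2 puts the front vowel first and the back vowel last, in line with the articulatory description.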
Instead of doing this, I have reproduced the graph (on British vowels)
from
http://www.phon.ucl.ac.uk/home/wells/formants/relamp-uni.htm
in my notes. See
British-vowels.
As a concluding remark, after going through all the above, I would like to add:
From measurements of F2/F1, across languages, humans (Americans, Bengalis,
British, Burmese, Canadians, Germans, Russians, etc.) all pronounce the three
vowels [a (æ), i, u] slightly differently. Still we can recognize the vowels
produced. Moreover, even among the same linguistic group, men, women, and
children produce the same vowels differently, but still members of the same
linguistic group understand each other perfectly. This is because not only do
speakers modulate their speech during production, but hearers also cue what
they hear by sight. And so our understanding of human speech is never
complete unless we look into perception as well. But then, that would take
us into a different field of study, where we will come across another theory,
the Modulation Theory. As an introduction to the theory, please look into
Speech considered as modulated voice, by Hartmut Traunmüller ,
Department of Linguistics, Stockholm University, S-106 91 Stockholm .
http://www.ling.su.se/staff/hartmut/speech_considered.pdf (2005), parts of
which are included in the TIL library.
0712116
From: Wells, A study of the formants of the pure vowels of British English -- Submitted in 1962, in partial fulfilment of the requirements for the degree of M.A., University of London.
http://www.phon.ucl.ac.uk/home/wells/formants/relamp-uni.htm 080101
"As, it seems, with most acoustic vowel parameters, it is possible to manipulate the figures in such a way as to show correlation with tongue height and yield an acoustic triangle similar to the familiar auditory-articulatory triangle -- in this case by plotting as the ordinate the difference in amplitude between F1 and F2, with the abscissa arbitrarily arranged to bring out the similarity (fig. 7). In other words, high tongue position corresponds to an F2 of much less intensity than F1, while low tongue position corresponds to an F2 of intensity similar to that of F1. As far as /ʌ/ is concerned, this amplitude triangle gives a better positioning (that is, a positioning more like the auditory-articulatory positioning) than any other acoustic plot, not excluding the frequency plot of F1 versus F2. "
UKT: Note Wells' usage "auditory-articulatory triangle". I presume he meant the familiar vowel quadrilateral of Daniel Jones.
Go back brit-vow-b
From: Wikipedia, http://en.wikipedia.org/wiki/Tenseness download 070906
UKT: German linguists call the distinction fortis and lenis (online: fortis and lenis) rather than tense and lax.
For Romabama, the terms checked (followed by killed consonants) and free vowels are preferable.
Tenseness: In phonology, tenseness is a particular vowel or consonant quality that is phonemically contrastive in many languages, including English. It has also occasionally been used to describe contrasts in consonants. Unlike most distinctive features, the feature [tense] can be interpreted only relatively, that is, in a language like English that contrasts [ i ] (e.g. <beat> ) and [ ɪ ] (e.g. <bit>), the former can be described as a tense vowel while the latter is a lax vowel. Another example is Vietnamese, where the letters ă and â represent lax vowels, and the letters a and ơ the corresponding tense vowels. Some languages like Spanish are often considered as having only tense vowels, but since the quality of tenseness is not a phonemic feature in this language, it cannot be applied to describe its vowels in any meaningful way.
Comparison between tense and lax vowels: In general,
tense vowels are more close (and correspondingly
have lower first formants) than their lax counterparts.
Tense vowels are sometimes claimed to be articulated with a more
advanced tongue root than lax vowels, but this varies,
and in some languages it is the lax vowels that are
more advanced, or a single language may be inconsistent
between front and back or high and mid vowels (Ladefoged and
Maddieson 1996, 302–4). The traditional definition,
that tense vowels are produced with more "muscular tension"
than lax vowels, has not been confirmed by phonetic experiments.
Another hypothesis is that lax vowels are more centralized
than tense vowels. There are also linguists who believe that
there is no phonetic correlation to the tense-lax opposition.
In many Germanic languages, such as RP English,
standard German, and Dutch, tense vowels are longer
in duration than lax vowels; but in other languages, such as Scots,
Scottish English, and Icelandic, there is no such correlation.
Since in Germanic languages, lax vowels
generally only occur in closed
syllables, they are also called
checked vowels, whereas the tense vowels are called
free vowels as they can occur at the end of a syllable.
Tenseness in consonants: Occasionally, tenseness
has been used to distinguish pairs of contrasting consonants
in languages. Korean, for example, has a three-way contrast
among stops; the three series are often transcribed as
[p t k] - [pʰ tʰ kʰ] - [pʼ tʼ kʼ].
The contrast between the [p] series and the [pʼ] series
is sometimes said to be a function of tenseness:
the former are lax and the latter tense. In this case
the definition of "tense" would have to include
greater glottal tension.
In some dialects of Irish and Scottish Gaelic,
contrasts are found between [l, lʲ, n, nʲ]
on the one hand and [ɫˑ, ʎˑ, nˠˑ, ɲˑ] on the other hand.
Here again the former set have sometimes been
described as lax and the latter set as tense. It is not clear what phonetic
characteristics other than greater duration would be associated with tenseness
in this case.
Some researchers have argued that the contrast in German
traditionally described as voicing
([p t k] vs. [b d g]) is in fact better analyzed as tenseness,
since the latter set is voiceless in Southern German.
German linguists call the distinction
fortis and lenis rather than tense and lax.
Tenseness is especially used to explain
stop consonants of the
Alemannic German dialects because they have two series of them that are
identically voiceless and unaspirated. However, it is debated whether the
distinction is really a result of different muscular tension, and not of
gemination.
Go back lax-tense-vow-b
Excerpt from: Physics Classroom Tutorial, www.glenbrook.k12.il.us/gbssci/phys/Class/sound/u11l2a.html 080326
The sensation of a frequency is commonly referred to as the pitch of a sound. A high pitch sound corresponds to a high frequency sound wave and a low pitch sound corresponds to a low frequency sound wave. Amazingly, many people, especially those who have been musically trained, are capable of detecting a difference in frequency between two separate sounds which is as little as 2 Hz. When two sounds with a frequency difference of greater than 7 Hz are played simultaneously, most people are capable of detecting the presence of a complex wave pattern resulting from the interference and superposition of the two sound waves. Certain sound waves when played (and heard) simultaneously will produce a particularly pleasant sensation, and are said to be consonant [{meaning different from that used in Linguistics}]. Such sound waves form the basis of intervals in music. For example, any two sounds whose frequencies make a 2:1 ratio are said to be separated by an octave and result in a particularly pleasing sensation when heard. That is, two sound waves sound good when played together if one sound has twice the frequency of the other. Similarly, two sounds with a frequency ratio of 5:4 are said to be separated by an interval of a third; such sound waves also sound good when played together.
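The ratios in the excerpt above can be worked through with a few lines of Python. This is a minimal sketch: the frequencies are hypothetical values picked to give simple ratios, and the beat-frequency function uses the standard result that two superposed tones beat at the difference of their frequencies, which the excerpt itself does not state.

```python
from fractions import Fraction

def frequency_ratio(higher_hz, lower_hz):
    """Exact ratio of two whole-number frequencies."""
    return Fraction(higher_hz, lower_hz)

def beat_frequency(f_a_hz, f_b_hz):
    """Rate (Hz) at which two superposed tones drift in and out of phase."""
    return abs(f_a_hz - f_b_hz)

# Hypothetical frequencies, chosen to give simple ratios.
print(frequency_ratio(512, 256))   # 2    -> an octave
print(frequency_ratio(320, 256))   # 5/4  -> a third
print(beat_frequency(263, 256))    # 7
```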
Go back pitch-frequency-note-b
From:
• Wikipedia http://en.wikipedia.org/wiki/Source-filter_model_of_speech_production 071224
• Robert M. Krauss, http://www.columbia.edu/itc/psychology/rmk/T2/sf_theory.html 071224
From: Wikipedia
The source-filter model of speech production models speech as a combination of a sound source, such as the vocal cords, and a filter, the vocal tract (and radiation characteristic).
While only an approximation, the model is widely used in a number of applications because of its relative simplicity. To varying degrees, different phonemes can be distinguished by the properties of their source(s) and their spectral shape. Voiced sounds (e.g., vowels) have (at least) a source due to (mostly) periodic glottal excitation, which can be approximated by an impulse train in the time domain and by harmonics in the frequency domain, and a filter that depends on, e.g., tongue position and lip protrusion. On the other hand, fricatives have (at least) a source due to turbulent noise produced at a constriction in the oral cavity (e.g., the sounds represented orthographically by "s" and "f"). So-called voiced fricatives (such as "z" and "v") have two sources: one at the glottis and one at the supra-glottal constriction.
The source-filter model is used in both speech synthesis and speech analysis, and is related to linear prediction. The development of the model is due, in large part, to the early work of Gunnar Fant, although others, notably Ken Stevens, have also contributed substantially to the models underlying acoustic analysis of speech and speech synthesis.
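To make the idea of "impulse train plus filter" concrete, here is a minimal Python sketch under stated assumptions. The sample rate, the 100 Hz pitch, and the single resonance at 500 Hz with a 100 Hz bandwidth are arbitrary illustrative values, and a single two-pole resonator stands in for the whole vocal tract; a real vowel would need several resonances. None of these choices comes from the sources quoted here.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def impulse_train(f0_hz, n_samples):
    """Idealised glottal source: one unit impulse per pitch period."""
    period = round(SAMPLE_RATE / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz):
    """Two-pole resonator standing in for a single vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)   # pole radius
    theta = 2.0 * math.pi * centre_hz / SAMPLE_RATE       # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2   # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y1, y2 = y, y1
    return out

def magnitude_at(signal, freq_hz):
    """Magnitude of the discrete Fourier sum of the signal at one frequency."""
    w = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return abs(sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(signal)))

source = impulse_train(100, 800)        # 100 Hz "voice", 0.1 s long
speech = resonator(source, 500, 100)    # filtered by one resonance near 500 Hz

for f in (500, 2000):
    print(f, round(magnitude_at(source, f), 2), round(magnitude_at(speech, f), 2))
```

In the source, the harmonics at 500 Hz and 2000 Hz are equally strong; after filtering, the harmonic near the resonance is much stronger than the one far from it. That selective boosting is what the model means by the filter shaping the spectrum of the source.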
From: R.M. Krauss
As the figure below illustrates, the vibrations of the vocal folds are the
source of speech. The buzzing produced by these vibrations is passed through
the vocal tract, which serves as a resonant filter, damping certain frequencies
and intensifying others. The result is the characteristic sound we identify as
speech. To hear what the buzzing of the vocal folds sounds like before it enters
the vocal tract, click the icon labeled excitation below or
<)).
To hear the filtering action of the vocal tract, click on vocal tract filter
<)).
To hear the resultant speech, click on speech
<)).
You can also click on the following links to online source:
• <)) excitation
• <)) vocal tract filter
• <)) speech
First proposed by Johannes Müller in the 19th century,
source-filter theory accounts
for the acoustic properties of what are called "voiced"
speech sounds (sounds during whose articulation the vocal cords vibrate).
For "unvoiced" sounds (e.g., Shh),
the source is air forced through a constriction in the vocal tract.
The sentence you hear when you click on the speech icon
[{<))
Why were you away a year ago, Roy?
(this is what I heard -- UKT}] is composed almost entirely of
voiced speech sounds. This is shown in the speech spectrogram
at the bottom of the page [{now on left}],
which plots the distribution of acoustic energy by frequency
over time -- the darker the region, the greater the intensity
of the acoustic energy in that region. Notice that the bands of
acoustic energy are nearly continuous, especially in the lower frequencies.
This is unusual in speech and results from the fact that the utterance
contains voiced sounds almost exclusively. The one discontinuity,
about two-thirds of the way through the utterance, reflects articulation of
the g in <ago>, where the passage of air
is momentarily interrupted and then released in a burst.
The dark bands in the spectrogram are called formants, and reflect the acoustic energies that remain after the filtering action of the vocal tract. The three figures below (taken from Miller) illustrate how different configurations of the vocal tract selectively pass certain frequencies and not others. The first shows the configuration of the vocal tract while articulating the phoneme [i] as in the word "beet," the second the phoneme [a], as in <father> [{the <a> in <father> is pronounced in British English as /ɑ/ -- DJPD16-199}], and the third [u] as in <boot>. Note how each configuration uniquely affects the acoustic spectrum -- i.e., the frequencies that are passed.
UKT: The minimum three, [a, i, u], to characterize a language (e.g., American English) are given above. For the British, [æ, i, u] have to be given.
Go back source-filter-th-note-b
End of TIL file.