Update: 2007-12-11 01:00 PM -0500

TIL

The Phonetic Description of Voice quality

vq-laver-voice.htm

John Laver, Reader in the Department of Linguistics, University of Edinburgh. Cambridge Univ. Press 1980. First published 1980.
http://www.ling.mq.edu.au/ling/units/sph302/papers/laver_1980_nasal.pdf download 071029
http://www.ling.mq.edu.au/ling/units/sph302/papers/laver_1980_phonation.pdf   download 071024

Downloaded, edited, set in HTML by U Kyaw Tun, M.S. (I.P.S.T., U.S.A.). Not for sale. Prepared for students of TIL Computing and Language Center, Yangon, MYANMAR.


Contents of this page

UKT:
• This chapter should be read with EGG and Voice quality: http://www.ims.uni-stuttgart.de/phonetik/EGG/page6.htm#1 download 071008, which is more up to date. Unfortunately, I have no time to download it and incorporate into my notes. I do hope it will still be online when I could find time. -- UKT071027
• Be prepared to read lengthy complex sentences in this file which I always expect from a British writer. Also, expect to find mistakes made by the unknown typist who must have copied the text out of the original printed book. I have put in paragraph breaks (¶UKT) to make reading easier. The reader is recommended to consult the original book -- unfortunately, I could not get the book myself.
• To be able to cite the contents of this paper in research papers, I have included page numbers of the original book within [{...}]. Most of the pix have been redrawn to meet the TIL requirements.

03. Phonatory settings (pdf p093-240)
03.01. Modes of laryngeal vibration in phonatory settings
03.01.01. Modal voice
03.01.02. Falsetto -- paralinguistic function: seriousness
03.01.03. Whisper -- paralinguistic function: confidentiality
03.01.04. Creak -- vocal fry or glottal fry,
   syn: Laryngealization for identification of "creaky voice" in Burmese
   similar: Glottalization for {a.that} of {wag}-aksharas
03.01.05. Harshness
   Ventricular voice syn: extremely harsh voice
03.01.06. Breathiness
03.02. Compound phonation types

UKT note
stridor

p093

03. Phonatory settings

The domain of phonatory settings is limited by the same criterion that was applied to supralaryngeal settings: only those settings which can potentially be controlled by any speaker with a normal vocal apparatus will be admitted into the descriptive phonetic scheme.

Only a small number of basic types of phonation will be distinguished; although we shall see that these basic types can combine with each other in various ways to make up a larger number of composite phonatory qualities.

The term 'register' has been used very frequently, particularly in the literature on singing, to refer to particular modes of vibration of the vocal folds. The term will not be adopted here, however, nor is it proposed in this outline of major types of phonatory settings to go into the details of the different minor 'registers' such as 'head register' and 'chest register' that are claimed to exist. A thoroughly comprehensive account would have to include them, but the difficulty of giving such an account is that almost as many 'registers' are suggested as there are writers on the subject. 'Register' is also used, very often, to refer to 'voice pitch level', instead of 'mode of vocal fold vibration', and it is not always clear which meaning is involved. Mörner, Fransson and Fant (1963: 18), in discussing the general area of 'registers', say that it 'suffers from an abundance of terms and an ambiguity of their use', and they list 107 different labels.

UKT: Remember that Pitch = frequency of sound, e.g. middle C = 261.6 Hz
-- HyperPhysics http://hyperphysics.phy-astr.gsu.edu/hbase/hframe.html download 071024

A different ambiguity underlies many examples of the use of the term 'register' by linguists. Introduced as a technical term in phonetics by Henderson (1951), its reference was initially clearly to laryngeal activity. Henderson writes, for example, that 'An important feature of Cambodian phonology is the presence of two contrasting voice registers, each with its appropriate vowel alternance' (Henderson 1950). She is referring to 'chest register' and 'head register', and distinguishes such laryngeal considerations of 'register' from associated supralaryngeal factors such as [{p093end}] vowel quality. 'Register' has since tended to become a phonological concept rather than primarily a phonetic one and as such to act as a cover term not only for laryngeal activity but also for the associated supralaryngeal factors. For example, Shorto writes about Mon, a language of South East Asia [UKT: {mwan}], that

paratonal register distinction is broadly similar to that described for Cambodian by Henderson. Its exponents are distributed throughout the articulatory complex but exclude pitch features. Chest register ... is characterized by breathy voice quality in association with a general laxness of the speech organs and a relatively centralized articulation of vowels. The more frequent head register ... is characterized by a clear voice quality, relative tenseness, and peripheral vowel articulation. (Shorto 1966: 399-400)

In other words, as a linguistic concept, 'register' often now refers not solely to laryngeal behavior, but rather to a constellation of activities at various levels of the vocal tract.

Phoneticians on the whole have been much more interested in supralaryngeal activities than in phonatory ones, and consequently more is known about the former than the latter. This is not without justification, when one thinks of the central importance phoneticians properly place on the capacity of speech to embody the distinctive patterns of language. The number of linguistic distinctions manifested by supralaryngeal activities is always much greater than those signalled by different types of phonation, in every known language of the world. Ladefoged (1971: 16-18) suggests nine different modes of vibrations of the vocal folds (which he says does not exhaust the total number found in different languages); and he points out that, taking the feature that he calls 'glottal stricture', no single language makes use of more than three out of the nine modes of vibrations for manifesting linguistic oppositions. In a later book, Ladefoged amends this last statement by listing five different values, 'all of which are used in Beja, a language spoken in the Sudan' (Ladefoged 1975: 261).

UKT:
• By "nine different modes of vibrations of the vocal folds" does Ladefoged mean the "Glottis positions" shown on the right, where only six are shown? I am waiting for input from my peers.

 

In attempting to characterize the different phonatory settings, the first requirement is to define the neutral mode of phonation against which other modes can be contrastively described. The neutral mode of phonation is one where the vibration of the true vocal folds is periodic, efficient, and without audible friction. This description is worded to facilitate the discussion of the other types of phonatory settings where one (or more) of the specified characteristics does not apply. There are thus settings where the phonation is not achieved by the true vocal folds alone (as in ventricular voice), or where the mode of vibration is aperiodic (as in harsh voice), inefficient and with slight audible friction (as in breathy voice), or with strongly audible friction (as in whispery voice and whisper). This still leaves some types of phonation without contrastive identification against the neutral mode of vibration -- for instance, falsetto, creak, and creaky voice. This is partially put right by noting that the term adopted here for the neutral mode of vibration, modal voice, is Hollien's (1971), and that he says he chose the term 'modal' because 'it includes the range of fundamental frequencies that are normally used in speaking and singing' (Hollien 1971: 320). This comment can be incorporated into the description of the neutral type of phonation, but it is important to bear in mind that matters of pitch as such are not strictly relevant to quality. Falsetto is then characterized as having a range of fundamental frequency potentially higher than that of modal voice, and creak as having a range potentially lower.
(The reason for the cautious use here of 'potentially' is an awareness that the pitch range of falsetto can substantially overlap the range of modal voice (Broad 1973: 153), in terms of what is physically possible, even though male speakers using falsetto tend to utilize only the middle to high part of the possible range, in Western culture at least; and the inverse applies to creak, which can be made on a high enough pitch to overlap the bottom end of the modal voice range.) The characterization of creaky voice is a problem which will be discussed later in this chapter.
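
UKT-style aside (editorial sketch, not part of Laver's text): the contrastive scheme in the paragraph above can be laid out as a small lookup table. The key names below are illustrative labels chosen for this sketch, not terms from the book, and the table records only the departures the passage itself names. It makes no claim about criteria the passage leaves unmentioned for a given setting.

```python
# Criteria of the neutral mode (modal voice) as worded in the passage.
NEUTRAL_MODE = {
    "true_vocal_folds_only": True,
    "periodic": True,
    "efficient": True,
    "audible_friction": "none",
}

# Only the departures that the passage itself names. Criteria not listed
# for a setting are simply not addressed there, not asserted to be neutral.
DEPARTURES = {
    "ventricular voice": {"true_vocal_folds_only": False},
    "harsh voice": {"periodic": False},
    "breathy voice": {"efficient": False, "audible_friction": "slight"},
    "whispery voice": {"audible_friction": "strong"},
    "whisper": {"audible_friction": "strong"},
}

# Settings contrasted by potential pitch range rather than by the criteria above.
PITCH_RANGE_VS_MODAL = {
    "falsetto": "potentially higher",
    "creak": "potentially lower",
}


def departures_from_neutral(setting):
    """Return the neutral-mode criteria that the passage says do not apply."""
    return dict(DEPARTURES.get(setting, {}))


for name in DEPARTURES:
    print(name, "->", departures_from_neutral(name))
```

Creaky voice is deliberately absent from both tables, since the passage defers its characterization to later in the chapter.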

None of these descriptions of phonation types is adequate, however. The differences between modal voice and each of the other settings mentioned are of course more numerous and more complex than this brief sketch suggests. Even in the case of modal voice, it is only a description, and remains far from a definition: the issue is merely hedged by saying that modal voice is the type of vocal fold vibration which phonetic theory assumes takes place in ordinary voicing, when no specific feature is explicitly changed or added. It is very hard to construct a satisfactory, short definition of modal voice; instead, its characteristics will be elaborated in a summary outline of the aerodynamic and physiological factors thought to be involved in producing this neutral mode of phonation. This base will then be used for the further discussion of the different phonation types.

The very widely accepted theory of vocal fold vibration is the aerodynamic-myoelastic theory. The myoelastic component was first stated by Müller (1837). He experimented with an excised human larynx set in a frame, exerting different tensions on the various muscles while air was blown through the glottis. He established that the vocal folds vibrate when they are adducted (i.e. brought together) to interrupt such an airstream, and that increasing longitudinal tension in the vocal folds is correlated with rising pitch.

The aerodynamic component is a relatively recent addition. Van den Berg (1958b) has shown that tension in the vocal folds is not the only factor to be considered in glottal vibration, in that the characteristic behaviour of airflow through narrow constrictions plays a very important part, particularly in closing the glottis. If we consider the train of events in one cycle of laryngeal vibration, the situation is basically as follows: with the glottis closed by muscular tensions acting on it in a number of dimensions (discussed later), sub-glottal air pressure builds up with pulmonic effort to expel the air from the lungs. It very rapidly reaches the pressure necessary to blow the vocal folds apart. They separate with a vertical phase difference, such that the lower parts of the vocal folds separate before the upper parts (Broad 1977; Farnsworth 1940; Flanagan and Landgraf 1968; Hirano 1977; Hollien, Coleman and Moore 1968; Ishizaka and Flanagan 1972; Matsushita 1969, 1975; Titze 1973, 1974; Titze and Strong 1975; Timcke, von Leden and Moore 1958). A jet of air shoots into the pharynx, momentarily relieving some of the sub-glottal over-pressure. The glottal margins of the vocal folds, which a moment ago were in contact, now form a narrow constriction whose aerodynamic effect on the upward flow of air is to act as a venturi tube. This makes the jet of air from the lungs accelerate through the narrow gap, with the jet reaching a speed of between 2000 and 5000 cm/s (Catford 1977: 98). The Bernoulli effect on the air molecules in this accelerating jet causes a very local drop in air pressure in the constricted passage between the vocal folds and the vocal folds are 'sucked' inwards towards each other by the drop in pressure.
The sub-glottal air pressure, momentarily released into the pharynx, has now dropped sufficiently to be overcome by the two forces combining to close the glottis -- the myoelastic tensions acting on and in the vocal folds, and the aerodynamic Bernoulli effect. The moment of glottal closure is usually the point in the phonational cycle at which the acoustic excitation of the supralaryngeal tract is most powerful. With the glottis closed, sub-glottal pressure builds up again, and the cycle of events repeats itself, between approximately 50 and 250 times a second in voiced segments uttered by average male speakers in ordinary conversation.
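
UKT-style aside (editorial sketch, not part of Laver's text): to give a feel for the magnitudes quoted above, the snippet below converts the jet speeds (2000 to 5000 cm/s) into the dynamic-pressure term 0.5 × density × speed² of Bernoulli's equation, and the repetition rates (50 to 250 per second) into cycle durations. The air density of 1.2 kg/m³ is an assumption that does not come from the book, and the function names are illustrative only. Treating the whole dynamic-pressure term as the local pressure drop idealizes the flow as steady and frictionless, so the figures are at best orders of magnitude.

```python
AIR_DENSITY_KG_M3 = 1.2  # assumed value; not stated in the text


def dynamic_pressure_pa(speed_cm_s, density=AIR_DENSITY_KG_M3):
    """Dynamic pressure 0.5 * rho * v**2 in pascals, for a speed given in cm/s."""
    speed_m_s = speed_cm_s / 100.0  # convert cm/s to m/s
    return 0.5 * density * speed_m_s ** 2


def cycle_period_ms(frequency_hz):
    """Duration of one vibratory cycle in milliseconds."""
    return 1000.0 / frequency_hz


# Jet speeds quoted from Catford (1977: 98).
for speed in (2000, 5000):
    print(f"{speed} cm/s -> {dynamic_pressure_pa(speed):.0f} Pa")

# Repetition rates quoted for average male speakers in conversation.
for rate in (50, 250):
    print(f"{rate} Hz -> {cycle_period_ms(rate):.0f} ms per cycle")
```

Under these assumptions the pressure term comes out at roughly 240 Pa and 1500 Pa for the two speeds, and one cycle lasts between 4 ms and 20 ms.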

This picture of the combined aerodynamic and muscular forces at work in laryngeal vibration was built up by the work of a number of researchers in the 1950s, including Smith (1954a, 1954b, 1956a, 1956b, 1957), and Faaborg-Andersen (1957). The major contribution came from van den Berg himself (1954a, 1956, 1957a, 1957b, 1958a, 1958b) and with colleagues (van den Berg, Zantema and Doornenbal 1959). Van den Berg gives his own summary of the forces which interact to produce laryngeal vibration:

[The aerodynamic-myoelastic] theory postulates that the function of the larynx is based on the interplay of three factors:
(1) the aerodynamic properties of the air which actuates the larynx,
(2) the adjustment of the larynx, brought about by the proper nervous activation of the various muscles, and the myoelastic properties of the laryngeal components,
(3) the aerodynamic coupling between (a) the subglottal system and the larynx, (b) the left and right vocal fold, and (c) the larynx and the supraglottal system. (van den Berg 1968: 291-2)

This comment on the different aerodynamic coupling factors is important in considering the fine details of the auditory quality of the voice. It implies that the detailed mode of vibration of the vocal folds will depend partly on the degree of effort exerted sub-glottally by the pulmonic system, so that fine details of phonatory quality will necessarily co-vary with laryngeal intensity. Also, coupling factors between the left and right vocal folds become auditorily important when organic asymmetries of the larynx are considered, in such cases as nodules and polyps on one or other vocal fold. In addition, the articulatory state of the supralaryngeal vocal tract will influence the fine detail of the vibratory pattern of the vocal folds to some small degree. It was noted earlier, for example, in the section on velopharyngeal settings, that nasal voice has been observed to be associated with a particular mode of vibration of the vocal folds.

Although the aerodynamic-myoelastic theory of laryngeal vibration is very well established now, it is not the only candidate in the area. There are two others, the neurochronaxic theory of Raoul Husson and his fellow-workers, and the muco-undulatory theory.

Husson's theory is diametrically opposed to the assumptions of the aerodynamic-myoelastic theory. Essentially, it dismisses muscle tension and the Bernoulli effect as the prime mechanism of phonation, and maintains that each cycle of vibration is the direct muscular response of the vocalis muscles (forming the vocal folds) to an individual neural command (Husson 1950a, 1950b, 1951, 1952, 1957, 1962, 1964; Laget 1953; Moulonguet 1954; Piquet and Decroix 1956; Piquet, Decroix, Libersa and Dujardin 1957; Portmann and Robin 1956). This theory has been widely rejected (Robin 1960a, 1960b; Lafon and Cornut 1960; von Leden 1961; Weiss 1959). One aspect of the theory hinges on whether or not fibres of the vocalis muscle, which makes up the glottal part of each vocal fold, run at an angle to the vocal ligament running along the edge of each fold. Goerttler (1950) suggested that vocalis fibres were arranged in two groups, both inserting on the vocal ligament at a sufficient angle to allow their contraction to open the glottis. Wustrow (1952) disputed Goerttler's views on the course of the vocalis fibres, and maintained that they run mainly parallel to the edge of the glottis, and hence cannot act independently to open the glottis. Most anatomists agree broadly with Wustrow, and conclude that the possibility of vocal fold vibration being attributable directly to periodic contraction of the vocalis muscle is remote.

The muco-undulatory theory put forward by many researchers (Baer 1973; Broad 1977; Farnsworth 1940; Hiroto 1966; Matsushita 1969, 1975; Perello 1962a, 1962b; Smith 1956, 1961) is of smaller scale, and is complementary to the aerodynamic-myoelastic theory, not antagonistic. It concerns the effect on phonation of wave motion in the wet mucosal layer that covers the surfaces of the vocal folds and which is attached to the underlying muscle fibres by connective tissue. Titze and Strong (1975: 740) suggest that this undulation

involves the relative motions between the mucosa and the ligament vocalis, and occurs whenever the vocal cord is unstretched. A surface wave is seen to propagate laterally from the glottis toward the vocal cord boundary ... Due to the high surface tension of the mucosa (which is, of course, the tension of the mucous membrane), the surface wave is readily dispersed, but occasionally gets reflected from the boundary and travels back toward the glottis.

Wave motion of this sort has been observed in many high-speed and stroboscopic cinefilms of the vocal folds in phonation (e.g. Farnsworth 1940; van den Berg, Vennard, Berger and Shervanian 1960). Broad (1977) writes that

It is this wave motion of the mucosa which prompted Hirano (1975) to suggest that the vocal fold should be considered in a mechanical sense to be subdivided into a loose layer (the epithelium and superficial layer of the lamina propria) and a stiff layer (the deeper layer of the lamina propria and the vocalis muscle). Earlier, Hiroto (1966) had incorporated the wave motions of the loose, superficial layer of the vocal fold as an essential component of his muco-visco-elastic-aerodynamic explication of the theory of vocal fold vibration. (Broad 1977: 255-6)

A picture is thus presented of the vocal fold as a layered structure, with each deeper layer being stiffer than its superficial covering (Hirano 1977: 21). During phonation, the cross-sectional shape of each vocal fold is subject to continuously changing deformation. Part of the dynamic deformation is attributable to the mucosal wave motion travelling up the external surfaces of the vocal folds and into the ventricles, and part to the more gross displacements involved in the vertical phase difference mentioned earlier. It is apparent that the detail of vocal fold vibration is highly complex.

Smith (1961) considers that the contribution of these mucosal phenomena is to the fine detail of the quality of phonation, and not to the fundamental frequency. Perello (1962a, 1962b) believes the reverse. He argues that changes of fundamental frequency in the absence of muscular adjustments can occur when the consistency of the mucosal layer changes, as in laryngitis, or in premenstrual mucosal changes in women. Changes in the mucosal lining in this way constitute organic conditions of the vocal apparatus, and lie outside the scope of phonetic control.

It will be assumed from this point onwards that the vibrations of the vocal folds in modal voice are brought about in the way described by the aerodynamic-myoelastic and muco-undulatory theories.


03.01. Modes of laryngeal vibration in phonatory settings

We come now to a summary outline of the physiology of laryngeal vibrations in modal voice, with which that of the other phonation types may then be compared.

Three laryngeal cartilages form the basic frame within which the muscular control of phonation is exercised. These are the thyroid, the cricoid and the paired arytenoid cartilages. Figure 13 is a schematic diagram of the relative position of these cartilages.

The thyroid is the big, shielding cartilage protecting the front and sides of the larynx from injury. It forms the 'Adam's apple' in male speakers, with its characteristic protruding, slightly pointed shape, and a V-shaped notch in the top edge at the centre. It is made up of two quadrilateral cartilaginous wings, or plates, vertically fused at the front under the central notch. The muscles which make up the true vocal folds and the ventricular folds are attached to the front, internal surface of the thyroid, at the point where the lateral plates fuse together.

The cricoid lies immediately below the thyroid, and is the uppermost of the tracheal cartilages. It is different from the other tracheal rings, in that they are incomplete rings with a gap at the back, where the trachea shares a common wall with the esophagus, whereas the cricoid is a complete ring. It is often compared to a signet ring, as the etymology of the name suggests, because the part at the back of the cricoid ring is considerably larger than that at the front. The vertical dimension of the 'signet' at the back is usually about 25 mm in the adult male, and about 8 mm at the front. Together with the thyroid, it forms an effective external protective structure for the rest of the larynx; it has also been said to be 'the foundation of the structures which, as a functional group, are called the larynx' (Heffner 1950: 15).

UKT:
cricoid n. 1. A ring-shaped cartilage of the lower larynx that articulates with the thyroid cartilage and arytenoid cartilages. [New Latin cricoīdēs from Greek krikoeidēs ring-shaped krikos ring; See sker-2 in Indo-European Roots. -oeidēs -oid] -- AHTD

The arytenoid cartilages, much smaller than the thyroid and the cricoid, sit on the upper rim of the 'signet' of the cricoid at the back. They can rotate vertically and horizontally to a certain extent, as well as slide from side to side on the cricoid (Saunders 1964: 72). They are shaped somewhat like small pyramids on a triangular base (Cates and Basmajian 1955). The posterior ends of the true vocal folds are attached to the lower, forward angles of the arytenoids, called the vocal processes. The posterior ends of the ventricular folds are attached to the apex of each arytenoid.

The thyroid is connected to the hyoid bone above it by the thyrohyoid muscle and ligament. When the larynx rises, pulled up by the action of the thyrohyoid muscle, or of the stylopharyngeus muscle (discussed earlier in this chapter, and which runs from the skull nearly vertically downwards at each side of the pharynx to insert in the back edge of the thyroid), the thyroid 'slips up under cover of the hyoid' (Kaplan 1960: 115).

The thyroid is connected not only to the arytenoid cartilages, as stated immediately above, but also to the cricoid, by the paired cricothyroid muscle. This runs upwards, backwards and laterally from the outer surfaces of the forward part of the cricoid to insert in the lower edge of the thyroid. The effect of its contraction is to pull the front of the cricoid ring upwards towards the thyroid, which has the mechanical consequence of rotating the back of the cricoid, with its attached arytenoids, downwards and backwards from its neutral position. This lengthens and tenses the vocal folds, thus contributing to pitch control in phonation (Sawashima 1974), and to the small changes in phonatory quality arising from changes in the fine detail of the cross-section of the folds. This retraction of the cricoid also tends to bring the vocal folds slightly closer to each other (Saunders 1964: 73).

The laryngeal muscles of interest here fall into two groups: firstly, those which, like the cricothyroid, can change the positions of the cricoid relative to the thyroid; and secondly, those which affect chiefly the positions of the arytenoids relative to the cricoid.

Figure 14 is a schematic diagram of the location and actions of the muscles of the first category. A less schematic depiction of these muscles can be seen in Figure 14a.

The muscles which can change the position of the cricoid relative to the thyroid are the cricothyroid muscles, and the paired thyroarytenoid muscles, which make up the true vocal folds and the ventricular folds. The thyroarytenoids, running from a fixed attachment at the fused angle of the thyroid to the mobile arytenoids, as noted above, are described by Heffner as follows:

Each [side] is divided into two parts -- an upper and a lower -- by a recess, or ventricle, which undercuts the upper portion throughout most of its length. The lower portion of each thyroarytenoid muscle is attached to the vocal process of the arytenoid cartilage. The upper portion is attached to the body and the upper tip of the arytenoid. Indeed, some of the upper fibers of the upper portion of the muscle run on upward into the folds which join the arytenoids with the edges of the epiglottis [i.e. the aryepiglottic folds]. When contracted, the thyroarytenoid muscles tend to draw the arytenoids forward, at the same time tilting them towards the thyroid cartilage. ... The upper portions of this pair of muscles, with their covering mucous tissue, are known as the ventricular folds. ... The lower portions ... have a name of their own, the vocalis muscles [and they] constitute the vocal bands. ... In cross-section the vocal bands are triangular, being shaped much like the cushions of a billiard table, and only their median edges are free. (Heffner 1950: 17-18)

The ventricle mentioned is the ventricle of Morgagni, and it should be pointed out that the ventricular folds have a rather different composition of tissue than the true folds: Kaplan describes the ventricular folds as 'thick rounded folds of mucous membrane developed around the ventricular ligaments. They are soft and somewhat flaccid. Each contains ... a few muscle fibers and numerous mucous glands' (Kaplan 1960: 124-5). Saunders (1964: 73) confirms that they contain only a few muscle fibres. Their vibration will therefore tend to be inefficient, with hypertension needed to adduct them sufficiently to phonate.

Longitudinal tension of the true and ventricular vocal folds can thus be achieved by two different actions. The first action is that of retraction and slight vertical rotation of the cricoid by means of the cricothyroid muscle, which puts longitudinal tension on the vocal folds by stretching them. The second action is the contraction of the muscles which make up the vocal folds themselves, the thyroarytenoids.

The second category of muscles, those which control the positions of the arytenoid cartilage relative to the cricoid, have the function of opening and closing the glottis. Biologically vital in helping to control the airway to and from the lungs, they are small but powerful muscles, and are capable in combination of setting the glottis in a wide variety of adjustments. Figure 15 is a schematic diagram of these muscles and their action.

Only one muscle is normally used to open the glottis, the paired posterior cricoarytenoids [{ #5 fig.15}] (Kaplan 1960: 150). These arise from the back, outer surface of the cricoid and run upwards to the side to join the arytenoids on their rearmost angles, called the muscular processes. Their contraction pulls the muscular processes in an arc towards the back, and the effect is to rotate the arytenoids, pivoting the other ends of the arytenoids, the vocal processes to which the vocal folds are attached, outwards. Heffner (1950: 20) says that this happens 'at every normal inhalation'. ¶UKT

In speech, their action is the major contributor to opening the glottis for voicelessness (Hardcastle 1976: 76; Hirose and Gay 1972:158). Heffner continues: 'The action of these muscles is opposed to and can thus be controlled by the direct pull of the lateral cricoarytenoids and also by the direct pull of the thyroarytenoid muscles. The unopposed pull of the posterior cricoarytenoids widens the opening between the vocal bands to its maximum' (Heffner ibid.).

The muscular action which closes the glottis is more complex. The lateral cricoarytenoids [{#7 fig.15}] run backwards from the outer and upper surface of the cricoid on both sides, and like the posterior cricoarytenoids, are attached to the muscular processes of the arytenoids. In contraction, as indicated above, the lateral cricoarytenoids directly oppose the action of their posterior counterparts, and swivel the arytenoid cartilages forward and inward (Hardcastle 1976: 78; Kaplan 1960: 152), 'toe-ing' the vocal processes inwards. This action brings the vocal folds together, and closes the glottis along its length from the vocal processes of the arytenoids to the thyroid cartilage.

There are two other muscular actions which help to close the glottis along its full length. The first is that of the arytenoid muscle complex. The arytenoid muscle (sometimes referred to as the interarytenoid muscle) is made up of two sets of fibres: one of these sets is the transverse arytenoid muscle [{#4a fig15}], which is an (unpaired) thick, rectangular mass covering the entire deep posterior surface of both arytenoids. It may be considered to originate along the muscular process and lateral border of one arytenoid and to cross over to reach the lateral edge of the other arytenoid. It draws the arytenoids medially by a gliding action which adducts the vocal folds. (Kaplan 1960: 151)

It opposes the action of the lateral cricoarytenoids. The other part of the arytenoid muscle complex is the oblique arytenoid muscle [UKT: #4b fig15]. It is a paired muscle in the form of the letter X, and it lies behind the transverse muscle, on its outer surface. Each branch of the oblique muscle starts low down on the backmost surface of the arytenoid and rises crossing to the highest angle of the other arytenoid. Its contraction tilts the tops of the arytenoids towards each other, and in conjunction with the transverse part of the arytenoid muscle, helps to adduct the vocal folds (Hardcastle 1976: 79; Heffner 1950: 20; Kaplan 1960: 151; van den Berg 1968: 294).

The second muscular action which can help to close the glottis is the contraction of the muscles forming the vocal folds themselves, the thyroarytenoids. The thyroarytenoids make a double contribution. The vocalis muscles (the part of the thyroarytenoids which makes up the medial body of the folds nearest to the edges of the glottis) contract to exert longitudinal tension in the vocal folds, which reduces the length of the glottis (van den Berg 1968: 294). Contraction of the outer, lateral parts of the thyroarytenoid muscles helps the lateral cricoarytenoids to bring the vocal processes of the arytenoids together.

To summarize the muscular actions which lead to glottal closure, then, van den Berg writes:

A contraction of the (powerful) interarytenoid muscles primarily adducts the apexes of the arytenoids and closes the back part of them so that no wild air can escape ... A contraction of the lateral cricoarytenoid muscles adducts the vocal processes of the arytenoids and therefore the body of the vocal folds. This adduction is augmented by a contraction of the lateral parts of the thyroarytenoid muscles (this contraction goes along with an adduction of the vocal folds). These adductional forces provide a medial compression of the vocal folds and reduce the length of the glottis which is effectively free to vibrate. (van den Berg 1968: 294)

It is useful to set up a simple map for discussing the different locations in the glottis which are relevant to the characteristics of the different types of phonation. A useful point of reference for this is the vocal ligament, which runs along the glottal edge of each vocal fold at the point where it normally makes contact with the other. We can then follow Catford (1964: 32) in using the term 'glottal', without any further qualification, to mean the whole length of the opening between the true vocal folds, from the front angle of the thyroid cartilage to the back of the arytenoids; and we can distinguish the ligamental glottis, which is the part of the full glottis formed by the vocalis muscles, with the length of the vocal ligaments along each edge, as opposed to the cartilaginous glottis, in the stretch where the arytenoid cartilages are located. The dimensions of these sections are of interest: Morris (1953) says that the 'intermembranous' (ligamental) part of the glottis is normally about 15.5 mm in the male, and 11.5 mm in the female. The length of the cartilaginous glottis is about 7.5 mm in the male and 5.5 mm in the female. This makes the full glottal length in males about 23 mm, and about 17 mm in females. Kaplan (1960: 128) adds that 'The widest part of the glottis is 6 to 8 mm in the male and this can increase to about 12 mm, according to condition.'
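
UKT-style aside (editorial sketch, not part of Laver's text): the full glottal lengths quoted above are simply the sums of the ligamental and cartilaginous parts. The snippet below checks that arithmetic and also works out what share of the full glottis is ligamental. The dictionary layout and function names are illustrative only.

```python
# Lengths in mm, as quoted from Morris (1953) in the paragraph above.
GLOTTIS_MM = {
    "male": {"ligamental": 15.5, "cartilaginous": 7.5},
    "female": {"ligamental": 11.5, "cartilaginous": 5.5},
}


def full_glottal_length(sex):
    """Full glottal length = ligamental part + cartilaginous part (mm)."""
    parts = GLOTTIS_MM[sex]
    return parts["ligamental"] + parts["cartilaginous"]


def ligamental_share(sex):
    """Fraction of the full glottal length taken up by the ligamental part."""
    return GLOTTIS_MM[sex]["ligamental"] / full_glottal_length(sex)


for sex in GLOTTIS_MM:
    print(sex, full_glottal_length(sex), "mm,",
          f"ligamental share {ligamental_share(sex):.2f}")
```

The sums reproduce the 23 mm and 17 mm given in the text, and in both cases the ligamental glottis makes up roughly two thirds of the full length.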

The nature of the glottis tempts one to regard it as a two-dimensional space; and for many phonetic purposes, that is sufficient. But in the case of contributions to the fine detail of phonatory quality, the need to take account of the changing three-dimensional configuration of the space between the vocal folds is very similar to the situation at the other end of the vocal tract, at the lips. The changing vertical thickness of the vocal folds from the outer wall inwards to the vocal ligaments at the edge of the glottal space reflects the interplay of the different tensions that are exerted in and on the folds by the laryngeal musculature, and this third, vertical dimension is one factor among others which differentiates the major settings of the phonatory mechanism.

We are now in a position to isolate some parameters of laryngeal control which are relevant to our discussion of different phonatory settings. From the account of laryngeal physiology offered above, three parameters of muscular tension emerge which have to interact with aerodynamic factors of pulmonic airflow and pressure. They are adductive tension, medial compression and longitudinal tension. The second of these, medial compression, is the factor most in need of definition. As described by van den Berg and Tan (1959), van den Berg (1968), Broad (1973) and Hardcastle (1976), medial compression is a composite adductive product of the action of a number of muscles, including the interarytenoids, the lateral cricoarytenoids and the lateral parts of the thyroarytenoids. This conception of medial compression overlaps that of adductive tension, and it will be convenient to try to distinguish these two effects more sharply. One useful purpose in creating a distinction in this area is to facilitate a differentiation of phonatory settings, and to offer a tentative physiological explanation for some observed incompatibilities between settings which inhibit their co-occurrence. For this specific purpose, adductive tension will be defined here as the tension of the interarytenoid muscles, whose consequence will be to bring the arytenoid cartilages together, closing the cartilaginous glottis and hence also the ligamental glottis. Medial compression will be defined as the compressional pressure on the vocal processes of the arytenoid cartilages achieved by contraction of the lateral cricoarytenoid muscles and reinforced by tension in the lateral parts of the thyroarytenoid muscles. ¶UKT

Medial compression will close the ligamental glottis, but whether the cartilaginous glottis also closes will depend on the analytically separate adductive tension achieved by the interarytenoid muscles. A possible acoustic correlate of medial compression is discussed in Chapter 4. Longitudinal tension, straightforwardly, is achieved by contraction of the vocalis and/or the cricothyroid muscles. The geometric relationship of these three parameters is illustrated schematically in Figure 16. Each different phonatory setting will be seen to have different specifications in terms of these three physiological parameters.


03.01.01. Modal voice

UKT: J. Laver does not give actual sound samples which you can listen to. To hear them, go online to EGG and Voice quality, or click on the following links:
  • Modal voice <)) (WAV file,  16 kB)
  • Creaky voice <)) (WAV file,  16 kB)
  • Breathy voice <)) (WAV file,  17 kB)
  • Harsh voice <)) (WAV file,  18 kB)
  • Falsetto <)) (WAV file,  8 kB)
  • Pharyngalized voice  <)) (WAV file,  21 kB)
  • Nasalized voice  <)) (WAV file,  18 kB)
The following is from: EGG and Voice quality:
"The neutral mode of phonation is modal voiced phonation. In the normal case the vibration of the vocal folds is periodic with full closing of glottis, so no audible friction noises are produced when air flows through the glottis. All muscular adjustments are on a moderate level and the frequency of vibration, as well as loudness are in the lower to mid part of the range normally used in conversation. The modal phonation of a male speaker occurs at an average of 120 Hz, while for a female speaker it is approx. 220 Hz. For voiced sounds the glottis is closed or nearly closed, whereas for voiceless sounds it is wide open, actually the distance between the folds amount to only a fraction of a milimeter. The degree of opening and its timing is relative to the articulatory gestures and depends on the phonetic environment of a generated sound. The average flow rate is between 100 and 350 cc/s. " -- http://www.ims.uni-stuttgart.de/phonetik/EGG/page10.htm download 071028.

Hollien (1974) makes the following comments about his choice of the term 'modal' for this phonation type:

The modal register is a term I have used for some years: originally, I favoured the term 'normal' to identify this register. However, as van den Berg pointed out (in a personal communication) the use of the label 'normal' would imply that the other registers were abnormal and, of course, his logic is correct. Accordingly, the modal register is so named because it includes the range of fundamental frequencies that are normally used in speaking and singing (i.e. the mode). It is a rather inclusive term and many individuals -- especially workers in vocal music -- would argue that this entity actually constitutes a set of registers or sub-registers including either two (chest and head) or three (low, mid and high) separate entities. I concede the tradition of such an approach, but ... I have yet to find reasonably convincing evidence that such sub-registers do indeed exist. (Hollien 1974: 126)

It may well be that the type of phonation discussed in this book under 'modal voice' should be differentiated into at least two sub-types corresponding to what are called 'chest voice' and 'head voice'. But it will be assumed in what follows that the type of phonation involved in 'modal voice' essentially corresponds to the 'chest voice' register, to the extent that different workers seem to agree on aspects of its production.

The laryngeal characteristics of modal voice, as the neutral mode of phonation, are reasonably well agreed. Catford says that the full glottis is involved, 'both ligamental and cartilaginous, functioning as a single unit' (Catford 1964: 32). He specifies the following details:

Periodic vibration of the vocal folds under pressure from below ... For normal voice the liminal pressure-drop across the glottis is the order of 3 cm of water. [UKT: presumably measured in terms of the height of water column -- the order of 2 mm of mercury] Rates of flow vary according to types of the voice ('registers'): for chest voice at about 100 cps [UKT: I have no idea what 'cps' means] the liminal rate of flow is about 5 cl/sec [UKT: no idea of cl/sec], maximal about 23 cl/sec. These are mean flow-rates: during the open phase of vocal fold vibration flow rates much in excess of these must occur, and since the glottal area is small the general aerodynamic picture is of a series of high-velocity jets shot into the pharynx. (Catford 1964: 31).
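The units in Catford's passage can be put on a more familiar footing. The sketch below assumes that 'cps' stands for cycles per second (numerically the same as Hz) and that 'cl/sec' stands for centilitres per second; both are the usual readings of these abbreviations, but neither is spelled out in the quotation. The conversion factors are the conventional ones and the function names are illustrative.

```python
# Conventional conversion factors.
PA_PER_CM_H2O = 98.0665   # pascals per centimetre of water
PA_PER_MM_HG = 133.322    # pascals per millimetre of mercury
CM3_PER_CL = 10.0         # cubic centimetres per centilitre

def cm_h2o_to_mm_hg(pressure_cm_h2o):
    """Convert a pressure from cm of water to mm of mercury."""
    return pressure_cm_h2o * PA_PER_CM_H2O / PA_PER_MM_HG

def cl_to_cm3(volume_cl):
    """Convert a volume (or a flow rate per second) from centilitres to cm3."""
    return volume_cl * CM3_PER_CL

print(round(cm_h2o_to_mm_hg(3.0), 2))    # liminal pressure drop, in mm Hg
print(cl_to_cm3(5.0), cl_to_cm3(23.0))   # liminal and maximal flow, in cm3/sec
```

On these assumptions, 3 cm of water is a little over 2 mm of mercury, and the flow rates of 5 and 23 cl/sec correspond to 50 and 230 cm3/sec.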

Citing Chiba and Kajiyama (1958), Fant suggests a 'normal air consumption of 140 cm3/sec at a subglottal pressure of ... 16 cm H2O ... during phonation at medium intensity and F0 = 144 c/s [UKT: F0 is presumably fundamental frequency] ... the corresponding particle velocity is 5200 cm/sec and the mean glottis opening 0.027 cm2' (Fant 1960: 269).
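The three figures in Fant's statement are mutually consistent if the mean particle velocity is taken as the volume flow divided by the mean glottal area. That relation is a simplifying assumption made here for illustration (uniform flow through the opening), not a formula given in the quotation; the function name is likewise illustrative.

```python
def mean_particle_velocity(flow_cm3_per_sec, area_cm2):
    """Mean velocity (cm/sec) of air through an opening, assuming uniform flow."""
    return flow_cm3_per_sec / area_cm2

# Values quoted from Fant (1960: 269).
velocity = mean_particle_velocity(140.0, 0.027)
print(round(velocity))
```

The result is within about half a percent of the 5200 cm/sec that Fant gives.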

Van den Berg specifies the physiological aspects of modal voice (i.e. chest voice register) as follows:

This register is characterized by large amplitudes of the vocal folds at low pitches. This requires small passive longitudinal tensions in the vocal ligaments. The minimal values of the interarytenoid contraction and medial compression are small. The vocal folds are short and thick. An increase of the active longitudinal tension in the vocalis muscle increases the pitch. Contraction of the cricothyroid muscles increases the pitch, but, when the passive longitudinal tension in the vocal ligaments increases beyond a rather small value, the vibrations either cease, at a small medial compression, or transit suddenly to the falsetto type, at a sufficiently large medial compression. (van den Berg 1968: 297)

By 'passive longitudinal tension' van den Berg means the tension exercised on the vocal ligament by the action of the strong cricothyroid muscle; such passive tension in the vocal ligament 'can be increased far beyond the maximal active tension in a contracting muscle, on account of the fact that the forces supplied by the comparatively thick cricothyroid muscles are exerted upon the comparatively thin vocal ligaments' (van den Berg 1968: 295-6).

The production of modal voice is thus carried out with only moderate adductive tension and moderate medial compression, with moderate longitudinal tension when the fundamental frequency is in the lower part of the range used in ordinary conversation. The vibration of the larynx in this condition is regularly periodic, efficient in producing vibrations, and without audible friction brought on by incomplete closure of the glottis. A laryngogram of modal voice is included in Figure 17, and a spectrogram in Figure 18.

During phonation, the opening of the adult male glottis has been described by Fant as having the following dimensions:

The glottis slit has an effective length of the order of 12 mm in the chest register and the maximal width is of the order of 2.5 mm at a moderate voice effort. The depth of the passage in the direction of the air stream that comes into contact during the closed phase is of the order of 2-5 mm. (Fant 1960: 266)

The acoustic characteristics of modal voice, as the neutral laryngeal setting, have already been given in Chapter 1.

Having outlined the characteristics of modal voice, we can now move on to the description of other phonatory settings, and their various possibilities of combination.

Each of the major phonatory settings will be discussed individually, but there are two criteria by which they can be grouped into categories, and the classification of the different settings into these categories reveals some useful generalities that might be lost sight of in a simple, sequential listing. The description of the different phonatory settings will be prefaced, therefore, by an outline of the classification.

The two criteria of classification can be expressed as questions. Firstly, 'Can the phonation type occur alone, as a simple type?', and secondly, 'Can the phonation type occur in combination with other phonation types, as a compound type, and if so, with which?'

On this basis, there are three different categories involved. ¶UKT

UKT: The following paragraphs are important for me in describing voicing, and also for my work on Burmese-Myanmar medials and {a.that}. Accordingly, I have reformatted the paragraphs to aid my understanding. I have also numbered the compound voices formed. The reader is advised to refer to the original pdf file.

The first category is made up of modal voice and falsetto.
The qualification for membership of this category is that they can each occur alone, as simple types, and can individually combine with members of other groups, as compound types, but not with each other.

The second category consists of whisper and creak.
These can occur alone, as simple types, and together as a compound type, to give:
1. whispery creak.
They can also occur as compound types with either member of the first group, giving:
1. whispery voice and 2. whispery falsetto; and 3. creaky voice and 4. creaky falsetto;
and they can occur as compound types with members of the first group and with each other, giving:
1. whispery creaky voice and 2. whispery creaky falsetto.

[UKT: The third category is made up of harshness and breathiness.]
The third category is formed by modificatory settings which can only occur in compound types of phonation, and never by themselves as simple types. These are harshness and breathiness.

UKT: note that Laver does not use the terms harsh voice and breathy voice here. But he does use the terms in the next paragraph. The term breathy voice is used by other authors, e.g., http://www.ims.uni-stuttgart.de/phonetik/EGG/page8.htm download 071026. The Stuttgart author cites Gujarati for "modal voice vs breathy voice" (contrast between vowels), and Indo-Aryan languages (contrast between voiced stops).

Harshness can combine with modal voice and with falsetto, to produce:
1. harsh voice and 2. harsh falsetto.
Breathiness can only combine with modal voice, to give breathy voice;
the reasons for the incompatibility of falsetto and breathiness will be explored later.
Harshness and breathiness cannot combine with the phonation types from the second category, whisper and creak and whispery creak, unless there is a member from the first category, either modal voice or falsetto, also present. The mutual compound products of this category of settings are thus:
1. harsh whispery voice, 2. harsh whispery falsetto, 3. harsh creaky voice, 4. harsh creaky falsetto, 5. harsh whispery creaky voice, and 6. harsh whispery creaky falsetto.

The omissions of nominal possibilities from the above list, such as breathy falsetto, are to be explained by either redundant or conflicting acoustic requirements, or by conflicting physiological requirements. A tentative physiological explanation will be advanced in terms of mutually exclusive specification of the phonatory settings involved on one or more of the three muscular parameters of longitudinal tension, adductive tension and medial compression, as indicated earlier. It is important to emphasize the tentative nature of this proposed explanation: it is offered here as a physiological hypothesis that emerges from the wide range of laryngeal research cited. The attractiveness of the hypothesis is clear: the auditorily-observed compatibility and incompatibility of the various different phonatory settings in compound phonation is given a physiological basis. But the hypothesis must be subjected to empirical test by physiological research before it can be elevated to the status of a reliable explanation.
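The combinatorial rules of the three categories can be enumerated mechanically, which makes it easy to see which nominal combinations are excluded. The sketch below is one reading of the classification, not Laver's own formalism: it assumes that harshness may join any compound built on modal voice or falsetto, and that breathiness combines with modal voice only, with no other modifier present. The function and variable names are illustrative.

```python
# Category 1 bases; modal voice appears as plain "voice" in compound names.
BASES = ["voice", "falsetto"]

def compound_types():
    """Return the names of all compound phonation types allowed by the rules."""
    names = ["whispery creak"]  # the two category-2 settings together, no base
    for base in BASES:  # modal voice and falsetto never combine with each other
        for harsh in (False, True):
            for whispery in (False, True):
                for creaky in (False, True):
                    flags = (("harsh", harsh), ("whispery", whispery), ("creaky", creaky))
                    modifiers = [label for label, on in flags if on]
                    if modifiers:  # a base with no modifier is a simple type
                        names.append(" ".join(modifiers + [base]))
    names.append("breathy voice")  # breathiness: modal voice only (assumed reading)
    return names

types = compound_types()
print(len(types))
for name in types:
    print(name)
```

Under this reading there are sixteen compound types in all, and combinations such as breathy falsetto or harsh whisper never appear.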

UKT: Remember that in medial formation, the medial formers {ya. ra. la. wa. ha.} all belong to the {a.wag}-consonants. In the case of {ya. wa.}, they are considered to be "semi-vowels" and the medial formation can be explained easily by considering them as vowels.
•  {ka.} + {a.that} + {ya.} --> {kya.}
#  {kya.} + {ka.} + {a.that} --> {kyak} with the vowel sound /æk/
#  {kya.} + {pa.} + {a.that} --> {kyap} with the vowel sound /ʌp/
My task is to explain why the vowel sound has changed from /æ/ to /ʌ/, and how to show this in Romabama. Showing in Romabama means I cannot use /ʌ/ because it is not an ASCII character. If I were to let Romabama to be just a transliteration, I would not have to explain the vowel change, how
   I will also have to explain why {ka.} is not allowed to have a {ha.hto:}.

There is one sub-category that should be mentioned, and it is a sub-category of the harshness setting. This is the setting where the ventricular folds become involved in the phonation of the true vocal folds by squeezing closed the ventricle of Morgagni and pressing down on the true vocal folds, with the effect that the true and the ventricular folds combine to vibrate as more massive, composite elements. In order to bring the ventricular folds to this position, a high degree of muscular tension is needed, and the effect is normally to make phonation auditorily very harsh. Ventricular voice will, therefore, be used in the descriptive system, but only as a synonym for 'extremely harsh voice'. The use of the label 'ventricular' is physiologically explicit, but not enough is yet known about phonation with the ventricular folds, in terms of combinatorial possibilities, to give ventricular phonation independent status as a separate phonation type.


03.01.02. Falsetto

Hollien (1971: 329) suggests that modal voice and falsetto are 'completely different laryngeal operations'. Earlier, modal voice was described as having moderate adductive tension, moderate medial compression, and moderate longitudinal tension. Falsetto is different in all three respects. Van den Berg (1968: 298) states that adductive tension of the interarytenoid muscles is high, medial compression of the glottis is large, and longitudinal passive tension of the vocal ligaments is also high (though there is little active longitudinal tension in the vocalis muscles).

There is a reasonably wide measure of agreement in recent accounts of the laryngeal mechanisms responsible for the production of falsetto. The summary account below of the physiology of falsetto is based on a number of sources: Chiba and Kajiyama 1958; Hollien 1971; Hollien and Colton 1969; Judson and Weaver 1942; Kaplan 1960; Luchsinger and Arnold 1965; Rubin and Hirt 1960; van den Berg 1968; Van Riper and Irwin 1958; Zemlin 1964.

The consensus of these sources describes the production of falsetto as follows. The arytenoid cartilages adduct the vocal folds, by contraction of the interarytenoid and the lateral cricoarytenoid muscles. The vocalis muscles along the glottal edge of each vocal fold remain relaxed, but the mass of each vocal fold is made stiff and immobile by contraction of the thyroarytenoid muscles, which make up the outer bulk of the folds. The vocal ligaments along the glottal edge of the vocal folds are put under strong tension by the contraction of the cricothyroid muscle. This results in the vertical cross-section of the edges of the vocal folds becoming thin. The glottis often remains slightly apart, and the characteristic sub-glottal air pressure is lower than for modal voice (Van Riper and Irwin 1958: 228; Kunze 1964). Van Riper and Irwin suggest here that with the vocalis muscles relaxed, and only the thin margins of the vocal folds participating in phonatory vibration, the expenditure of air is bound to be reduced. They cite Trojan (1952) as showing that oxygen consumption is decreased in falsetto voice, compared with modal voice.

The finding that the glottis often remains slightly open has prompted a number of writers to suggest that falsetto voice is usually accompanied by 'friction noises' (Judson and Weaver 1942: 74) or 'breathiness' (Zemlin 1964: 155). Given that the width of opening is small, this fricative component is much more likely to be of the whispery rather than the breathy sort. This position is reinforced by van den Berg's statement that the forces giving medial compression of the folds (i.e. a compressive tendency at right angles to the front-to-back axis of the glottis) are strong (van den Berg 1968: 298). On the other hand, the whispery effect may be only slight, if transglottal airflow is small. Chiba and Kajiyama (1958: 28) report a finding that supports this:

It is found that the edges of the vocal chords remain covered here and there with small lumps of mucus, which means that the air is not exhaled abruptly. (In the chest register, especially in 'sharp voice', the small lumps of mucus on the edges of the vocal chords are blown away as soon as the voice starts.)

(Chiba and Kajiyama's 'sharp voice' is discussed in the next chapter, on overall tension settings.) Laryngograms of falsetto, whispery falsetto, creaky falsetto and whispery creaky falsetto are included in Figure 17. A spectrogram of falsetto is shown in Figure 18.

Falsetto is characterized acoustically by a number of factors: the first is that the fundamental frequency tends to be considerably higher than in modal voice. The pitch-control mechanism is different from that in modal voice; van den Berg (1968: 298) writes that

In chest voice the passive tension in the vocal ligaments needs to remain small when the active tension in the vocalis muscles is increased to attain the highest pitches. In falsetto voice, however, the active tension in the vocalis muscles needs to remain small when the passive tension in the vocal ligaments is increased [by the cricothyroid muscle -- J.L.] to attain the highest pitches. The registers overlap in the region of medium pitches.

Hollien and Michel (1968: 602) found that the average pitch-range for male falsetto was 275-634 Hz, as against the average range for modal voice, which was 94-287 Hz.

UKT: The mid-point of the falsetto range (about 455 Hz), compared to the mid-point of the modal voice range (about 191 Hz), shows that the frequency of falsetto is roughly 2.4 times higher.
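The arithmetic mid-points of the two ranges reported by Hollien and Michel can be computed directly. The sketch below is only a rough comparison of the centres of the ranges; the choice of an arithmetic rather than a geometric mid-point is an assumption made here, and the names are illustrative.

```python
def midpoint(low_hz, high_hz):
    """Arithmetic mid-point of a frequency range, in Hz."""
    return (low_hz + high_hz) / 2.0

# Average male ranges from Hollien and Michel (1968: 602).
falsetto_mid = midpoint(275.0, 634.0)
modal_mid = midpoint(94.0, 287.0)

print(falsetto_mid, modal_mid, round(falsetto_mid / modal_mid, 2))
```

The mid-points come to 454.5 Hz and 190.5 Hz, a ratio of about 2.4.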

The second acoustic characteristic derives from the interaction of high fundamental frequency and the mode of vibration of the vocal folds. Zemlin writes that

High speed motion pictures of the larynx during falsetto production reveals that the folds vibrate and come into contact only at the free borders, and that the remainder of the folds remain relatively fixed. Further, the folds appear long, stiff and very thin along the edges ... The quality of tone produced by falsetto is almost flute-like in nature. This is partly due to the rather simple form of the vibration executed by the vocal folds, and partly due to the high rate of vibration ... when the fundamental frequency is very high, the harmonically-related overtones are widely separated in frequency, and consequently in any given frequency range there will be fewer components in the sound produced than there is in a voice with a lower fundamental frequency. This partly accounts for the rich quality of the bass voice when compared with the 'thin' quality of the tenor voice. (Zemlin 1964: 155).
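Zemlin's point about widely separated overtones can be illustrated by counting the harmonics that fall inside a fixed frequency band. In the sketch below the band (0 to 3000 Hz) and the two fundamental frequencies (120 Hz and 450 Hz) are illustrative choices made here, not values given by Zemlin, and the function name is hypothetical.

```python
def harmonics_in_band(f0_hz, low_hz, high_hz):
    """List the harmonic frequencies n * f0 (n = 1, 2, ...) within [low_hz, high_hz]."""
    harmonics = []
    n = 1
    while n * f0_hz <= high_hz:
        if n * f0_hz >= low_hz:
            harmonics.append(n * f0_hz)
        n += 1
    return harmonics

# A low (modal-like) and a high (falsetto-like) fundamental, for comparison.
for f0 in (120.0, 450.0):
    print(f0, len(harmonics_in_band(f0, 0.0, 3000.0)))
```

With these values the lower fundamental places 25 harmonics below 3000 Hz and the higher one only 6, which is the sense in which the falsetto spectrum has fewer components in any given range.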

The third acoustic characteristic is that the slope of the spectrum of the laryngeal waveform is much steeper than for the modal voice, falling at about -20 dB per octave (Monsen and Engebretson 1977: 988). Also, whereas the spectrum of modal voice falls off more steeply with increasing frequency, falsetto seems to have a more regular decrement (Monsen and Engebretson, loc. cit.). Finally, while modal voice has the closing portion of the laryngeal waveform as the more abrupt component, in falsetto it is the opening portion that is steeper (Monsen and Engebretson, loc. cit.).
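A slope quoted in dB per octave can be turned into the relative level of each harmonic, since the n-th harmonic lies log2(n) octaves above the fundamental. The sketch below assumes an idealized, perfectly constant slope, which is a simplification made here; the function name is illustrative.

```python
import math

def harmonic_level_db(n, slope_db_per_octave):
    """Level of the n-th harmonic relative to the fundamental, in dB,
    assuming a constant spectral slope per octave."""
    return slope_db_per_octave * math.log2(n)

# Falsetto slope of about -20 dB per octave (Monsen and Engebretson 1977: 988).
for n in (1, 2, 4, 8):
    print(n, harmonic_level_db(n, -20.0))
```

On this idealization the second harmonic of falsetto lies 20 dB below the fundamental, the fourth 40 dB below, and the eighth 60 dB below.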

In the Hanson experiment mentioned in the Introduction, the general finding was that less efficient modes of phonation (such as falsetto, and compound creak phonations) all have a greater spectral slope than modal voice, and that as fundamental frequency rises within a given phonation type, the spectral slope becomes steeper (Hanson, personal communication).

Falsetto is a phonation type that does not seem to be exploited for linguistic purposes. But it frequently has a paralinguistic function, governed by conventional usage, in a variety of cultures. In Tzeltal, a Mayan language of Mexico, speakers 'use sustained falsetto as [an] honorific feature; it is enjoined in greeting formulae, and may spread over an entire formal interaction' (Brown and Levinson 1978: 272).

UKT: There are three environments in which we change the way in which we "speak":
1. when we speak or call out to our pets
2. when a Burmese-Buddhist monk or nun, or a layman delivers a religious sermon
-- {thän-wé-ga. thän} (commiserative tone?)
3. when a female-nat medium sings a lamentation (about the tragic end of the human life of the person who has become a nat or spirit) -- {nat.thu-ngèý thän}. This 3rd type is what I think should be called 'falsetto' -- to check with my peers.


03.01.03. Whisper

UKT note: In the section on Breathiness, 03.01.06, J. Laver notes that some writers describe voice qualities that should have been called 'whispery' as 'breathiness', and that terms such as harsh breathy voice should be called harsh whispery voice.
• The compound phonation whispery voice (whisper + modal voice) is termed murmur by Ladefoged. See Compound phonation types.

The physiology of the whisper setting of the phonatory mechanism is not controversial. Nearly all writers agree that the chief physiological characteristic of whisper is a triangular opening of the cartilaginous glottis, comprising about a third of the full length of the glottis (Pressman 1942). The shape of the glottis in whispering is often referred to as an inverted letter Y (Luchsinger and Arnold 1965: 119). In weak whisper, the triangular opening can be fairly long, including part of the ligamental as well as the cartilaginous glottis. With increasing intensity, the glottis is increasingly constricted until only the cartilaginous section remains just open. Taken together, these factors suggest low adductive tension, and moderate to high medial compression.

The triangular opening of the glottis is achieved by the following factors: the lateral cricoarytenoid muscles contract, 'toe-ing in' the vocal processes of the arytenoid cartilages (Zemlin 1964: 169). The muscles which normally approximate the bodies of the arytenoids, the interarytenoid muscles, remain relaxed (Heffner 1950: 20). This facilitates the toe-ing of the arytenoids as they pivot on the cricoid. As the air flows past the edges of the open cartilaginous glottis, the characteristic 'whisper' sound quality is produced by 'eddies generated by friction of the air in and above the larynx' (van den Berg 1968: 297).

The whisper setting is a very uneconomical use of airflow (Luchsinger and Arnold 1965: 119; Zemlin 1964: 169). Catford describes the aerodynamic and acoustic aspects of whisper in the following terms:

Glottis constricted (estimated area, from the smallest possible chink up to about 25% of maximal glottal area). Critical rate of flow about 2.5 cl/sec, estimated critical velocity about 1900 cm/sec. Maximum rate of flow about 500 cl/sec. Turbulent flow, with production of high-velocity jet into pharynx. Acoustic spectrum similar to breath but with considerably more concentration of acoustic energy into formant-like bands. Auditory effect: a relatively 'rich' hushing sound (Catford 1964: 31)

When whisper combines with another laryngeal setting such as modal voice or falsetto, to give compound phonations of whispery voice or whispery falsetto, then there is necessarily a greater amount of interharmonic noise than in the simple phonations of modal voice or falsetto alone (Hanson, personal communication).

Laryngograms of whispery voice, whispery falsetto and whispery creaky falsetto can be seen in Figure 17. A spectrogram of whispery voice is shown in Figure 18.

Whisper does not seem to be used for contrastive linguistic purposes. However, it is phonetically characteristic of the utterance-final devoicing process in many languages, including English and French. Ladefoged cites Doke (1931) as reporting that the use of whisper as 'a prosody associated with the otherwise voiced sounds in final syllables ... is common in the Bantu family' (Ladefoged 1971: 16). Whisper is also found in a similar role in English, in juxtapositional assimilation. Partial regressive assimilation of voicelessness often results in an otherwise voiced consonant being whispered, as in the phrase <his son> pronounced with the final consonant of <his> with the vocal folds in a whisper position (Abercrombie 1967: 137).

The use of whisper in a paralinguistic function is very widespread. In English, and perhaps in the vast majority of cultures, to whisper is to signal secrecy or confidentiality.


03.01.04. Creak

UKT: Info I have collected:
• Synonym for 'creak' is 'laryngealization'
• Creaky voice is a compound phonation of modal voice and creak.
• Ladefoged ... on West African languages ... 'laryngealized voice' ... equates 'creaky voice'. -- see 03.02. Compound phonation types

Creak is also called vocal fry or glottal fry in the phonetic literature, particularly by American researchers.

Catford gives the following details:

Low frequency (down to about 40 cps) periodic vibrations of a small section of the vocal folds. Mean rates of flow very low -- of the order 1.25 to 2 cl/sec. The precise physiological mechanism of creak is unknown, but only a very small section of the ligamental glottis, near the thyroid end, is involved. The auditory effect is of a rapid series of taps, like a stick being run along a railing. (Catford 1964: 32)

The low fundamental frequency of this creak type of phonation is one factor that distinguishes it from harsh voice, which is otherwise somewhat similar. While the mean fundamental frequency for creak has been found to be 34.6 Hz, in an average range for male speakers of 24-52 Hz, the mean fundamental frequency for harsh voice is said to be 122.1 Hz, with a range similar to that of modal voice, whose average range is 94-287 Hz, according to Michel and Hollien (1968), and Michel (1964). Michel (1968) also reports that harsh voices seem to have fundamental frequencies consistently above 100 Hz, and vocal fry (i.e. creak) consistently below 100 Hz. It is fundamental frequency characteristics of this sort that led Hollien, Moore, Wendahl and Michel (1966: 246) to suggest that vocal fry is 'best described as a phonational register occurring at frequencies below those of the modal register'.
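The average male frequency ranges quoted in this chapter for creak, modal voice and falsetto can be set side by side in a small lookup. The sketch below is only a way of comparing the ranges, not a classifier of phonation type, since fundamental frequency alone does not determine the setting; the table and function names are illustrative.

```python
# Average male fundamental-frequency ranges (Hz) as quoted in the text.
F0_RANGES_HZ = {
    "creak (vocal fry)": (24.0, 52.0),
    "modal voice": (94.0, 287.0),
    "falsetto": (275.0, 634.0),
}

def registers_for(f0_hz):
    """Return the registers whose quoted average range contains f0_hz."""
    return [name for name, (low, high) in F0_RANGES_HZ.items()
            if low <= f0_hz <= high]

for f0 in (40.0, 70.0, 120.0, 280.0):
    print(f0, registers_for(f0))
```

The lookup shows a gap between the creak and modal ranges (70 Hz falls in neither) and a small overlap between modal voice and falsetto around 275-287 Hz.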

The specification of fundamental frequency characteristics is not, of course, enough. Comment on laryngeal factors contributing to auditory quality is also needed. These factors are described by Hollien, Moore, Wendahl and Michel (1966: 247) as follows:

(1) the vocal folds when adducted are relatively thick and apparently compressed,
(2) the ventricular folds are somewhat adducted also, and
(3) the inferior surfaces of the false fold actually come in contact with the superior surfaces of the true vocal folds.
Thus, an unusually thick, compact (but not necessarily tense) structure is created prior to the initiation of phonation. Under these conditions it might be expected that the false vocal folds would vibrate in synchrony with the true folds. However, since there is no evidence to support this conjecture, it is possible that their position is either the incidental result of basic laryngeal adjustments or serves to produce a damping of the vocal fold movement. It would be predicted also that vibration is initiated and maintained by relatively low subglottal pressures; that airflow, if measured, would be considerably less than for most other phonational events.

A number of experimental studies support their hypotheses. Moore (1971), on the basis of frontal stroboscopic laminagrams, suggests that

vocal fry may ... be produced when the mass of the vibrations is increased by the collaboration of the ventricular folds. These structures appear in x-ray photographs to combine the ventricular folds functionally with the vocal folds to form massive bilateral vibrators that move with relatively small amplitude. It is presumed that this mechanism is capable of both impeding the flow of air, even when there is considerable pressure, and of releasing a series of pulses in which the channel is open for relatively short portions of the cycle. (Moore 1971: 72)

Fónagy (1962) investigated the influence of affective states on the mode of phonation, by means of laryngoscopy, tomography and asymmetrical radiography, and described what he called the 'creaky' voice of 'suppressed rage' as having ventricular folds pressed hard against each other, the ventricle of Morgagni wrinkled, the vocal folds held tightly together and the air column vertically through the larynx narrowed to a line -- all matching the picture suggested by Hollien and his co-workers of phonation with strong adductive tension and medial compression, but little longitudinal tension, and with vigorous ventricular involvement.
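The specifications given so far for the simple phonation types, in terms of the three muscular parameters, can be collected into a small table for comparison. The sketch below is a summary made here, not Laver's notation: it maps 'large' and 'strong' to "high" and 'little' to "low", and leaves a value as None where the text does not state one. All names are illustrative.

```python
from typing import NamedTuple, Optional

class Setting(NamedTuple):
    adductive_tension: Optional[str]
    medial_compression: Optional[str]
    longitudinal_tension: Optional[str]

SETTINGS = {
    "modal voice": Setting("moderate", "moderate", "moderate"),
    "falsetto": Setting("high", "high", "high (passive)"),
    # Longitudinal tension is not stated for whisper in the text.
    "whisper": Setting("low", "moderate to high", None),
    "creak": Setting("high", "high", "low"),
}

def differing_parameters(first, second):
    """Return the parameter names on which two settings differ."""
    a, b = SETTINGS[first], SETTINGS[second]
    return [field for field in Setting._fields
            if getattr(a, field) != getattr(b, field)]

print(differing_parameters("modal voice", "falsetto"))
print(differing_parameters("falsetto", "creak"))
```

On this summary, modal voice and falsetto differ on all three parameters, while falsetto and creak differ only in longitudinal tension.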

More recent work by Hollien, Damsté and Murry (1969), has provided some support also, in their finding that the control of fundamental frequency in vocal fry is not achieved by the same mechanism as in the modal voice: while vocal fold length in modal voice increases with fundamental frequency, and vocal fold thickness is inversely related to the frequency (Hollien and Michel 1968), in vocal fry neither the length nor the thickness of the vocal folds seem to vary with changes in pitch. This suggests that control of fundamental frequency is managed by the aerodynamic component of the aerodynamic-myoelastic phonatory action, rather than the myoelastic component, and the sub-glottal air pressure should reflect this. This has not yet been established experimentally, although Murry and Brown (1971) have shown that, consistent with the hypothesis of Hollien et al. mentioned above, the overall sub-glottal air pressure in vocal fry is 'always less than that for the modal phonation' (Murry and Brown 1971: 446); McGlone (1967) and Murry (1969) have found lower airflow values for vocal fry than for modal voice.

Monsen and Engebretson (1977: 989) give an account of 'creaky voice', which for them is synonymous with 'vocal fry', in broadly similar terms:

The fundamental frequency of creaky voice ranges from 30 to 90 Hz. Because of the way it is produced (slack vocal folds and low subglottal air pressure), the period-to-period variations in fundamental frequency are quite high. In the extreme, one period may last 33 msec, followed by one of 11 msec and then one of 18 msec. The glottal waveform of creaky voice is thus highly irregular ... The glottal spectrum of creaky voice falls off less steeply than that of all other glottal samples, but because of extremely low fundamental frequency, the overall distribution of energy associated with normal voice phonation is maintained.
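The figures in this passage can be checked with a little arithmetic. The following is a minimal sketch, not part of the original text: it converts the three periods quoted above into instantaneous frequencies and computes one simple measure of period-to-period variation. The function name and the particular variation measure (mean absolute difference between successive periods, divided by the mean period) are assumptions made for illustration only.

```python
def period_ms_to_hz(period_ms: float) -> float:
    """Convert one glottal period in milliseconds to a frequency in Hz."""
    return 1000.0 / period_ms


# The three successive periods quoted from Monsen and Engebretson (1977).
periods_ms = [33.0, 11.0, 18.0]
freqs_hz = [period_ms_to_hz(p) for p in periods_ms]

for p, f in zip(periods_ms, freqs_hz):
    print(f"{p:.0f} msec -> {f:.1f} Hz")

# Assumed illustrative measure: mean absolute difference between
# successive periods, relative to the mean period.
diffs = [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]
mean_period = sum(periods_ms) / len(periods_ms)
relative_variation = (sum(diffs) / len(diffs)) / mean_period
print(f"relative period-to-period variation: {relative_variation:.2f}")
```

The 33 msec and 11 msec periods correspond closely to the 30 Hz and 90 Hz limits of the range given in the quotation, so the example sequence swings across almost the whole range within three cycles.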

A different aspect of creak, or vocal fry, is the 'auditory effect ... of a rapid series of taps, like a stick being run along a railing' that Catford (1964) mentions in the passage quoted at the beginning of this section on creak. The effect of continual, separate taps in rapid sequence is an essential part of the characteristic auditory quality of creak. Hollien and Wendahl (1968: 506) have described the acoustic correlate of this effect as 'a train of discrete excitations or pulses produced by the larynx', using 'pulse' to mean 'any of a variety of glottal waveforms of brief duration separated by varying periods of no excitation'. In this connection, Wendahl, Moore and Hollien (1963: 254) have suggested that 'the primary criterion which must be met in order for the signal to be perceived as vocal fry is that the vocal tract be highly damped between glottal excitations'. They also note that the fundamental frequency of vocal fry can vary between 20 and 90 Hz, and still be heard as vocal fry provided that the vocal tract is nearly completely damped in between the occurrence of successive wave-fronts. Coleman (1963) specifies that the vocal fry is perceived whenever the vocal tract wave is allowed to decay by 42 to 44 dB of its maximum amplitude for a single pulse, and when the wave is allowed to decay by only 30 dB between the excitation pulses, vocal fry is not perceived. This criterion also allows for the possibility not only of trains of single, discrete pulses, but also for a sub-category of vocal fry where there is 'a vibratory pattern in which the vocal cords separate twice in quick succession and then approximate firmly in a relatively long closed phase'; this double-pulse train was discovered in an investigation using very high-speed cinefilm of the vibrating glottis, and was given the name 'dicrotic dysphonia' by the investigators (Moore and von Leden 1958: 235). 
Monsen and Engebretson (1977: 989) also refer to the possibility of 'double-pulse' phonation of this sort. Double-pulsing can be seen in the laryngogram of creak that is included in Figure 17. (Laryngograms of creaky falsetto and whispery creaky falsetto can also be seen in the same figure.) Hollien and Wendahl (1968: 509) have also suggested the possibility of triple-pulse trains in vocal fry.
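To give a sense of how heavily damped the vocal tract must be under Coleman's criterion, the decibel figures can be turned into amplitude ratios. This is a hedged sketch only: it assumes that the decibel values refer to amplitude (so that 20 dB corresponds to a factor of ten), and the function names are hypothetical. The region between 30 dB and 42 dB is left unclassified, because the text reports no result for it.

```python
import math


def db_drop_to_amplitude_ratio(db_drop: float) -> float:
    """Residual amplitude as a fraction of the peak after a decay of db_drop dB.

    Assumes amplitude decibels (20 * log10), which is an assumption here.
    """
    return 10.0 ** (-db_drop / 20.0)


def amplitude_ratio_to_db_drop(ratio: float) -> float:
    """Inverse conversion: decay in dB for a residual/peak amplitude ratio."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the interval (0, 1]")
    return -20.0 * math.log10(ratio)


def classify_decay(db_drop: float) -> str:
    """Apply the two figures reported from Coleman (1963).

    42 dB or more of decay between pulses: heard as vocal fry.
    30 dB or less: not heard as vocal fry.
    Anything in between is not specified in the text.
    """
    if db_drop >= 42.0:
        return "vocal fry"
    if db_drop <= 30.0:
        return "not vocal fry"
    return "not specified"


for db in (30.0, 42.0, 44.0):
    ratio = db_drop_to_amplitude_ratio(db)
    print(f"{db:.0f} dB decay -> {100 * ratio:.2f}% of peak amplitude, {classify_decay(db)}")
```

On this reading, the wave must fall to under one per cent of its peak amplitude before the next excitation for the signal to be heard as vocal fry, whereas a residue of about three per cent is already too much.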

The perceptual necessity of damped laryngeal pulses gives quite strong plausibility to the suggestion by Hollien, Moore, Wendahl and Michel (1966: 247) noted earlier, that the function of ventricular folds coming into contact with the surfaces of the vocal folds may be to damp the movements of the vocal folds: damping of this sort would also have the effect, observed in creak, of elongating the closed phase of each vibratory cycle (Moore 1971: 72).

This damped aspect of vocal fry also lends credibility to the comment reported in the previous chapter on velopharyngeal settings, made by Van Riper and Irwin (1958: 244) to the effect that they 'suspect that some of what has been termed "nasal twang" is merely the presence of glottal fry'. The validity of their remark resides in the fact that nasality shares what is arguably one of the most important characteristics with vocal fry -- that of a damped vocal system.

An example of the low-frequency, discrete pulses of creak can be seen in the spectrogram shown in Figure 18.

Laryngealization

UKT: a synonym for creak (or creaky voice). I have broken up the original pdf section, giving 'Laryngealization', a synonym for creaky voice, to pinpoint what the Western linguists meant by the "creaky voice" in Burmese. The usual examples given by Western linguists, http://en.wikipedia.org/wiki/Burmese_language (download 071027) are as follows (I have given the Burmese-Myanmar orthography, and IPA transcripts in suprasegmentals):
  • Low /kʰà/ "shake" -- {hka} (MEDict054) /kʰa/
  • High /kʰá/ "be bitter" -- {hka:} (MEDict055) /kʰaː/
  • Creaky /kʰa̰/ "fee" -- {hka.} (MEDict051) /kʰă/ (from {a.hka.} MEDict543)
  • Checked /kʰaʔ/ "draw off" -- {hkap} (MEDict065)
and,
  • Low /kʰàN/ "undergo" -- {hkän} (MEDict058)
  • High /kʰáN/ "dry up" -- {hkan:} (MEDict065)
  • Creaky /kʰa̰N/ "appoint" -- {hkan.} (MEDict065)
What they call "creaky tone" are the following which I have identified:
  • Creaky /kʰa̰/ "fee" -- {hka.} (MEDict051). It is #1 of the group {hka. hka hka:}
  • Creaky /kʰa̰N/ "appoint" -- {hkan.}. It is #1 of the group {hkan. hkan hkan:}
It appears that their emphasis is on the "sudden stoppage of sound" produced way back in the throat. Notice that in {hkan.}, an {a.that} is involved, as in {hkap} ({hkap} rhymes with <cup>, with the vowel sound /ʌ/).
• Ladefoged ... on West African languages ... 'laryngealized voice' ... equates 'creaky voice'. (Ladefoged 1964: 16)

A term often used in the linguistic literature as a synonym for creak (and creaky voice, a compound phonation blending modal voice and creak, discussed in more detail below), is 'laryngealization'. Ladefoged writes that

Another mode of vibration of the vocal cords occurs in laryngealized sounds. In this type of phonation the arytenoid cartilages are pressed inward so that the posterior portion of the vocal cords are held together and only the anterior (ligamental) portions are able to vibrate. The result is often a harsh sound with a comparatively low pitch. It is also known as vocal fry and creaky voice. (Ladefoged 1971: 14-15)

Ladefoged makes it clear that in his view only a small length of the ligamental glottis is in vibration (1971: 8), and also indicates that while a distinction may be drawn between creak and creaky voice, for his own linguistically-motivated purposes such a distinction is not necessary (1971: 15). With allowance for these two points, Ladefoged's description of laryngealization is in accord with the description offered here of creak (rather than of creaky voice). Laryngealization plays a phonological role in many languages, including Arabic, Chadic and Nilotic languages (Ladefoged 1971: 15, 42). In Danish, 'two such words as hun <she> and hund <dog> are pronounced alike except for a difference of register, the second having creaky voice' (Abercrombie 1967: 101). In many tone languages, syllables with low or falling tones are phonetically characterized by creak or creaky voice. In English, in the paralinguistic regulation of interaction (Laver 1976: 351), speakers of Received Pronunciation often use creak or creaky voice, simultaneously with a low falling intonation, as a signal of completion of their turn as speaker, yielding the floor to the listener. When used throughout an utterance, creaky voice signals bored resignation, in the paralinguistic conventions of English. In Tzeltal, the Mayan language mentioned earlier, creaky voice is used paralinguistically 'to express commiseration and complaint, and to invite commiseration' (Brown and Levinson 1978: 272).

Glottalization

Another term in the linguistic literature which has a partial overlap with 'creak' and 'creaky voice' is 'glottalization'. However, this has been used as a cover term for such a wide variety of other phenomena as well, such as 'ejectives, implosives, laryngealized sounds, and pulmonic articulations accompanied by glottal stops' (Ladefoged 1971: 28), that it is probably best to disregard it here, except to note one salient principle held in common by the reference of 'laryngealization' and 'glottalization'. That is, whatever else they may refer to, both terms suggest a tendency to constriction at the laryngeal level. This is strongly in keeping with the general laryngeal characterization of creak and creaky voice offered here, where the glottis is subjected to strong adductive tension and medial compression, with vigorous ventricular involvement.


03.01.05. Harshness

It will be recalled that 'harshness' is a quality taken on by a number of other phonation types. It will be discussed here as a modification of modal voice, for convenience of exposition. Applied to modal voice, harshness should be thought of not as contributing substantially new parameters to the mode of phonation, but rather as boosting the values of some of the parameters already operating. We shall return to this in a moment.

A number of writers have given auditory descriptions of the quality associated with harsh voice. Sherman and Linke (1952) call it 'an unpleasant, rough, rasping sound'; Holmes (1932) says it is a 'raucous voice quality'; Milisen (1957) writes that harsh voice is a 'rasping sound associated with excessive approximation of the vocal folds'; Van Riper (1954) calls it 'strident'. The widely used label for this quality, 'harshness', seems well-chosen.

The acoustic characteristics of harsh voice are concerned chiefly with irregularity of the glottal wave-form and spectral noise. Fairbanks (1960) said that 'Irregular, aperiodic noise in the vocal fold spectrum is the distinguishing feature of harshness.' Michel (1964) also says that 'harsh voices are characterized by aperiodicity or noise in the spectrum, a normal fundamental frequency level and larger than normal perturbations about the mean fundamental frequency'. The view that small cycle-to-cycle variations in fundamental frequency are associated with voices judged to be harsh is supported by Coleman (1960), Moore (1962), Thompson (1962) and Wendahl (1963). Wendahl (1964) carried out an acoustic analysis of the role of amplitude variations of the laryngeal waveform in voices judged to be harsh, and found that successive wavefronts tend to be of unequal amplitude. Using LADIC, an electric laryngeal analogue for producing synthetic waveforms, he established that these characteristic amplitude irregularities made a significant contribution to the perception of harsh 'roughness' (Wendahl 1964).

The predominant characteristic is the aperiodicity of the fundamental frequency, which is heard as a component of auditory quality rather than auditory pitch. This aperiodicity has been referred to as a pitch 'jitter' (Cooper, Peterson, and Fahringer 1957: 183). Listeners are very sensitive to even very small amounts of such jitter. Wendahl (1963) used LADIC in his investigations of laryngeal waveform irregularity to establish the contribution of pitch jitter to harshness. He presented listeners with synthetic stimuli which
[UKT: there is a line break here which I think is the mistake of the typist copying from the original book into pdf file]
varied in the magnitude of frequency differences between successive cycles. For each of two median fundamental frequencies, a 100 and a 200 cps condition, stimuli were generated to have frequency variations on successive cycles of +/- 10 cps, 8 cps, 6 cps, 4 cps, 2 cps, and 1 cps. 535 listeners judged the stimuli on the basis of which sounded the most rough. The results show that even very slight frequency variations, as little as +/- 1 cps around a median fundamental frequency of 100 cps, sounded rough. (Wendahl 1963: 248)

Wendahl also showed that greater auditory roughness was related to greater deviations from the fundamental frequency, and that the same absolute amount of deviation sounded less rough when superimposed on a fundamental of higher frequency; so if the 100 Hz median frequency were taken to represent a male voice and the 200 Hz median a female one, then the same deviation, say +/- 5 Hz, would make the male voice sound rougher (op. cit. pp. 248-9). The same principle underlies Hess's finding that harshness in higher-pitched voices is judged as less severe than in lower-pitched ones (Hess 1959). This may help to explain the impression that harsh voice is heard much less commonly in women than in men.
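One way to picture why the same absolute deviation weighs more heavily on a lower fundamental is to express it relative to the median frequency and as a difference in cycle duration. The sketch below is offered under that assumption only; it is not a reconstruction of Wendahl's own analysis, and the function names are hypothetical.

```python
def relative_deviation_percent(median_hz: float, deviation_hz: float) -> float:
    """Frequency deviation expressed as a percentage of the median frequency."""
    return 100.0 * deviation_hz / median_hz


def period_spread_ms(median_hz: float, deviation_hz: float) -> float:
    """Difference in msec between the longest and shortest cycle
    when the frequency swings +/- deviation_hz around median_hz."""
    longest = 1000.0 / (median_hz - deviation_hz)
    shortest = 1000.0 / (median_hz + deviation_hz)
    return longest - shortest


# The +/- 5 Hz example from the text, on 100 Hz and 200 Hz medians.
for median in (100.0, 200.0):
    rel = relative_deviation_percent(median, 5.0)
    spread = period_spread_ms(median, 5.0)
    print(f"{median:.0f} Hz median: +/- 5 Hz is {rel:.1f}%, period spread {spread:.2f} msec")
```

On the 100 Hz median the deviation is twice as large in relative terms, and the spread of cycle durations is roughly four times as large, as on the 200 Hz median.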

Coleman and Wendahl (1967) also used LADIC to make synthetic stimuli in an experiment investigating the relationship between the proportion of pitch-jitter present in a stimulus and the degree of perceived roughness. They found that:

As the relative duration of jitter elements within a signal is increased, listeners will evaluate the signal as increasing in roughness. It makes little, if any difference to listener whether jitter segments occur at the beginning or end of stimuli. A large jitter signal of short duration may be judged to be less rough than a small jitter signal or less jitter excursion [i.e. less degree of frequency deviations from the median -- J.L] but having longer duration within a stimulus. (Coleman and Wendahl 1967: 92)

The relevance of this last study, which will be commented on in more detail in a moment, lies in the relationship between segmental pronunciation and the everyday perception of voice quality. A number of researchers have found that the judged severity of harshness is correlated with some variables of segmental articulation. Rees (1958), for example, found that harshness on vowels was judged to increase with the openness of the vowel; to be greater when the vowel is in a voiced environment; and more marked on vowels in isolation when initiated with a glottal stop than with a 'soft', 'aspirated' beginning. In connection with this last comment, Craig and Sokolowsky (1945) said that excessive and continuous use of a 'glottal attack' on vowels gives a person's speech a characteristic harsh quality. Van Riper and Irwin (1958: 232) agree with this, when discussing harshness as a functional disorder of the voice: 'Very characteristic of this disorder ... is the manner of vocal attack. Glottal catches and stops are common. Vocalization is sudden.' Sherman and Linke (1952) suggest that harshness is judged to be more severe with greater duration of the harsh utterance. Linke (1953) came to the same conclusion as Rees (1958) when she showed that high vowels are judged as less harsh than open vowels; she also showed that, as one would predict, lax vowels are less harsh than their tense counterparts. Although Coleman and Wendahl used synthetic stimuli (very necessary for precise control of the stimulus variables), their results are nevertheless of direct relevance to the perception of voice quality in the normal situation of spoken interaction. Combining the results of the study by Coleman and Wendahl (1967) with those of Rees (1958), Sherman and Linke (1952) and Linke (1953), we can conclude that harshness in ordinary voices will be of intermittent occurrence, and of variable relative severity depending on the nature, context and duration of the segments involved.
The example of the perception of harshness here can be extrapolated to the general perception of voice quality. This is to say that we perceive voice quality by attending to signals of differing duration and auditory prominence, distributed intermittently and irregularly through the stream of speech.

One physiological correlate of harshness is widely agreed. It is laryngeal tension, which underlies what Milisen (1957), quoted above, describes as 'excessive approximation of the vocal folds'. Gray and Wise (1959: 52), for example, say that harshness 'results from overtensions in the throat and neck; it is often if not usually accompanied by hypertensions of the whole body'. Van Riper and Irwin (1958: 232) write that in the case of speakers with harsh voices 'Most of these individuals show marked hypertension both of the (larynx) and of ... the pharynx. Both the suprahyoids and the infrahyoids tend to be strongly contracted, as palpation will demonstrate.' They quote Russell (1936), to the effect that 'as the voice begins to become strident and blatant, one sees the red-surfaced muscles which lie above the vocal cords begin to form a tense channel'. They add that

Most harsh voices are relatively low in pitch, with the average pitch level close to the bottom of the range. The intensity appears louder than in the normal voice, though some of the apparent loudness may come from resonation effects due to the tenseness of the oral and pharyngeal cavities. (Van Riper and Irwin 1958: 232)

Zemlin (1964) agrees with Van Riper and Irwin, when he writes that

the distinguishable feature which differentiates the normal from the harsh voice is aperiodic noise of aperiodic vocal fold vibration. Such vocal fold vibration may well be due to excessive tension in the folds. Support for this line of reasoning comes from the fact that persons with harsh voices tend to initiate phonation with glottal attacks. There is some evidence to suggest that persons with harsh voice quality are phonating at an inappropriate pitch level, usually slightly low for their vocal mechanism. (Zemlin 1964: 165)

Kaplan (1960) is quite clear about the responsibility of laryngeal tension for harshness. He says that

where the folds are drawn too tightly together during phonation rather than being lax, a shrill, harsh, creaking noise, which is called stridency, or stridor, enters the tone. An obstruction of some type is present. Some causes include general tension, spastic paralysis, or often a throat strain or 'pinched throat'. There is excessive constriction of muscles all through the vocal tract, and the tension is great in external laryngeal muscles. The vibrations of the vocal chords are hindered, and supraglottal friction noises are introduced. (Kaplan 1960: 167-8)

Accepting laryngeal tension as established, the question arises as to which type of tension, in the categories set up earlier of adductive tension, medial compression and longitudinal tension, is involved. Michel (1968) was reported, in the earlier section on creak, as stating that while vocal fry (creak) was characterized by fundamental frequencies consistently below 100 Hz, harshness showed fundamental frequencies consistently (but not markedly) above 100 Hz. This strongly suggests that the laryngeal tension in harshness is not chiefly longitudinal tension, the main mechanism in modal voice for controlling the frequency of vibration of the vocal folds. From the comments noted above, such as 'excessive approximation of the vocal folds', and 'the folds are drawn too tightly together', it seems reasonable to conclude that the exaggerated laryngeal tension in harsh voice is a combination of extreme adductive tension and extreme medial compression, brought about by over-contraction of the muscle systems responsible for these two parameters in modal voice. This is supported by Brackett (1940), cited by Van Riper and Irwin (1958: 232), who describes the inflammation of the vocal folds which results from their traumatic abuse by the deliberate, experimental production of harsh voice.

UKT: I have given Fig 16 a second time to show the tensions mentioned above.

Ventricular voice

Earlier in this chapter, it was suggested that when harshness became very severe, the ventricular folds become involved in phonation, pressing down on the upper surface of the true vocal folds. Ventricular voice was offered as a physiologically more explicit synonym for severely harsh voice. It may be helpful to end this section on harshness with some brief comments about this mode of phonation. We can note, initially, that ventricular voice involves considerably greater tension of the ventricular folds than occurs in the ventricular participation in creak mentioned earlier.

Van den Berg (1955) says that 'harsh, metallic voice is made ... when the ventricular folds withdraw into the adjacent tissue, leaving almost no space in the ventricles'. He then goes on to discuss the way that this setting of the larynx serves to boost the relative amplitude of the higher harmonics: spectral features of this sort are summarized in the next chapter, on overall tension settings of the vocal system, and will not be considered further at this point.

Frederickson and Ward (1962), in an article about the possibilities of damaging the larynx by strenuous muscular exertion, say that in pronounced physical effort, 'the true cords are no longer completely approximated, while the false cords remain competent'. Under these circumstances 'the full force of the intralaryngeal pressure is exerted at the ventricular level'. Ventricular voice can be visualized, then, as phonation at extreme effort, with a fine degree of control over the audible quality made impossible by the comparatively large muscular forces exerted. Plotkin (1964) says that ventricular voice, 'once heard is never forgotten', and that the 'characteristic deep, hoarse voice, alike in male and female, causes an almost sympathetic tightening of the listener's throat'. Freud (1962), quoted by Aronson, Peterson and Litin (1964), gives a rather similar picture, of 'ventricular dysphonia', where he 'depicts it as a tonal, tight, spastic apposition of the constrictors of the larynx and hypopharynx, giving the voice a groaning, animal quality and suggesting to the listener the exertion of extreme effort. The words sound as if they are being chopped off' (Aronson, Peterson and Litin 1964: 369).

These descriptions by Frederickson and Ward, and by Freud, are of course descriptions of voices which have been classed as 'dysphonic', and it does not follow that all voices phonetically classifiable as examples of ventricular voice would use ventricular phonatory effort of quite such extreme degrees. One can hear voices which make use of some contribution by the ventricular folds fairly commonly in everyday life. Sweet noted a quality he called 'the pig's whistle' effect, which he said gave 'a wheezy character to the voice' and which he suggested arose from a 'narrowing of the upper glottis' (Sweet 1877: 97-9): he said that 'it may be heard from Scotchmen [sic], and combined with high key gives the pronunciation of the Saxon Germans its peculiarly harsh character' (Sweet 1906: 73). In the terms offered here, this quality would be called ventricular voice, or possibly whispery ventricular voice. It is still frequently heard as a component in the voice quality characterizing some urban Scots accents. A spectrogram of harsh voice is shown in Figure. Harsh voice and ventricular voice are both used in English as paralinguistic signals of anger. It seems not implausible to suggest a conventional scalar relationship between the degree of harshness and the degree of anger expressed. In this sense, ventricular voice is a signal of more extreme anger, being also correlated with more extreme muscular tension.


03.01.06. Breathiness

UKT: Notable info in the following:
• 'Breathiness ... a modification of modal voice, giving breathy voice.'
• Phonation type of 'voiced h' -- (Catford 1977: 99)
• High pitched breathy voices seem rare.
   Since 'The modal phonation of a male speaker occurs at an average of 120 Hz, while for a female speaker it is approx. 220 Hz' (http://www.ims.uni-stuttgart.de/phonetik/EGG/page10.htm, download 071028), we can conclude that breathiness in female speakers is rare.
• Breathiness can combine with only one other type of phonation -- modal voice.
• Some writers describe voice qualities that should have been called 'whispery' as 'breathiness'
• Paralinguistically, breathy voice is exploited in English for the communication of intimacy.

UKT: Info I have collected:
• Many Austroasiatic languages of the Mon-Khmer family (Vietnamese, Khmer, Muong, Mon, Khasi, Khmu, and Wa) found on mainland Southeast Asia distinguish two voice registers, a breathy, or “sepulchral,” voice (made by relaxing the vocal cords) and a clear voice (made by tensing the vocal cords).
-- http://www.britannica.com/eb/topic-388711/Mon-Khmer-languages 071120

'Breathiness' is a quality which is quite often heard as a modification of modal voice, giving breathy voice. By comparison with modal voice, the mode of vibration of the vocal folds is inefficient, and is accompanied by slight audible friction. Muscular effort is low, with the result that the glottis is kept somewhat open along most of its length, and the folds never meet on the midline. Because each closing movement of the folds tends to be abortive, the lessened glottal resistance leads to a higher rate of airflow than in modal voice.

Catford describes the characteristics of breathy voice as follows:

the sound of voice mixed in with breath. The effect is somewhat like that of sighing. This is breathy voice: the glottis is narrowed from its most open position, but not narrowed enough to generate whisper, that is, it is still at considerably more than 25 percent of its maximal opening, probably, in fact, around 30 to 40 percent. The vocal folds are vibrating, but without ever closing or, indeed, coming anywhere near closing. They simply 'flap in the breeze' of the high velocity airflow. The liminal volume-velocity for the production of breathy voice is of the order of 80 to 100 mm3/s; more commonly, however, it is much faster, around 900 to 1000 cm3/s. (Catford 1977: 99)

It is clear that the notion of 'breathy voice' thus involves a type of phonation which can be produced on a very wide range of airflow. Catford takes the position that most examples of what he would classify as breathy voice use a very high rate of airflow. He suggests, for example, that one should

try filling the lungs to capacity and then generating breathy voice for as long as you can. Owing to the high volume-velocity of breathy voice, this probably will not be more than four or five seconds. Breathy voice often occurs when one tries to blurt out a message when extremely out of breath. It is also the phonation type of 'voiced h' [sic] (Catford 1977: 99).

While agreeing with Catford that breathy voice produced at the lower end of the range of airflow has enough auditorily in common with breathy voice produced at a very high rate of airflow to justify the use of the same identifying label for such voices, the position is taken here that in normal speaking situations, most examples of breathy voice will use airflow rates from the lower end of the range. Most speakers with breathy voices will thus be able to speak relatively continuously, without needing to pause every four to five seconds to draw breath, unlike the speakers Catford describes who use flow-rates of up to 1000 cm3/s.
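The link between flow rate and the time a speaker can keep going on one breath can be illustrated with simple division. The following is a rough sketch only. The figure for usable lung air (4500 cm3) is an assumed round value chosen for illustration and does not come from the text, and so is the lower flow rate of 200 cm3/s; only the 900 to 1000 cm3/s range is taken from Catford's passage.

```python
def max_phonation_seconds(usable_air_cm3: float, flow_cm3_per_s: float) -> float:
    """Time in seconds for which phonation can be sustained on one breath,
    assuming a constant flow rate (a simplifying assumption)."""
    if flow_cm3_per_s <= 0:
        raise ValueError("flow rate must be positive")
    return usable_air_cm3 / flow_cm3_per_s


# Assumed value for illustration only, not from the source text.
ASSUMED_USABLE_AIR_CM3 = 4500.0

# Flow rates Catford gives as common for breathy voice.
for flow in (900.0, 1000.0):
    seconds = max_phonation_seconds(ASSUMED_USABLE_AIR_CM3, flow)
    print(f"{flow:.0f} cm3/s -> about {seconds:.1f} s")

# Hypothetical lower flow rate, chosen only to show the contrast.
ILLUSTRATIVE_LOW_FLOW_CM3_PER_S = 200.0
seconds = max_phonation_seconds(ASSUMED_USABLE_AIR_CM3, ILLUSTRATIVE_LOW_FLOW_CM3_PER_S)
print(f"{ILLUSTRATIVE_LOW_FLOW_CM3_PER_S:.0f} cm3/s -> about {seconds:.1f} s")
```

Under these assumptions the high flow rates give durations of four to five seconds, in line with Catford's estimate, while a flow rate several times lower allows correspondingly longer stretches of speech.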

The muscle tension adjustments necessary for breathy voice can be seen as involving minimal adductive tension and weak medial compression, just sufficient to allow aerodynamic forces in the comparatively large volume of transglottal airflow to superimpose on the out-flowing air a very inefficient vibration of the vocal folds, with the folds not meeting at the centre line. The one laryngeal tension factor that is controlled more finely is longitudinal tension, in the production of appropriate variations of fundamental frequency for the purposes of intonation. We can assume that the degree of longitudinal tension is rather low, generally. High pitched breathy voices seem rare. Fairbanks (1960: 179) comments that 'Breathy quality is almost invariably accompanied by limited vocal intensity [and] low pitch.' A laryngogram of breathy voice can be seen in Figure 17, and a spectrogram in Figure 18.

Breathiness can combine with only one other type of phonation, in the system of describing voice quality here: that is, modal voice. This is because, while modal voice requires only moderate compression, all the others, falsetto, whisper, creak, harshness and ventricular voice, need a greater amount than is compatible with breathiness.

Many writers have used the label 'breath' to describe components in given voice qualities that should rather have been called 'whispery'. In the descriptive scheme used here it would not be possible, for example, to accept a label which combined 'breathiness' and 'harshness', such as harsh breathy voice, for the voice quality often described as 'husky' or 'hoarse', because of the mutually exclusive prerequisites of breathiness as here defined and harshness. Such a quality would instead be labelled harsh whispery voice.

However, it is reasonable to acknowledge that there is a close auditory relationship between breathy voice and whispery voice, as these two compound types of phonation are understood in this descriptive scheme. Both involve the presence of audible friction: to the extent that such friction is concerned, the transition from breathiness to whisperiness is part of an auditory continuum, and the placing of the borderline between the two categories is merely an operational decision. The physiological relationship between the two is a good deal more distant, however, when their specification in terms of the muscular parameter of medial compression is considered, as indicated immediately above. Breathy voice has extremely weak medial compression, with little tendency on the part of the lateral cricoarytenoid muscles to swivel the vocal processes of the arytenoids in towards each other. The whisper component of whispery voice requires moderate to high compression, so that the whisper-producing channel is relatively confined to the cartilaginous glottis.

From an auditory point of view, it is practical to use the label 'breathy voice' for the range of qualities produced with a low degree of laryngeal effort, and where only a slight amount of glottal friction is audible. If one thinks of the friction component and the modal voice component as being audibly co-present but able to be heard individually, then the balance between the two components in breathy voice is one where the modal voice element is markedly dominant. 'Whispery voice' can then be used for phonations produced with a greater degree of laryngeal effort, and where a more substantial amount of glottal friction, from a more constricted glottis, is audible. The audible balance between the friction component and the periodic component is different from that in breathy voice: the friction component is more prominent than in breathy voice, and may on occasion even equal the periodic component (and sometimes dominate it strongly, as in 'extremely whispery voice'). In the interpretation offered here, the friction component of whispery voice can thus be subdivided into a larger number of audible increments than can the friction component of breathy voice.

It is perhaps unfortunate that some writers simply collapse the two phenomena. Kaplan (1960: 167), for example, describes 'breathiness' as a voice quality 'said to have an aspirate quality, and the effect is as though a 'whisper' were added to the normal tone'. In this and in other cases, although <whisper> is a term in the writers' descriptive repertoire for voice quality, it is not used for the description of a compound type of phonation with (modal) voice or any other type, and <breathy voice> is the label used for any quality where there is a fricative escape of air during phonation. Zemlin writes, for example, that

The most common correlate of Breathiness is a persistent glottal chink in the posterior portion of the vocal folds. Critical examination of a large number of larynxes reveals that a good many persons with apparently healthy, normal sounding voices display a glottal chink in the area of the arytenoid cartilages. We can suppose that there is a point at which the magnitude of the glottal chink will result in a breathy voice quality. The exact relationship between magnitude of glottal chink and voice quality is not well understood. (Zemlin 1964: 165)

In the situation that Zemlin describes, it would seem likely that as the 'glottal chink' grew in size, whispery voice would set in first, and that it would have to enlarge to a much greater proportion of the total possible glottal area before breathy voice either as described here or by Catford was heard.

With the considerable amount of air that is wasted in breathy voice, there is an inverse relationship between intensity of the voice and breathiness (Pronovost 1942). Some of the acoustic energy would also be lost by the damping effect of the general relaxation of the muscles of the whole vocal system in lax voice, of which breathy voice is almost always a component. This damping effect is discussed in the next chapter on tension settings of the vocal system. Breathy voice, however, contributes its own damping effect to the general energy loss. Fant (1972: 50) points out that the broadening of the bandwidth of the first formant in lax voice can be partly attributed to the high damping effect of 'weak, breathy voice' on the rest of the vocal system.

Breathy voice does not seem to be used phonologically as often as whispery voice. Paralinguistically, however, breathy voice is exploited in English for the communication of intimacy.


03.02. Compound phonation types

A number of combinatorial constraints on the co-occurrence of individual phonation types to form a compound type have already been mentioned. In this section, we shall look briefly at some of the factors underlying the compatibility, or the lack of it, between different individual phonation types, with respect to their potential co-occurrence. There are two general conditions under which compatibility between phonation types is possible.

The first condition is where the individual settings apply to different parts of the laryngeal structure, so that competition for the same vocal apparatus is avoided.

The second condition is where the same part of the laryngeal apparatus is concerned in the production of two different phonation types, but where the vibratory patterns of the two settings modify each other without either changing substantially enough to lose its auditory identifiability.

Examples of the first condition for compatibility, where different laryngeal locations are involved, would be whispery voice, whispery falsetto, and whispery creak. The whisper component is assumed here to be produced in a triangular gap between the arytenoid cartilages, in all three compound phonations, and creak made separately at the thyroid end of the glottis, with both modal voice and falsetto being limited to the ligamental section of the glottis. The triple compound types whispery creaky voice and whispery creaky falsetto would be further instances.
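The first compatibility condition can be sketched as a simple lookup. The following Python fragment is only an illustration, not part of Laver's text: the table `LOCATION` and the function names are hypothetical, and the location assigned to each component is the one assumed in the paragraph above (whisper at the arytenoid gap, creak at the thyroid end, modal voice and falsetto in the ligamental section). Harshness is left out because it falls under the second condition.

```python
# Hypothetical encoding of the laryngeal locations assumed in the text.
# The labels are plain strings chosen for illustration only.
LOCATION = {
    "whisper": "arytenoid gap (cartilaginous glottis)",
    "creak": "thyroid end of the glottis",
    "modal voice": "ligamental section of the glottis",
    "falsetto": "ligamental section of the glottis",
}


def location_conflicts(components):
    """Return the pairs of components that would need the same location."""
    conflicts = []
    for i, first in enumerate(components):
        for second in components[i + 1:]:
            if LOCATION[first] == LOCATION[second]:
                conflicts.append(tuple(sorted((first, second))))
    return sorted(conflicts)


def satisfies_first_condition(components):
    """True when no two components compete for the same location."""
    return not location_conflicts(components)


if __name__ == "__main__":
    for compound in (
        ["whisper", "modal voice"],            # whispery voice
        ["whisper", "creak", "falsetto"],      # whispery creaky falsetto
        ["modal voice", "falsetto"],           # excluded combination
    ):
        print(compound, satisfies_first_condition(compound))
```

Under these assumptions the double and triple compounds listed above pass the check, while modal voice with falsetto fails it because both need the ligamental section.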

Examples of the second condition for compatibility, where two (or more) vibratory patterns modify each other, would be all the instances of harshness in compound phonations. The modification that harshness imposes on all compounds in which it participates is a boost in the parametric values of adductive tension and medial compression to an extreme degree. In this way, the compound type harsh voice is characterized by greater adductive tension and medial compression than the moderate values normally found in modal voice alone. To achieve phonation, the sub-glottal pressure also has to be given a compensatory boost, in order to re-assert aerodynamic participation in an aerodynamic-myoelastic phonatory equation in which the myoelastic component has had the elastic resistance of the glottis to airflow substantially increased. ¶UKT

This does not mean that phonation types with low values on these parameters cannot combine with harshness. Whisper is a case in point, where low adductive tension is normally one of its characteristics: it does mean, though, that the whisper component in harsh whispery voice, say, is maintained by a much greater effort on the part of the lateral cricoarytenoid muscles to keep the arytenoid triangle open against the vigorous attempt by the interarytenoid muscles to close it. The auditory nature of the whisper component is likely to be rather different in such a compound phonation, compared with its occurrence as a simple type. ¶UKT

There are two general conditions, as suggested in Chapter 01, under which compatibility is not possible between individual phonation types, preventing the occurrence of particular compound phonations.

The first of these conditions [UKT: physiological condition] is where the pre-requisite actions of the larynx for each of the two types of phonation involved are mutually exclusive.

The second condition [UKT: auditory condition] is where perceptual factors make it impossible to hear the differences introduced by the addition of one phonation type to the other.

There are a number of examples of the first, physiologically preemptive condition. Modal voice and falsetto are one instance. Their membership of the first grouping of phonation types is based partly on this impossibility of co-occurrence. These two types of phonation need quite different types of vibration of the vocal folds, as described earlier in this chapter, and they therefore cannot combine. ¶UKT

Similarly, we have seen that harshness and breathiness are mutually incompatible, because of their parametric prerequisites. Where harshness has extremely high adductive tension and medial compression, breathiness must have very low values of these parameters. ¶UKT

Breathiness is a very unusual aspect of phonation in this respect, and is incompatible with almost every other phonation type. With very low adductive tension, and most importantly, very low medial compression, breathiness is compatible with only one other phonation type -- modal voice, giving breathy voice, as discussed above. Modal voice has a moderate degree of medial compression, while every other type, falsetto, whisper, creak and harshness, has a high or very high degree. Also, modal voice has only moderate adductive tension, where falsetto, creak and harshness have very high degrees. It is true that whisper is similar to breathiness in having low adductive tension, but whisperiness and breathiness have been defined here as complementary actions of the same scale, so that their potential combination is excluded by definition.
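The parametric argument in this paragraph can be laid out as a small table. The Python sketch below is an illustration only: the names `LEVELS`, `SETTINGS` and `compatible_with_breathiness` are hypothetical, the five-step scale is an assumption made for the sake of the example, and the single value "high" for medial compression stands in for the text's "high or very high", since the text does not say which applies to which type.

```python
# Assumed ordinal scale; the text uses these labels but gives no numbers.
LEVELS = ["very low", "low", "moderate", "high", "very high"]

# (adductive tension, medial compression) as stated in the paragraph above.
# "high" for medial compression is a placeholder for "high or very high".
SETTINGS = {
    "modal voice": ("moderate", "moderate"),
    "falsetto": ("very high", "high"),
    "whisper": ("low", "high"),
    "creak": ("very high", "high"),
    "harshness": ("very high", "high"),
}

# Whisperiness and breathiness are defined as parts of one scale,
# so their combination is ruled out by definition.
EXCLUDED_BY_DEFINITION = {"whisper"}


def rank(level):
    """Position of a level label on the assumed ordinal scale."""
    return LEVELS.index(level)


def compatible_with_breathiness(name):
    """True when neither parameter exceeds 'moderate' and no definition bars it."""
    if name in EXCLUDED_BY_DEFINITION:
        return False
    adductive_tension, medial_compression = SETTINGS[name]
    limit = rank("moderate")
    return rank(adductive_tension) <= limit and rank(medial_compression) <= limit


if __name__ == "__main__":
    print([name for name in SETTINGS if compatible_with_breathiness(name)])
```

With the values as transcribed, only modal voice passes, which matches the claim that breathy voice is the sole compound in which breathiness takes part.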

Another example, which is covered by both incompatibility conditions, physiological and auditory, is the combination of harshness and whisper, preventing the occurrence of harsh whisper as a descriptive category. The physiological incompatibility is on a different basis than that of modal voice and falsetto, or of harshness and breathiness. It is not a matter of directly opposite physiological requirements, but redundancy. The effect of adding the actions which produce harshness to those producing whisper is merely to boost the tensions pressing the vocal folds and the arytenoid cartilages together. Regardless of whether the type of whisper is one where a gap is left between the arytenoids, or one where the ligamental glottis is also kept slightly open, the effect of adding harshness will be to narrow the glottal aperture. This will only result in the audible whisper rising in amplitude until the gap is completely closed. While the whisper lasts, it will be heard as being louder, without a major change in quality.

The second, auditory, incompatibility condition applies not only to harshness and whisper, but also to modal voice and falsetto, and possibly to harshness and creak.

Harshness and whisper are both characterized acoustically by a factor of aperiodicity. To add the aperiodicity of harshness to that of whisper would, in that particular respect, be auditorily redundant. This is not to say that harshness and whisper do not combine; harsh whispery voice is a common compound phonation. But their interaction is primarily with the voice component, as it were, rather than with each other in the particular respect of aperiodicity.

The auditory incompatibility of modal voice and falsetto is straightforward: using the same vocal apparatus, they constitute different qualities. Harshness and creak present a less clear-cut case of auditory incompatibility. It will be recalled that creak is sometimes characterized by a certain amount of moment-to-moment variability of its (normally very low) fundamental frequency. To modulate this variability by superimposing the essential variability (aperiodicity) of harshness would not produce a large change in auditory quality. However, given that variability of fundamental frequency in creak is not a necessary ingredient, but only an occasional characteristic, then to superimpose a continuous aperiodic factor on creak would produce a compound phonation that should logically be called harsh creak.

It is worth emphasizing once again, at this point, that these suggestions about compatibility are based on hypothesized definitions. The empirical incompatibilities that are suggested rest on the validity of the physiological and acoustic hypotheses. While the suggested principles of compatibility and incompatibility are able to give structure to most of the combinations of phonation types that can and cannot occur, they can only be applied where the physiological mechanisms and the auditory effects are reasonably well understood. There are some compound types of phonation where the necessary degree of analytic clarity does not yet obtain. Comments about these can therefore only be rather speculative.

For example, the physiological mechanism for the production of creak remains unclear. It has been treated in this description as if it were the product of an independent vibratory system at the forward, thyroid end of the glottis, following suggestions by a number of writers. The auditory impression of the creak component in compound phonations is sufficiently different in the various compound types to suggest that the auditorily identified phenomena we are willing to call 'creak' may possibly be produced by different mechanisms in different compounds. The creak component in high-pitched creaky falsetto sounds different from that in whispery creak, for example.

The creak component in creaky voice, which is a common compound phonation, may well differ from speaker to speaker. One suggestion for the creak mechanism in creaky voice is that it is made, not at the thyroid end of the glottis, but at the arytenoid end: Abercrombie (1967: 101) describes the phonation in creaky voice as one 'in which the cartilage glottis is vibrating very slowly, while the rest of the glottis is in normal vibration'.

UKT: The thyroid and arytenoid ends of the glottis:
The glottal opening in figures such as the one on the right shows a slit that can be opened. It runs from the front of the neck (Adam's apple) to the back. The statement "not at the thyroid end of the glottis, but at the arytenoid end" shows that the glottis "slit" runs from the thyroid end to the arytenoid end.

A somewhat similar comment is made by Ladefoged in his monograph on West African languages, where he discusses 'laryngealized voice' (which he equates with 'creaky voice'). He writes that 'In this state of the glottis, there is a great deal of tension in the intrinsic laryngeal musculature, and the vocal cords no longer vibrate as a whole. The ligamental and arytenoid parts of the vocal cords vibrate separately' (Ladefoged 1964: 16). Presumably the creak component in this description would be attributable to the arytenoid location, rather than to the ligamental section.

In the case of the triple compound phonation type whispery creaky voice referred to above, it seems more likely that the creak component would be produced at the extreme thyroid end of the glottis, leaving the cartilaginous glottis free for the production of the whisper and the rest of the ligamental glottis for the voice component -- but this remains a speculation for the present.

There are also physiologically possible phonation types quite outside the descriptive system presented here, omitted because they seem never to be used in normal speech, whose possibilities of occurrence in compound phonations are not yet well analysed. To give one example, it is possible to produce phonation which sounds like ventricular falsetto (sometimes referred to as 'seal voice'), by very severe compressive effort of the whole larynx, and extreme pulmonic effort. There are also a number of auditorily different kinds of whisper, touched on by Catford (1964: 32-3). A further, very high-pitched phonation type is mentioned by Hollien (1974: 127) 'usually referred to as the "flute", "whistle" or "pipe" register ... exhibited by a few women and children'.

Two compound phonation types in particular seem to be exploited for linguistic purposes. They are creaky voice, discussed earlier, and whispery voice. Ladefoged's term for whispery voice is 'murmur', and he describes murmur as figuring in phonological opposition in many Indo-Aryan languages, such as Hindi, Sindhi, Marathi, Bengali, Assamese, Gujarati, Bihari, and Marwari: he also comments that murmured consonants are common in Southern Bantu languages, such as Shona, Tsonga, Ndebele and Zulu (Ladefoged 1971: 12-14).

Paralinguistically, whispery voice is used in English, and in very many other cultures, as an indicator of confidentiality. It is quite distinct in this function from the more intimate use of breathy voice.

End http://www.ling.mq.edu.au/ling/units/sph302/papers/laver_1980_phonation.pdf   download 071024


UKT note

stridor

stridor n. 1. A harsh, shrill, grating, or creaking sound. 2. Pathology A harsh, high-pitched sound in inhalation or exhalation. [Latin strīdor from strīdēre to make harsh sounds ultimately of imitative origin] -- AHTD


End of TIL file