IPA Letters & Sounds they Describe

The Vocal Tract

The "Vocal Tract" is the area of the human anatomy responsible for making the sounds of speech. It begins with the lips at the opening of the mouth and continues downwards to the vocal cords. The lungs provide the power for the process and are therefore necessary to speech, though no sound is formed there. Starting at the bottom with the larynx, which houses the vocal cords, the parts of the mouth and throat that cause sounds to be formed are the glottis at the back and bottom of the throat, then the epiglottis just above it, the pharynx, the velum, the corona (or roof of the mouth), the alveolar ridge, the teeth and the lips. The IPA refers to sounds according to where they are produced, using these parts of the Vocal Tract as mileposts.

Vocal Tract

The Letters of the IPA and the Sounds they Describe

The 107 letters of the IPA fall into two categories: consonants and vowels, following the conventional definition of those terms. Consonants are further subdivided into pulmonic and non-pulmonic. Pulmonic consonants are generated by expelling air from the lungs past the vocal cords and out the mouth or nose. Each letter has not only a graphic representation but also a numerical code, to avoid confusion and to aid in the publication of manuscripts using Unicode character sets.

Pulmonic Consonants

Most of the consonants used by humans are pulmonic, and all English consonants are. They are classified according to how and where they are articulated. A consonant is voiced when the vocal cords come into play, and unvoiced when it involves just a specific movement or formation in the lips, mouth and throat, and the passage of air. Sonorants (such as nasals and approximants) are normally voiced, while obstruents (plosives, fricatives and affricates) come in both voiced and unvoiced forms.

The IPA Pulmonic Consonant chart displays all of the possibilities. There are four main groups (even though the IPA chart omits them at the top). Each of the four groups addresses a portion of the Vocal Tract, starting at the front and going towards the back. Each main group has two or three sub-areas of precise sound location. They are:

Labials (in the very front), comprising "bilabial," "labiodental" and "dental."
Coronals (in the roof of the mouth, around where you burn yourself if you eat a piece of pizza that is too hot), comprising "alveolar," "post-alveolar" and "retroflex."
Dorsals (to the rear of the roof of the mouth), comprising "palatal," "velar" and "uvular."
Radicals (down to the "root" of the tongue and in the very back of the throat), comprising "pharyngeal" and "glottal." (The glottal category used to be subdivided into epiglottal and glottal, but now they are combined.)
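
The grouping just listed can be written down as a small lookup structure. The following is only a sketch: the dictionary layout and the function names are illustrative assumptions, and the labial subgroups follow this article's own usage (bilabial, labiodental, dental) rather than any official IPA data format.

```python
# Place-of-articulation groups as this article lists them, front to back.
# PLACE_GROUPS, group_of and front_to_back are hypothetical names for illustration.
PLACE_GROUPS = {
    "labial": ["bilabial", "labiodental", "dental"],
    "coronal": ["alveolar", "post-alveolar", "retroflex"],
    "dorsal": ["palatal", "velar", "uvular"],
    "radical": ["pharyngeal", "glottal"],
}


def group_of(place: str) -> str:
    """Return the main group that contains the given place of articulation."""
    for group, places in PLACE_GROUPS.items():
        if place in places:
            return group
    raise KeyError(f"unknown place of articulation: {place}")


def front_to_back() -> list[str]:
    """Flatten the groups into a single front-to-back list of places."""
    return [place for places in PLACE_GROUPS.values() for place in places]


print(group_of("velar"))
print(front_to_back())
```

Because Python dictionaries keep insertion order, the flattened list runs from the lips back to the glottis, which is the order in which the columns of the consonant chart are read.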

In each of these four positions - in theory at least - it is possible to have different manners of making sound. There are eight different methods, which are:

Plosives, formed by a sudden escape of air (like "p" in "pow!");
Nasals, formed by passing the sound up through the corona into the nasal cavity (like "ng" in "sing");
Trills, created by vibrating the tip of the tongue or the lips as air is pressed outward (like "Brrr" when punctuated with the lips);
Tap or Flap (regarded as synonymous by most linguists), which is a single, rapid touch of the tongue to another portion of the mouth (like the "t" in the American pronunciation of "Saturday");
Fricative, formed by forcing air through a narrow gap between two portions of the Vocal Tract (like "sh" in "Josh");
Lateral Fricative, which is a combination of a liquid with the forcing of air outward to cause a fricative sound at the sides of the tongue (not heard in English);
Approximant, in which two parts of the Vocal Tract close in on each other without actually touching, so that the sound is voiced without a physical closure (like the American Midwestern "r"); and
Lateral Approximant, which is a special case of a liquid consonant being pronounced without ultimately having the tongue touch the roof of the mouth (example: "milk" when pronounced almost like "miok").

The sounds in each of the four main groups (from front to back) can be summarized as follows:

Labials are made in the front of the mouth, with the teeth and lips. They include bilabials (like "p"), labiodentals (like "f") and dentals (like an emphatic "d" as in "duh!"). Labials are produced as nasals (like a humming, as in the case of "m"), as plosives (as in the example of "p"), as fricatives ("f" or "th"), as approximants (more open than fricatives, and nearly vowels, like "w"), as trills (involving a repetitive lip movement, as in "Brrrr"), and as flaps or taps (a single tongue touch on the teeth, as the "t" in the word "Saturday" when enunciated in the very front of the mouth).

Coronal consonants are made either at the alveolar ridge (on the gums, just behind the teeth), a bit higher towards the top of the mouth (post-alveolar), or in the dome of the roof of the mouth (called retroflex). The normal "n" is a nasal alveolar consonant. (The "ng" sound is made farther back still and belongs to the dorsal group.) The plosives are "d" and "t" (voiced and unvoiced), and are normally alveolar consonants, though they may be pronounced farther back in the mouth in combination with other consonants (for example, the "t" in "snort" occurs closer to the top of the mouth than the alveolar ridge). The unvoiced alveolar fricative is "s"; voiced, it is "z." Post-alveolar fricatives are the "sh" sounds, voiced (as in "fusion") or unvoiced (as in "fission"). Retroflex fricatives include the "ch" sound as in the German word "Sicht" when pronounced in the softer manner of Germans in the South. The voiced version sounds like a hard initial "h" (as in "hue") or the hard "j" sound found in some Spanish dialects (as in "José"). Coronal approximants also come in different varieties, the distinctive American "r" as in "red" being a good example, produced in the retroflex part of the mouth, just at or a little behind the top of the mouth. The only real trill Americans produce is an "r" vibrated with the tip of the tongue against the post-alveolar portion of the mouth. It is not needed for speech, but is sometimes produced in its unvoiced form to mimic the purring of a cat. The alveolar tap in English would be the "t" as in "Saturday" if the word were pronounced farther back in the mouth. Post-alveolar and retroflex taps are rare, and not found at all in English.

Unlike the labial group, the coronal group must account for another category of sounds, called laterals. They come in two flavors: fricative and approximant. Laterals are liquid consonants. In English, "l" ("ell") is the only pure example. It is pronounced either clearly, with the tongue pressing laterally outward and air passing over the top (making it a fricative), as in "bleak," or somewhat more darkly as an alveolar approximant, as in "control." Languages like Tibetan and Navajo provide examples of other lateral sounds, not required or encountered in English.

Dorsal consonants are palatal, velar or uvular, depending on where they are sounded. The palate is roughly the upper back of the mouth, the velar portion is below that, and the uvular portion just below that. If one says "ick," the "k" comes off the palate. If one says "gawsh" with a dropped jaw, the "g" comes off the velar portion of the back of the mouth. If one says "haw" with a yet deeper intonation, the sound comes from the uvula. Nasals include the "ni" sound as in "onion," a palatal nasal; the "ng" in "sing," a velar nasal; and the "ng" in the word "twang" as pronounced a la a country music star, a uvular nasal. The palatal plosive is not found in English, but the velar plosive is the hard "g" as in "gosh." The guttural "ch" of the Northern German accents, as in "Pflicht," is a uvular fricative. The voiced uvular fricative is most commonly thought of as the Gallic "r," also commonly used for initial "r" or internal "rr" in Portuguese. The palatal approximant is the semi-closed "y" as in "yawl" or the German "j" in "Jahr." The velar approximant is a somewhat breathy version of the velar "g" without quite closing the tongue to the back of the mouth. There is one uvular trill, which is the Gallic "r" rumbled in the back of the throat for a prolonged moment. Rounding out the dorsal consonants is the velar lateral approximant, which is the sound of "l" as in "milk." It is more like a tap if the tongue touches the velar portion of the mouth, but is more likely to be an approximant: usually the tongue will just head in that direction without arriving until after the sound is over.

Radical consonants are the fourth main group of pulmonic consonants. The two subgroups are pharyngeal and glottal, depending upon where in the back of the mouth (or throat) the sound is produced. Many of the methods of making sounds just do not apply to this group because they require other parts of the Vocal Tract. In the pharyngeal subdivision, there are only fricatives. They are not uttered in English, and are only approximated by various combinations like kh, gh or ck. The glottal category has one plosive: the "uh-oh" sound. The voiced glottal fricative is the deep-throated glottal sound (a more intense version of the sound made by the Gallic "r"). Unvoiced, it is a somewhat exaggerated "jota" as pronounced in Spain, or the Prussian "ch" as in "Sprache."

In the summary table below, blanks indicate that the articulation is impossible, or that no known instance of it exists. If two symbols are in the same cell, the first is the unvoiced version and the second is the voiced version. Also, though the position of the tongue is used in the columns of the table, sometimes its shape, more than its placement, will cause the sound to be made as it should. This applies to the fricatives in the coronal group.

Pulmonic Consonants
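
The cell convention described for the summary table (a blank for "none," and an unvoiced symbol followed by a voiced one) can be sketched as a lookup table. The fragment below is an illustration only: it holds a handful of well-known cells from the standard IPA chart rather than the full table, and the names `CHART` and `describe` are assumptions made for this example.

```python
# A fragment of the pulmonic consonant chart.
# Keys are (manner, place); values are (unvoiced, voiced), with None for a blank.
# Only a few familiar cells are included; this is not the complete chart.
CHART = {
    ("plosive", "bilabial"): ("p", "b"),
    ("plosive", "alveolar"): ("t", "d"),
    ("plosive", "velar"): ("k", "\u0261"),          # voiced velar plosive, IPA "g"
    ("plosive", "glottal"): ("\u0294", None),       # glottal stop, the "uh-oh" sound
    ("nasal", "bilabial"): (None, "m"),
    ("nasal", "alveolar"): (None, "n"),
    ("nasal", "velar"): (None, "\u014B"),           # "ng" in "sing"
    ("fricative", "labiodental"): ("f", "v"),
    ("fricative", "alveolar"): ("s", "z"),
    ("fricative", "post-alveolar"): ("\u0283", "\u0292"),  # "fission" / "fusion"
}


def describe(symbol: str):
    """Return a description such as 'voiced velar nasal', or None if not listed."""
    for (manner, place), (unvoiced, voiced) in CHART.items():
        if symbol == unvoiced:
            return f"unvoiced {place} {manner}"
        if symbol == voiced:
            return f"voiced {place} {manner}"
    return None


print(describe("s"))
print(describe("\u014B"))
```

Reading a symbol's description back out of the table mirrors how the chart is used: the row gives the manner, the column gives the place, and the position within the cell gives the voicing.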


Pulmonic consonants may involve the pronunciation of sounds from two different places at the same time. This is called "coarticulation." The English "w," for example, is made by rounding the lips and simultaneously contracting the muscles in the back of the throat and tongue to move the tongue upwards, then pushing voiced air through the occluded passage. For this reason a few more symbols are required to handle instances of coarticulation. They are:


Double Articulation of Affricates

An affricate is a consonant that starts off as a stop, like a "t" sound, but then becomes a fricative, as in the "ch" of "church." Also common in English is the "d" to "ge" sound, as in "judge," in which the initial consonant is the same sound as the final two consonants together. The sound "pf" in German is an example of a voiceless labial affricate, as in the word "Pferd." By far the most common affricates are alveolar (voiced and unvoiced), as in the "ch" sound of "Tchaikovsky" or the "j" sound of "judge." Moving farther back are the post-alveolars and the alveolar-palatals. Often these combinations are shown as double characters joined by a tie at the bottom or over the top, depending on font availability. Some Unicode fonts show them as ligatures. An alternative usage is to make the second sound a superscript. In the table below, the two forms are shown, with voiced on the left and unvoiced on the right, starting with alveolar and moving down to post-alveolar and alveolar-palatal. The pronunciation of these affricates is provided in the IPA phonetic pronunciation table for consonants, above.

Affricate Ligatures
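
The tie-bar and superscript notations for affricates can be built from ordinary Unicode characters. The sketch below assumes the two combining tie characters U+0361 (tie above) and U+035C (tie below), and uses the modifier letter U+02E2 for a superscript "s"; the helper names `tied` and `with_superscript` are hypothetical and exist only for this illustration.

```python
import unicodedata

TIE_ABOVE = "\u0361"  # COMBINING DOUBLE INVERTED BREVE
TIE_BELOW = "\u035C"  # COMBINING DOUBLE BREVE BELOW
SUPERSCRIPT_S = "\u02E2"  # MODIFIER LETTER SMALL S


def tied(stop: str, fricative: str, below: bool = False) -> str:
    """Join a stop and a fricative with a tie bar placed between the two letters."""
    tie = TIE_BELOW if below else TIE_ABOVE
    return stop + tie + fricative


def with_superscript(stop: str) -> str:
    """Alternative notation: write an 's' release as a superscript after the stop."""
    return stop + SUPERSCRIPT_S


# "ch" as in "church", "j" as in "judge", and "ts" with the tie underneath.
for affricate in (tied("t", "\u0283"), tied("d", "\u0292"), tied("t", "s", below=True)):
    codes = " ".join(f"U+{ord(ch):04X}" for ch in affricate)
    print(affricate, codes)

print(with_superscript("t"), unicodedata.name(SUPERSCRIPT_S))
```

The tie is a combining character placed between the two letters, so each tied affricate is three code points long even though it displays as one unit. Whether the bar is drawn neatly depends on the font, which is the "font availability" caveat mentioned above.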

Non-Pulmonic Consonants

Consonants produced without any reliance upon air from the lungs are called "non-pulmonic." The most common examples are the clicks of the African "click" languages (Khoisan). The Fulani of West Africa employ a post-alveolar click from time to time by pressing the tip of the tongue against the post-alveolar corona and releasing it with an unvoiced sound.

Other examples are "implosives," which are found in several African languages (including Swahili), in some Asian tongues, and among a few indigenous peoples of the Americas; the Mayan languages, for instance, have implosive consonants. The main feature of implosive consonants is that the flow of breath through the Vocal Tract is accomplished by pulling the glottis downward, causing air to come inward and down the Vocal Tract. The closest approximation to a voiced implosive would be the sound made when one is surprised: a rapid breath intake with something like an "h" sound from the velar portion of the mouth.

Non-Pulmonic Consonants
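
The two kinds of non-pulmonic consonant discussed here can be told apart by a simple lookup. This is a sketch under stated assumptions: the symbols and place labels are taken from the standard IPA chart rather than from this article, and the names `CLICKS`, `IMPLOSIVES` and `airstream` are hypothetical. The symbols are written as escapes because several clicks look like ASCII punctuation but are different characters.

```python
# Click and implosive letters from the standard IPA chart (not from this article).
CLICKS = {
    "\u0298": "bilabial",
    "\u01C0": "dental",
    "\u01C3": "(post-)alveolar",   # looks like "!" but is a separate character
    "\u01C2": "palatoalveolar",
    "\u01C1": "alveolar lateral",
}
IMPLOSIVES = {
    "\u0253": "bilabial",
    "\u0257": "dental/alveolar",
    "\u0284": "palatal",
    "\u0260": "velar",
    "\u029B": "uvular",
}


def airstream(symbol: str):
    """Return (kind, place) for a click or implosive, or None for anything else."""
    if symbol in CLICKS:
        return ("click", CLICKS[symbol])
    if symbol in IMPLOSIVES:
        return ("implosive", IMPLOSIVES[symbol])
    return None


for symbol in ("\u01C3", "\u0253", "!"):
    print(f"U+{ord(symbol):04X}", airstream(symbol))
```

The last lookup shows why the escapes matter: the ASCII exclamation mark is not the click letter, so it is not recognized.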


Vowels

Vowels are defined as vocalized sounds occurring in the center of a syllable. They are classified according to where the tongue is inside the Vocal Tract when the syllable is uttered. The "height" of a vowel refers to the position of the tongue (from low to high). For example, "ah" as in "calm" is a low vowel, with the tongue resting on the floor of the mouth. The "ee" as in "see" is a high vowel, as the tongue goes up towards the corona (without creating friction). Vowels are also classified as front or back. The "e" in "fret," for example, causes the tongue to be forward, whereas the "uh" sound in "love" is far to the back.

Languages and dialects have so many vowel variations that linguists use a system of reference vowel sounds, developed around 100 years ago by Daniel Jones. The cardinal vowels are the mileposts for describing vowel sounds. The highest and most forward vowel is the "ee" just referred to, and the lowest and most backward one is the [ɑ] or "ah" sound, with the jaw dropping and the lips relaxed. The back and high boundary is set by [u], which is made in the back of the mouth as one is about to whistle or blow on something. These three corners establish a shape in which all the other vowels will fall. Other "mileposts" are defined for the intermediate positions. For front and back there are "near front," "central" and "near back" as intermediate spots. For height, the positions are named Close and Open for the extremes and Mid for the middle. In between Close and Mid are two spots: Near Close and Close Mid. Likewise, after Mid come Open Mid and Near Open on the way to the other end of the scale. Five front-to-back positions times seven heights creates, in theory, 35 intersections in the grid; however, many intersections are not used.
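
The count of 35 follows from crossing the five front-to-back positions with the seven heights. A minimal sketch of that arithmetic is below; the list and variable names are illustrative assumptions, and only the three corner vowels named in the paragraph above are filled in.

```python
from itertools import product

# Five front-to-back positions and seven heights, as named in the text.
BACKNESS = ["front", "near front", "central", "near back", "back"]
HEIGHTS = ["close", "near close", "close mid", "mid", "open mid", "near open", "open"]

# Every (height, backness) intersection in the theoretical grid.
grid = list(product(HEIGHTS, BACKNESS))

# The three corner vowels the text uses to fix the shape of the vowel space.
corners = {
    ("close", "front"): "i",      # "ee" as in "see"
    ("close", "back"): "u",
    ("open", "back"): "\u0251",   # "ah" as in "calm"
}

print(len(grid))
for position, symbol in corners.items():
    print(position, symbol, position in grid)
```

Most of the 35 positions stay empty in practice, which is why the grid is better thought of as a coordinate system than as an inventory.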

The eight primary or cardinal vowels are really the anchors for this table, even though not all eight are found in every language. In fact, the Ngwe language of West Africa is cited as perhaps the only known language in which all eight cardinal vowels appear. They are the four back vowels in the Close, Close Mid, Open Mid and Open positions, along with their four front counterparts at the same heights.
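
The arrangement of the eight primary cardinal vowels (four heights, each with a front and a back member) can be laid out as data. The sketch below uses the conventional numbering and symbols of the cardinal vowel system, which this article does not itself list, so treat them as an assumption; the names `CARDINAL_VOWELS` and `pair_at` are hypothetical.

```python
# (number, symbol, height, backness) in the conventional cardinal vowel numbering.
CARDINAL_VOWELS = [
    (1, "i", "close", "front"),
    (2, "e", "close mid", "front"),
    (3, "\u025B", "open mid", "front"),
    (4, "a", "open", "front"),
    (5, "\u0251", "open", "back"),
    (6, "\u0254", "open mid", "back"),
    (7, "o", "close mid", "back"),
    (8, "u", "close", "back"),
]


def pair_at(height: str):
    """Return the (front, back) cardinal vowels at the given height."""
    front = next(s for _, s, h, b in CARDINAL_VOWELS if h == height and b == "front")
    back = next(s for _, s, h, b in CARDINAL_VOWELS if h == height and b == "back")
    return front, back


for height in ("close", "close mid", "open mid", "open"):
    print(height, pair_at(height))
```

The numbering runs down the front of the vowel space and back up the rear, so vowels 1 and 8 share a height, as do 2 and 7, 3 and 6, and 4 and 5.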

A "rounded vowel" involves pursing the lips as it is pronounced. It can be rounded inward or outward. An "oo" would be rounded inward, and an "ü" in German would be made with the lips rounded outward. A vowel that is not rounded is pronounced with the lips in an open, flat position.


Diacritical Marks

Diacritics are used within the IPA to indicate more specific descriptions of sounds or minor alterations. "Sub-diacritics" are marks placed at the bottom of a symbol or letter, unless the letter already has a "tail." (The "tail" is correctly known as a "descender," but is often just called a "hook.") In that instance the sub-diacritic is moved to hover over the letter rather than under it. When [i] has a diacritic above it, the dot goes away.
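
The rule about moving a sub-diacritic above a letter with a descender can be sketched with the IPA voiceless mark, which is a ring below (U+0325) and is written as a ring above (U+030A) on letters with a tail. The set of descender letters below is a small hypothetical sample, not an exhaustive list, and `devoiced` is a name invented for this example.

```python
RING_BELOW = "\u0325"  # COMBINING RING BELOW, the usual voiceless diacritic
RING_ABOVE = "\u030A"  # COMBINING RING ABOVE, used when a tail is in the way

# A few letters with descenders (assumed sample): eng, IPA g, j, p, q, y.
DESCENDERS = {"\u014B", "\u0261", "j", "p", "q", "y"}


def devoiced(letter: str) -> str:
    """Attach the voiceless ring below the letter, or above it if it has a tail."""
    mark = RING_ABOVE if letter in DESCENDERS else RING_BELOW
    return letter + mark


for letter in ("n", "m", "\u014B"):
    marked = devoiced(letter)
    print(marked, " ".join(f"U+{ord(ch):04X}" for ch in marked))
```

In both cases the diacritic is a separate combining character that follows the base letter; only the choice of character changes.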

Some diacritics are really superscripts, indicating phonetic details like a fricative release, as in "tch" (shown as [tˢ]), or the aspirated release of "ick" (shown as [kʰ]).

The IPA sometimes offers linguists an option in how to express certain sounds, whether as letters and letter combinations, or as letters with diacritics. For example, one might reserve the diacritic indicated for the breathy voice for sonorants only, applying the aspiration indicators exclusively with obstruents.

Many languages (but not English) generate sounds with the glottis, in the back of the throat. Diacritical marks help distinguish among different shades of sound.


[t]     voiceless
[d̤]     breathy voice (murmured)
[d̥]     relaxed voice
[d]     modal voice
[d̬]     stiff voice
[d̰]     creaky voice
[ʔ͡t]    glottal closure
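
Each marked symbol in this list is a plain base letter followed by a combining character placed under it. The sketch below rebuilds the marked forms from their parts and prints the Unicode name of each mark; the code points are the standard combining characters, while the list name `PHONATION_SCALE` and its layout are assumptions made for illustration.

```python
import unicodedata

# (label, base letter, combining mark or "" when the plain letter is used)
PHONATION_SCALE = [
    ("voiceless", "t", ""),
    ("breathy voice", "d", "\u0324"),   # two dots below
    ("relaxed voice", "d", "\u0325"),   # ring below
    ("modal voice", "d", ""),
    ("stiff voice", "d", "\u032C"),     # caron (wedge) below
    ("creaky voice", "d", "\u0330"),    # tilde below
]

for label, base, mark in PHONATION_SCALE:
    symbol = base + mark
    mark_name = unicodedata.name(mark) if mark else "(no diacritic)"
    print(f"[{symbol}]  {label:<14} {mark_name}")
```

Building the symbols this way also guards against the base letter being lost in copying, since a combining mark on its own has nothing to attach to.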


Suprasegmentals

Think of suprasegmentals as features that apply to more than just a single sound. They can apply to combinations of sounds, like a syllable, a whole word or even a phrase. Tone, length, stress and prosody are the matters handled by suprasegmentals. They speak to the pitch, intensity or rhythm of the speech. While most of them work at the syllable level, some function at multi-syllable levels, as in polysyllabic words and phrases.


Tones and Accents
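
Because stress belongs to a whole syllable rather than to one sound, it is written once, in front of the syllable it affects. The sketch below splits a transcription into syllables and labels each with its stress. It assumes the standard IPA stress marks (U+02C8 for primary, U+02CC for secondary) and a full stop as the syllable break; the function name and the sample transcription of "phonetician" are illustrative.

```python
PRIMARY = "\u02C8"    # raised vertical line, placed before a primary-stressed syllable
SECONDARY = "\u02CC"  # lowered vertical line, placed before a secondary-stressed syllable
BREAK = "."           # syllable break

STRESS_OF = {PRIMARY: "primary", SECONDARY: "secondary", BREAK: "unstressed"}


def syllables(transcription: str):
    """Split on stress marks and breaks; return (stress, syllable) pairs in order."""
    result = []
    stress, current = "unstressed", ""
    for ch in transcription:
        if ch in STRESS_OF:
            if current:
                result.append((stress, current))
            # The mark just read applies to the syllable that follows it.
            stress, current = STRESS_OF[ch], ""
        else:
            current += ch
    if current:
        result.append((stress, current))
    return result


word = SECONDARY + "fo\u028A.n\u0259" + PRIMARY + "t\u026A.\u0283\u0259n"
for stress, syllable in syllables(word):
    print(f"{syllable:<5} {stress}")
```

Tone and length marks could be handled the same way, by treating them as annotations on the syllable rather than as sounds of their own.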

Names of the Symbols

Because many of the symbols cannot be named simply according to the sound they represent, names have been developed. They are not "official," as the IPA handbook does not officially give out names for the letters and marks. As a general rule, the names of the Greek and Latin letters are used when they are unmodified. Over time, names developed for some of the other symbols, and the Unicode standard developed yet more. Sometimes more than one name will apply. For example, the unofficial IPA designation for ɛ is "epsilon," but Unicode calls it "Latin Small Letter Open E." A few symbols are described according to what they look like, as in the case of the backwards gelded question mark [ʕ]. If all else fails, use the Unicode name or number: this symbol is called "Latin Letter Pharyngeal Voiced Fricative," with Unicode number 0295.

Traditional diacritics are named traditionally, like "e-acute." In all other cases, the IPA nickname may be descriptive, as in the case of ʑ (a "Z with curl"). The marks have names separate from the letters they adorn. In Unicode, most IPA diacritics are separate combining characters that attach to the letter before them, and only some letter-and-mark combinations also have a unique code of their own. In case of doubt, pull up a Unicode chart on the Internet and browse for the Unicode names and numbers of the symbols you want.
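
As an alternative to browsing a chart, the Unicode names and numbers can be looked up programmatically. The sketch below uses Python's standard `unicodedata` module; the helper name `describe` is an assumption for this example, and the names printed are the formal Unicode ones, which may differ from the informal IPA nicknames.

```python
import unicodedata


def describe(symbol: str):
    """Return the Unicode number (as 'U+XXXX') and formal name of one character."""
    return f"U+{ord(symbol):04X}", unicodedata.name(symbol)


# The three symbols used as examples in the text above.
for symbol in ("\u025B", "\u0295", "\u0291"):
    number, name = describe(symbol)
    print(symbol, number, name)

# A traditional accented letter splits into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", "\u00E9")  # e-acute
print([unicodedata.name(ch) for ch in decomposed])

# Going the other way: find a character from its formal name.
print(unicodedata.lookup("LATIN SMALL LETTER Z WITH CURL"))
```

The decomposition of "e-acute" shows the point about marks having their own identity: the accent is a separate character with its own name and number.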