Spectrogram reading and other acoustics
Spectrograms; review of core aspects from your readings
>> Vowels have strong formants. (Sonorant consonants like nasals and liquids have weak
formants; obstruents do not have formants in the regular sense, though they may have areas of
strong energy.)
>> We number the formants from the bottom: F1, F2, … . (Note: f0, the fundamental frequency,
is not a formant. Its multiples are the overtones or harmonics, from which the formants are
constructed as the strong areas.)
>> The first formant, F1, inversely correlates with tongue-height: the higher the tongue during
articulation, the lower the first formant.
>> The second formant, F2, correlates with frontness of tongue-position: The further front the
vowel, the higher the second formant. [A similar measure proposed by some people is that the
distance between F1 and F2 correlates with frontness.]
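
The relations above can be sketched in a few lines of Python. This is only an illustration, not a measurement procedure: the function names are made up for this handout, and the formant values for [i], [a], and [u] are rough round numbers assumed for the example, not values taken from the readings.

```python
# Illustrative (F1, F2) pairs in Hz. These are assumed round numbers,
# not measurements from the readings.
VOWEL_I = (300, 2300)
VOWEL_A = (700, 1100)
VOWEL_U = (300, 900)


def harmonics(f0_hz, count):
    """Return the first `count` harmonics of f0 (integer multiples, f0 itself included)."""
    return [f0_hz * k for k in range(1, count + 1)]


def describe_relative(vowel, reference):
    """Compare two (F1, F2) pairs and describe `vowel` relative to `reference`.

    Lower F1 means a higher tongue position; higher F2 means a fronter vowel.
    """
    f1, f2 = vowel
    ref_f1, ref_f2 = reference

    if f1 < ref_f1:
        height = "higher"
    elif f1 > ref_f1:
        height = "lower"
    else:
        height = "same height"

    if f2 > ref_f2:
        frontness = "further front"
    elif f2 < ref_f2:
        frontness = "further back"
    else:
        frontness = "same frontness"

    return height, frontness


def f2_f1_distance(vowel):
    """The alternative frontness measure: distance between F1 and F2 in Hz."""
    f1, f2 = vowel
    return f2 - f1


if __name__ == "__main__":
    print(harmonics(120, 3))                      # [120, 240, 360]
    print(describe_relative(VOWEL_I, VOWEL_A))    # ('higher', 'further front')
    print(f2_f1_distance(VOWEL_I), f2_f1_distance(VOWEL_U))
```

The comparison is deliberately relative: formant values differ from speaker to speaker, so it is safer to say that one vowel is higher or fronter than another than to read a tongue position off a single number.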
Formant transitions and place of articulation
>> In the transition between a vowel and a labial consonant,
F1, F2, and F3 of the vowel go down.
>> In the transition between a vowel and a velar consonant,
F2 rises and F3 falls; in other words, F2 and F3 move towards each other.
>> In the transition between a vowel and an alveolar consonant, F2 moves towards a value
in the neighborhood of 1700-1800 Hz (it usually does not get all the way there).
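
As a rough sketch, the three transition patterns above can be written as a small rule set. Everything here is an assumption made for illustration: the function name, the representation of a transition as a (value in the vowel, value at the consonant edge) pair, the use of 1750 Hz as the midpoint of the 1700-1800 Hz range, and the order in which the rules are checked.

```python
# Midpoint of the 1700-1800 Hz range mentioned for alveolars (an assumption;
# the text only gives the range).
ALVEOLAR_TARGET_HZ = 1750.0


def guess_place(f1, f2, f3):
    """Guess place of articulation from formant transitions.

    Each argument is a pair (value_in_vowel_hz, value_at_consonant_edge_hz).
    Rules are checked in a fixed order (labial, velar, alveolar); that order
    is a choice made for this sketch, not something the rules themselves say.
    """
    d1 = f1[1] - f1[0]
    d2 = f2[1] - f2[0]
    d3 = f3[1] - f3[0]

    # Labial: F1, F2, and F3 all go down.
    if d1 < 0 and d2 < 0 and d3 < 0:
        return "labial"

    # Velar: F2 rises and F3 falls, so the two move towards each other.
    if d2 > 0 and d3 < 0:
        return "velar"

    # Alveolar: F2 ends closer to the 1700-1800 Hz region than it started.
    if abs(f2[1] - ALVEOLAR_TARGET_HZ) < abs(f2[0] - ALVEOLAR_TARGET_HZ):
        return "alveolar"

    return "unclear"


if __name__ == "__main__":
    # Made-up transition values in Hz, for illustration only.
    print(guess_place((500, 400), (1500, 1300), (2500, 2300)))  # labial
    print(guess_place((500, 450), (1800, 2100), (2600, 2400)))  # velar
    print(guess_place((500, 450), (1200, 1500), (2500, 2550)))  # alveolar
```

Note that the patterns can overlap: an F2 that rises towards 1700-1800 Hz while F3 falls fits both the velar and the alveolar description, and this sketch simply reports the first rule that matches. On a real spectrogram such cases call for looking at further cues rather than trusting a fixed rule order.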
Difference between voiced oral stop, voiceless oral stop, and nasal stop
>> Place of articulation is indicated in the same way for all three stop classes, including the
nasals: by the formant transitions of adjacent vowels (see preceding section).
>> Nasal stops have weak formants around 250, 2500, and 3250 Hz. Oral stops show no
formants and no higher-frequency energy during closure, except:
>> If an oral stop is voiced during closure, one can normally see the voicing in a wide-band
spectrogram in the form of vertical striations at the bottom of the spectrogram (‘voice bar’). (If
an oral stop is voiceless during closure, there is normally no energy of any frequency during the
closure.)
>> Aspiration shows up as unstructured noise at higher frequencies. It is weaker for [p] than
for [t, k].
>> Voice onset time, VOT, is an important cue for voicing, but it varies by position:
in word-initial position and before stressed syllables, VOT tends to increase
(i.e., there is more aspiration for voiceless sounds, and voiced sounds also have a larger
(or less negative) VOT in these positions).
>> [l] has weak formants around 250, 1200, and 2400 Hz. Notice that the formant around 1200
Hz can be used to distinguish [l] from nasals.
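
The cues in this section can be summarized in a short sketch. The function names, the 150 Hz tolerance, and the idea of passing in a list of weak formant frequencies read off the spectrogram are all assumptions made for illustration. VOT is computed here as the time of voicing onset minus the time of the stop release, so it comes out negative when voicing starts before the release.

```python
# Approximate weak-formant locations given in the text above.
NASAL_FORMANTS_HZ = (250, 2500, 3250)
LATERAL_FORMANTS_HZ = (250, 1200, 2400)


def vot_ms(release_ms, voicing_onset_ms):
    """Voice onset time: voicing onset minus stop release, in milliseconds.

    Positive values mean voicing starts after the release (as with aspiration);
    negative values mean voicing starts before the release.
    """
    return voicing_onset_ms - release_ms


def classify_segment(weak_formants_hz, has_voice_bar, tolerance_hz=150):
    """Classify a stretch of signal as an oral stop, a nasal stop, or [l].

    `weak_formants_hz` is a list of weak formant frequencies seen during the
    segment (empty if there are none). `tolerance_hz` is an arbitrary margin
    chosen for this sketch.
    """
    # No formants at all: an oral stop; the voice bar decides voicing.
    if not weak_formants_hz:
        return "voiced oral stop" if has_voice_bar else "voiceless oral stop"

    def near(target_hz):
        return any(abs(f - target_hz) <= tolerance_hz for f in weak_formants_hz)

    # The formant around 1200 Hz is what separates [l] from the nasals.
    if near(LATERAL_FORMANTS_HZ[1]):
        return "[l]"

    if all(near(target) for target in NASAL_FORMANTS_HZ):
        return "nasal stop"

    return "unclear"


if __name__ == "__main__":
    print(vot_ms(100.0, 160.0))                          # 60.0
    print(vot_ms(100.0, 80.0))                           # -20.0
    print(classify_segment([], has_voice_bar=True))      # voiced oral stop
    print(classify_segment([260, 2450, 3300], True))     # nasal stop
    print(classify_segment([240, 1150, 2380], True))     # [l]
```

The check for a formant near 1200 Hz comes first because the other two [l] formants lie close to nasal formants and would not separate the two classes on their own.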