Mulak, K. E., Best, C. T., Tyler, M. D., Kitamura, C., & Irwin, J. R. (2013). Development of Phonological Constancy: 19‐Month‐Olds, but Not 15‐Month‐Olds, Identify Words in a Non‐Native Regional Accent. Child Development, 84(6), 2064-2078.

Mugitani, R., Pons, F., Fais, L., Dietrich, C., Werker, J. F., & Amano, S. (2009). Perception of vowel length by Japanese- and English-learning infants. Developmental Psychology, 45(1), 236.

Mulak et al. (2013) quite convincingly demonstrated that children with more language exposure are more likely to map nonnative sounds onto native sound categories, a strategy linguists sometimes call “the recoverability principle” (Weinberger, 1987), which can be explained within the frameworks of statistical learning and information theory. The basic idea is that the mispronunciation of a low-functional-load (low-entropy) sound is tolerable because interlocutors can recover the intended sound from the linguistic context. For example, my neighbor’s 5-year-old daughter always says “jama” instead of “pajama”. The dropping of “pa” is understandable because 1) “pa” is unstressed and is therefore perceptually less salient; and 2) the likelihood of “pa” occurring before “jama” is very high, so the interlocutor is very likely to recover it. One would predict that the mispronunciation of “pajama” as “jama” is more tolerable than the mispronunciation of “sheet” as “shut” or “shoot”, or, heaven forbid, “shit”. Mispronouncing the vowel in “sheet” is less tolerable because too many vowels can occur in the environment “sh_t”, so the context does not narrow down which one was intended. Therefore, I think children are more likely to receive corrective feedback from their parents for mispronouncing elements with a higher functional load than for elements with a lower functional load. If language acquisition is indeed statistical, then babies will be more likely to mispronounce sounds that have a lower functional load than sounds with a higher functional load.
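
The contrast between “_jama” and “sh_t” can be made concrete with a small calculation. What follows is only a rough sketch: the counts are invented for illustration and do not come from any corpus, and the entropy of the fillers in a single frame is a simplification of functional load, which is usually computed over contrasts across a whole lexicon. The names TOY_COUNTS and slot_entropy are hypothetical.

```python
import math

# Invented counts of (frame, filler) pairs. These numbers are purely
# illustrative assumptions, not corpus frequencies.
TOY_COUNTS = {
    ("sh_t", "ee"): 40,   # "sheet"
    ("sh_t", "u"): 35,    # "shut"
    ("sh_t", "oo"): 30,   # "shoot"
    ("sh_t", "o"): 45,    # "shot"
    ("_jama", "pa"): 50,  # "pajama"
}


def slot_entropy(frame, counts):
    """Shannon entropy, in bits, of the fillers attested in `frame`."""
    fillers = [c for (fr, _), c in counts.items() if fr == frame and c > 0]
    total = sum(fillers)
    # p * log2(1/p) is each filler's contribution to the entropy.
    return sum((c / total) * math.log2(total / c) for c in fillers)


h_jama = slot_entropy("_jama", TOY_COUNTS)
h_sht = slot_entropy("sh_t", TOY_COUNTS)
print(f"_jama: {h_jama:.2f} bits, sh_t: {h_sht:.2f} bits")
```

Under these made-up counts the slot in “_jama” carries no uncertainty at all, because only one filler is attested, whereas the slot in “sh_t” carries close to the two-bit maximum for four competing fillers. That is the sense in which the dropped “pa” is recoverable and the vowel in “sheet” is not.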

In their general discussion, Mulak et al. (2013) argue that children might treat consonants and vowels differently because consonants and vowels have different functions and characteristics. I would argue, based on the recoverability principle, that phonological constancy depends on the functional load of a given element rather than on the type of the element. For example, the “v” in “vaby” is very likely to be mapped to “b” because the likelihood of “b” occurring in the environment “_aby” is very high, while the likelihood of “v” occurring there is probably 0.
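
One way to spell out the “vaby” example is as a simple Bayesian recovery: the listener weighs how likely each onset is in the frame “_aby” against how likely that onset is to be heard as “v”. This is a sketch under my own assumptions, not a model proposed by Mulak et al. (2013). All numbers, including the competing onset “g”, are invented for illustration, and the names FRAME_COUNTS, P_HEARD_V_GIVEN, and posterior are hypothetical.

```python
# Invented counts of onsets attested in the frame "_aby" (assumption).
FRAME_COUNTS = {"b": 60, "g": 2, "v": 0}

# Invented probability of hearing "v" when each onset was intended (assumption).
P_HEARD_V_GIVEN = {"b": 0.10, "g": 0.01, "v": 0.90}


def posterior(frame_counts, likelihood):
    """P(intended onset | heard "v", frame) by Bayes' rule."""
    total = sum(frame_counts.values())
    # Prior from the frame counts times the perceptual likelihood.
    unnormalized = {
        seg: (count / total) * likelihood[seg]
        for seg, count in frame_counts.items()
    }
    norm = sum(unnormalized.values())
    return {seg: weight / norm for seg, weight in unnormalized.items()}


post = posterior(FRAME_COUNTS, P_HEARD_V_GIVEN)
best = max(post, key=post.get)
print(best, post)
```

Even though “v” matches the acoustic input best, its zero prior in this frame removes it from consideration, and “b” ends up as the recovered onset. Nothing in the calculation refers to whether the segment is a consonant or a vowel; only the contextual probabilities matter.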

Compared to Mulak et al. (2013), Mugitani et al.’s (2009) findings are perhaps less conclusive. First, the intention of the artificial language paradigm is to control for lexical influence. Unfortunately, the artificial word “taku” actually exists in Japanese. “Taku” can be a person’s name, which is derived from the word for “table”. “Taku” can also be a verb suffix, which functions somewhat like the auxiliary verb “would” (e.g. sugari-taku, “would like to”). I believe Mugitani et al. (2009) constructed more than one artificial word; hopefully, the other words do not exist in either Japanese or English. Second, vowel length contrast does exist in English. The voicing difference at the end of “bat” and “bad” is often absent because voiced coda consonants tend to be devoiced or even dropped in connected speech (Tagliamonte & Temple, 2005). For example, “bad man” and “batman” could have the same phonological structure in connected speech. Vowel length thus emerges as a major cue that differentiates “bat” from “bad”: the vowel in “bat” is shorter than the vowel in “bad” (Ladefoged & Johnson, 2014).

Despite my disagreement with the authors’ linguistic claims, I do agree that English-learning 18-month-olds do not possess enough phonemic awareness of vowel length contrasts. I think it is also reasonable to assume that the symmetrical discrimination performed by the English infants is based on acoustic perception rather than phonemic awareness. The novel finding in Mugitani et al. (2009) is that Japanese 18-month-olds are in a transitional phase between phonetic perception and phonemic perception. The evidence offered is that Japanese adults’ symmetrical discrimination implies phonemic perception. I think the argument would be much more convincing if the authors had also run the experiment with English-speaking adults. If perception is indeed categorical and vowel length is not phonemic in English, English-speaking adults will probably fail to discriminate vowel length differences. Or maybe perception is episodic, in which case English-speaking adults will have no problem discriminating vowel length differences.

A growing body of research has shown that speech perception is episodic rather than categorical. For example, Goldinger (1998) shows that listeners perceive speech faster and more accurately when it is repeated with the same acoustic characteristics. Nielsen (2011) shows that English speakers can perceive durational changes in voice onset time (VOT), which is just the opposite of the VOT example given in Chapter 3 of our textbook. Given the possibility of episodic perception, the symmetrical discrimination performed by Japanese adults does not necessarily entail phonemic perception.

To explain the asymmetrical discrimination performed by the Japanese 18-month-olds, a control group is perhaps needed to see whether this asymmetrical discrimination is universal among infants at a certain stage of language development or specific to Japanese 18-month-olds. I find it interesting that this study used age as a predictor of language development. Maybe “length of morpheme” is a better choice.


Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251.

Ladefoged, P., & Johnson, K. (2014). A course in phonetics. Nelson Education.

Nielsen, K. (2011). Specificity and abstractness of VOT imitation. Journal of Phonetics, 39(2), 132–142.

Tagliamonte, S., & Temple, R. (2005). New perspectives on an ol’ variable: (t,d) in British English. Language Variation and Change, 17(3), 281–302.

Weinberger, S. (1987). The influence of linguistic context on syllable simplification. Interlanguage Phonology: The Acquisition of a Second Language Sound System, 401–417.