I recorded my own speech for analysis.

The stimuli are 35 monosyllabic words in Nantong Chinese, 5 for each lexical tone category.

In Praat,

1, I manipulated the duration of each syllable so that all syllables have a similar length.

2, Each syllable was then sampled at 300 equally spaced time points.

3, F0 values were extracted at each time point for each syllable.
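
The time-point sampling in the steps above can be illustrated outside Praat. The sketch below is not the Praat procedure itself. It assumes a syllable's F0 track is already available as two lists (times in seconds, F0 in Hz) and uses plain linear interpolation to read the track at equally spaced points. The function name `time_normalize` and the example values are hypothetical.

```python
from bisect import bisect_right


def time_normalize(times, f0, n_points=300):
    """Resample one syllable's F0 track at n_points equally spaced times.

    times: strictly increasing measurement times in seconds
    f0:    F0 values in Hz at those times (same length as times)
    Returns a list of (normalized_time, f0) pairs, normalized_time in [0, 1].
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two matching (time, F0) samples")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")

    start, end = times[0], times[-1]
    resampled = []
    for i in range(n_points):
        frac = i / (n_points - 1)          # 0.0 at onset, 1.0 at offset
        t = start + frac * (end - start)   # actual time of this sample
        # Find the measured segment [times[j], times[j + 1]] that contains t.
        j = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
        weight = (t - times[j]) / (times[j + 1] - times[j])
        value = f0[j] + weight * (f0[j + 1] - f0[j])
        resampled.append((frac, value))
    return resampled


# Hypothetical three-sample track, for illustration only.
track = time_normalize([0.00, 0.10, 0.20], [200.0, 180.0, 220.0])
print(len(track), track[0], track[-1])
```

Because every syllable ends up with the same number of points on a 0 to 1 time axis, contours from syllables of slightly different durations can be compared point by point.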

The ssanova function in the gss package was used in R to model the F0 contours (with 95% Bayesian confidence intervals).

Graphs were then generated using ggplot2.



I then converted the y-axis to a 5-point scale:
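
The exact conversion is not spelled out here, so the following is only a sketch of one common approach, under stated assumptions: a log-based normalization (often called a T-value) that places each F0 value within the speaker's own pitch range on a 0 to 5 scale. The function names, the example range of 100 to 400 Hz, and the rule for reading off a 1 to 5 tone level are all assumptions for illustration.

```python
import math


def to_t_value(f0_hz, f0_min, f0_max):
    """Map an F0 value (Hz) to a 0-5 scale using its log-scaled position
    between the speaker's minimum and maximum F0 (assumed to be known)."""
    if not (0 < f0_min < f0_max):
        raise ValueError("need 0 < f0_min < f0_max")
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    span = math.log10(f0_max) - math.log10(f0_min)
    return 5 * (math.log10(f0_hz) - math.log10(f0_min)) / span


def to_tone_level(t_value):
    """Assumed convention: 0-1 reads as level 1, 1-2 as level 2, ..., 4-5 as level 5."""
    return min(5, max(1, math.ceil(t_value)))


# Hypothetical speaker range of 100-400 Hz.
t = to_t_value(200.0, f0_min=100.0, f0_max=400.0)
print(t, to_tone_level(t))
```

Working on a log scale means that equal steps on the 5-point axis correspond to equal pitch ratios rather than equal differences in Hz.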


I’ve just learned about Smoothing Spline ANOVA models (SS ANOVA), which model pitch contours nicely. I used this method to model the lexical tones in my native language (Nantong Chinese). There are a lot of things I am still not quite sure about, but the ribbons generated in R do show the tone contours nicely.

Some references:

an introduction: http://www.ling.upenn.edu/~joseff/papers/fruehwald_ssanova.pdf

application on native and non-native English speech:


