I conducted an experiment on Amazon Mechanical Turk in which native English speakers from across the U.S. rated the accentedness of 100 nonnative speech samples. Because my aim was to investigate the phonetic features of foreign accents, the sociolinguistic aspects of accentedness perception did not make the cut in my paper. However, since I had already collected the raters’ demographic information, it would be a shame not to do anything with it.

Here, I plotted a choropleth map in R with the ggplot2 package. Mean accentedness ratings were calculated for the raters from each state. The higher the number, the more accented the 100 speech samples sounded to the raters. I should note that raters in my experiment made their judgments on a Likert scale ranging from 1 to 9, where 1 means “no foreign accent at all” and 9 means “very heavy accent”.

Here’s the figure I made with ggplot2. The darker the color, the more accented the speech samples sounded to the raters. One can observe that the same 100 speech samples sounded more accented to people living in Nevada than to people living in Illinois. Of course, the sample size is not the same for every state, so one should not draw any conclusions from the figure below. The purpose of this post is purely to show how the figure was made.

##step 1: attach geographic coordinates
#load library “maps”


#load data “state”, which has the geographic coordinates of each state
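The code for these two loading steps is not shown here, so the following is only a minimal sketch. It assumes the `maps` and `ggplot2` packages are installed and uses `map_data()` from ggplot2 to turn the `state` map into a data frame. The object name `states` is chosen for illustration and may differ from the original script.

```r
library(maps)     # provides the "state" map database
library(ggplot2)  # map_data() converts a maps database into a data frame

# One row per polygon vertex, with columns
# long, lat, group, order, region, subregion
states <- map_data("state")
head(states)
```

The `region` column holds lowercase full state names (e.g. "california"), so the participant data needs to use the same spelling for the later merge to work.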


#load your own data, which contains the demographic information of your participants


#calculate the mean rating for each “region”; in my data, “region” refers to the state where each participant currently resides
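One way to compute the per-state means is base R's `aggregate()`. This is a sketch on made-up data, not the original code: the data frame `ratings`, its values, and the result name `state_means` are all hypothetical.

```r
# Hypothetical rater-level data: one row per rating, with the rater's state
ratings <- data.frame(
  region = c("nevada", "nevada", "illinois", "illinois", "illinois"),
  rating = c(7, 5, 3, 4, 5)
)

# Mean rating per state; the result has the columns "region" and "rating"
state_means <- aggregate(rating ~ region, data = ratings, FUN = mean)
state_means
```

With these toy values the mean is 6 for Nevada and 4 for Illinois.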


#merge the coordinates with my data, by “region” (i.e. states)
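The merge itself could look like the sketch below, which assumes the `maps` and `ggplot2` packages are installed. The per-state means in `state_means` are made-up values for illustration; only the object name `total` and the `region` key come from the surrounding code.

```r
library(maps)
library(ggplot2)

# Polygon coordinates for each state (region = lowercase state name)
states <- map_data("state")

# Hypothetical per-state means (made-up values)
state_means <- data.frame(
  region = c("nevada", "illinois"),
  rating = c(6, 4)
)

# Attach the mean rating to every polygon vertex of the matching state;
# states without ratings are dropped by this default inner join
total <- merge(states, state_means, by = "region")
```

By default `merge()` sorts the result by the key column and does not preserve the original row order, which is why re-sorting by the `order` variable afterwards matters: the polygons are drawn vertex by vertex in row order.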

total$region<-factor(total$region) #make sure the region variable is categorical
total <- total[order(total$order),] #order data by the "order" variable[important!]

#you can now draw the graph

p <- ggplot() #start from an empty plot (assumes ggplot2 is loaded)
p + 
geom_polygon(data=total, aes(x=long, y=lat, group = group, fill=rating),colour="black")

#But I want to add state abbreviations to the figure
#and I want darker color to correspond to “higher accentedness ratings”
#find the central location, where the abbreviations should appear

library(sp) #Polygon() and its labpt slot come from the sp package
centroids <- setNames(do.call("rbind.data.frame", by(total, total$group, function(x) {Polygon(x[c('long', 'lat')])@labpt})), c('long', 'lat')) 
centroids$label <- total$region[match(rownames(centroids), total$group)]

#select only one label for each state (you might want to do it manually rather than using the following code)
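The selection code is not shown here. A simple automatic approach, offered as an assumption rather than the original method, is to keep the first centroid found for each state with `duplicated()`. The toy `centroids` values and the name `one_per_state` are hypothetical.

```r
# Hypothetical centroids: Michigan has two polygons, so two label candidates
centroids <- data.frame(
  long  = c(-84.6, -87.0, -116.7),
  lat   = c(43.5, 46.3, 39.4),
  label = c("michigan", "michigan", "nevada")
)

# Keep only the first centroid listed for each state
one_per_state <- centroids[!duplicated(centroids$label), ]
one_per_state
```

The first polygon of a state is not guaranteed to be the one where a label looks best, which is a good reason to check the result by hand for states made up of several pieces.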


#in my data, I used the full names for each state (e.g. “california”, “florida”, etc.)
#the following code turns the full names into two-letter abbreviations
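The conversion can be done with the `state.name` and `state.abb` vectors that ship with base R. This is a sketch of one possible approach, and the input vector `full_names` is hypothetical.

```r
# Hypothetical lowercase state names, as they appear in the map data
full_names <- c("california", "florida", "new york")

# state.name and state.abb are parallel vectors built into R, so the
# position of a name in state.name is also the position of its abbreviation
abb <- state.abb[match(full_names, tolower(state.name))]
abb
# "CA" "FL" "NY"
```

`state.name` covers the 50 states only, so "district of columbia" returns `NA` and needs to be filled in by hand if it appears in the data.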


#using annotate(), we attach the abbreviations to the figure
#the theme functions hide the axis labels and ticks, and change the fonts
#check the ggplot2 theme() documentation for details

ggplot(total, aes(long, lat, group=group, fill=rating)) +
  geom_polygon(colour = "grey") +
  scale_fill_continuous(low = "cornflowerblue",high = "darkblue",guide=guide_colorbar(barwidth = 2,barheight = 10))+
  with(try2, annotate(geom="text", x = long, y=lat, label = abb, size = 6,color="white"))+
  labs(fill = "Mean Ratings")+
  labs(title = "Mean Accentedness Ratings by State")+
  theme(panel.background = element_rect(fill = "grey"),
        plot.background = element_rect(fill = "grey"),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank())+
  scale_y_continuous(breaks=c()) + 
  scale_x_continuous(breaks=c()) + 
  theme(panel.border =  element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks = element_blank())+
  theme(plot.title = element_text(size = 30, family = "Times",colour = "white"),
               legend.title= element_text(hjust = 0.4 ,vjust=0.3, size=20,family = "Times"),
               legend.text = element_text(hjust = 0.4 ,vjust=2, size=20,family = "Times"))

#here’s the figure I got

You may download the R code from my GitHub page.