Hayes, B., & Wilson, C. (2008). A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39(3), 379-440.
How do we account for the acceptability (well-formedness) of non-existing words?
e.g. why is blick more well-formed than bnick?
Acquisition model (Chomsky & Halle)
This paper proposes a model that describes the AM
Goal of the Learner
- Inductive Baseline
Definition of constraints is based on theoretical concepts (underspecification)
Linear: feature bundles, e.g. [+consonantal, +approximant] picks out /l/, /r/
Add more layers later
1) Autosegmental tiers: main tier, vowel tier,
2) Metrical grid: stress tier
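The feature-bundle idea on the linear tier can be sketched in a few lines of Python. This is only an illustration: the feature chart below is a toy fragment with five segments and two features, and the names `FEATURES` and `natural_class` are made up for this sketch, not taken from the paper or from the UCLA software.

```python
# Toy feature chart (illustrative values for a handful of segments only).
FEATURES = {
    "l": {"consonantal": "+", "approximant": "+"},
    "r": {"consonantal": "+", "approximant": "+"},
    "w": {"consonantal": "-", "approximant": "+"},
    "n": {"consonantal": "+", "approximant": "-"},
    "b": {"consonantal": "+", "approximant": "-"},
}

def natural_class(bundle, chart=FEATURES):
    """Return the segments whose features agree with every value in the bundle."""
    return {seg for seg, feats in chart.items()
            if all(feats.get(f) == v for f, v in bundle.items())}

# The fewer features a bundle specifies, the larger the class it picks out.
print(sorted(natural_class({"consonantal": "+", "approximant": "+"})))
print(sorted(natural_class({"approximant": "+"})))
```

Leaving features out of the bundle (underspecification) is what makes a class more general: an empty bundle matches every segment in the chart.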
The constraints defined here are not the same as those defined in the OT framework
- 3. Gradient phonotactics
Categorical: okay vs. not okay
Gradient: okay → mostly okay → sometimes okay → maybe okay → not okay
- 4. Maxent
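The maxent value is an unnormalized probability:
$\mathrm{Maxent}(x) = e^{-h(x)}$
where $h(x)$, the score of x, is the weighted sum of its constraint violations: $h(x) = \sum_i w_i C_i(x)$
(dividing $\mathrm{Maxent}(x)$ by the sum of the maxent values of all possible forms gives a true probability)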
see table 1 in the article
If an output (x) violates a lot of constraints, then the probability for this output to exist is low.
If an output(x) violates important constraints, then the probability for this output to exist is low. (Importance is defined as weight)
In other words,
If under a set of constraints, an output(x) has a lower probability to occur, then x is ill-formed.
If under a set of constraints, an output(x) has a higher probability to occur, then x is well-formed.
Therefore, the more probable a form is under the grammar (i.e. the higher its maxent value), the more well-formed it is.
Question:1. Why not use “maximum probability”?
Maxent, although defined as probability, does not mean frequency or occurrence.
e.g. both “blick” and “bnick” have zero occurrences, but one has a higher Maxent than the other.
The value of Maxent depends on the value of h(x).
The value of h(x) depends on
1) how many times a constraint is violated and
2) how important this constraint is (its weight).
Therefore, to calculate maxent, we need to know
1) what the constraints are
2) the weight of each constraint.
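Once both ingredients are known, the calculation itself is short. The sketch below is a minimal illustration, assuming that each constraint is simply a banned substring; the constraints, the weights, and the names `TOY_GRAMMAR`, `count_violations`, `score` and `maxent_value` are all hypothetical and do not come from the paper.

```python
import math

# Hypothetical toy grammar: banned substring -> weight.
# Neither the constraints nor the weights are from Hayes & Wilson (2008).
TOY_GRAMMAR = {
    "bn": 4.0,   # *[bn]: penalize a b+n sequence
    "kk": 2.5,   # *[kk]: penalize a doubled k
}

def count_violations(form, banned):
    """Count (possibly overlapping) occurrences of a banned substring."""
    return sum(1 for i in range(len(form) - len(banned) + 1)
               if form[i:i + len(banned)] == banned)

def score(form, grammar):
    """h(x): sum over constraints of weight * number of violations."""
    return sum(w * count_violations(form, c) for c, w in grammar.items())

def maxent_value(form, grammar):
    """exp(-h(x)); unnormalized, so a form with no violations gets 1.0."""
    return math.exp(-score(form, grammar))

for word in ("blik", "bnik"):
    print(word, score(word, TOY_GRAMMAR), maxent_value(word, TOY_GRAMMAR))
```

Neither form needs to occur in any corpus: the difference between them comes entirely from which constraints they violate and how heavily those constraints are weighted.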
- Weight: importance
Iterated hill-climbing search, based on observed constraint violations and expected constraint violations
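A rough sketch of how weights can be fitted from observed and expected violations follows. It assumes a tiny, fully enumerable space of forms and uses plain gradient steps for simplicity; the optimizer actually used in the paper may differ. All names (`fit_weights`, `rate`, `steps`) and the toy data are assumptions made for this illustration.

```python
import math

def violations(form, banned):
    """Number of times the banned substring occurs in the form."""
    return sum(form[i:i + len(banned)] == banned
               for i in range(len(form) - len(banned) + 1))

def fit_weights(constraints, space, data, rate=0.5, steps=500):
    """Hill-climb on the weights: raise a weight when the model expects
    more violations than the data show, lower it otherwise."""
    n = len(constraints)
    weights = [0.0] * n
    viol = {x: [violations(x, c) for c in constraints] for x in space}
    # Observed violations per constraint, averaged over the learning data.
    observed = [sum(viol[x][i] for x in data) / len(data) for i in range(n)]
    for _ in range(steps):
        star = {x: math.exp(-sum(w * v for w, v in zip(weights, viol[x])))
                for x in space}
        z = sum(star.values())
        # Expected violations per constraint under the current weights.
        expected = [sum(star[x] / z * viol[x][i] for x in space)
                    for i in range(n)]
        # Weights are kept non-negative (an assumption of this sketch).
        weights = [max(0.0, w + rate * (e - o))
                   for w, e, o in zip(weights, expected, observed)]
    return weights

constraints = ["bn", "dl"]
space = ["bl", "br", "bn", "dl"]
data = ["bl"] * 5 + ["br"] * 4 + ["bn"]   # "bn" is rare, "dl" never occurs
print(fit_weights(constraints, space, data))
```

In this toy run the constraint that is never violated in the data ends up with a larger weight than the one that is violated occasionally, which is the sense in which weight encodes importance.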
- Searching the space of possible constraints
Question: What is the selection problem? (p. 390)
Space: natural classes determined by UG features
Number of constraints determined by the number of natural classes and features
- search for shorter constraints,
- same length: search for more general featural expressions
Input: a set of segments classified by sets of features. Every class is a constraint.
- calculate the accuracy of each constraint
- divide constraints by accuracy level
- select the most general constraints within an accuracy level, train for its weight (maximize probability)
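The selection steps above can be sketched as follows. This is a simplified illustration under stated assumptions: accuracy is taken to be observed violations divided by expected violations (lower means more accurate), generality is approximated by how many segment strings a constraint covers, and the thresholds and candidate values are invented. The names `Candidate`, `select` and `n_matches` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    length: int       # number of feature bundles in the constraint
    n_matches: int    # segment strings covered (a proxy for generality)
    observed: float   # violations found in the learning data
    expected: float   # violations the current grammar predicts (must be > 0)

    @property
    def accuracy(self):
        # Assumed measure: observed / expected, lower is more accurate.
        return self.observed / self.expected

def select(cands, thresholds=(0.01, 0.1, 0.3)):
    """Walk through accuracy levels from strictest to loosest; within the
    first non-empty level prefer shorter, then more general, constraints."""
    for t in thresholds:
        pool = [c for c in cands if c.accuracy <= t]
        if pool:
            return min(pool, key=lambda c: (c.length, -c.n_matches))
    return None

cands = [
    Candidate("*[b][n]", 2, 1, 0, 3),
    Candidate("*[stop][nasal]", 2, 18, 0, 40),
    Candidate("*[labial][w]", 2, 4, 2, 25),
    Candidate("*[stop][nasal][V]", 3, 90, 0, 40),
]
print(select(cands).name)
```

The selected constraint would then be added to the grammar and the weights refitted before the next round of selection.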
- English onset
Download files here: http://www.linguistics.ucla.edu/people/hayes/Phonotactics/
Example demo with UCLA Phonotactic Learner
Question: What are the constraints under this scheme?
Results generally confirm findings in Scholes (1966)
Better than other alternatives
- Nonlocal phonotactics
Pure linear model doesn’t work → add projection
Question: what is this projection?
I think it is just another layer. That is, there is a layer for the whole word (e.g. mVmV)
and a layer for the vowels (e.g. V..V)
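Read this way, a projection is just the word with everything off the tier removed, so that segments that are far apart in the full string become neighbours. A minimal sketch, assuming one character per segment and a toy five-vowel inventory (the names `VOWELS` and `project` are made up here):

```python
VOWELS = set("aeiou")  # toy vowel inventory, one character per segment

def project(form, tier):
    """Keep only the segments that belong to the tier, preserving order."""
    return "".join(seg for seg in form if seg in tier)

word = "mamu"
vowel_tier = project(word, VOWELS)
print(word, "->", vowel_tier)

# A hypothetical constraint *[a][u] finds nothing on the main tier,
# but it does match once the consonants are projected away.
print("au" in word, "au" in vowel_tier)
```

Constraints stated over the vowel tier can then use the same local, linear format as constraints on the main tier.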
- Metrical Grid & 4. Whole language analysis
Differences between maxent approach and OT framework
- constraints are not universal
- constraints are weighted, not ranked.
Question: can we say a constraint with a higher weight ranks higher?
- there’s no input-output relationship
- a well-established mathematical foundation (i.e. the maxent model)
- it is flexible and sensitive to the range of frequencies in the learning data
e.g. The OT framework has trouble dealing with rarely occurring onsets such as /pw/. Under OT, a word like “Puerto Rico” forces the learner to assign a lower ranking to [no labial+w]. However, [no labial+w] is correct for almost all other English onset consonant clusters, which means it should rank higher. A weight-based system accounts for the rare occurrence of /pw/ better: the constraint can keep a high weight while /pw/ onsets remain possible but improbable.
- hidden structure
- accuracy and generality