Language Modelling
María Fernández Pajares
Spoken Language Processing
Bigrams are easily incorporated in Viterbi search
Trigrams were used for large-vocabulary recognition in the mid-1970s and remain the dominant language model
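As a rough illustration of how bigram and trigram probabilities can be estimated, here is a minimal sketch using plain maximum-likelihood counts. The toy corpus, the `<s>`/`</s>` boundary markers, and the helper names `train_ngrams` and `ngram_prob` are assumptions made for this example and do not come from the slides.

```python
from collections import Counter

def train_ngrams(tokens, n):
    """Count n-grams and their (n-1)-word histories in a flat token list.

    Simplification: n-grams are allowed to span sentence boundaries.
    """
    positions = range(len(tokens) - n + 1)
    ngrams = Counter(tuple(tokens[i:i + n]) for i in positions)
    histories = Counter(tuple(tokens[i:i + n - 1]) for i in positions)
    return ngrams, histories

def ngram_prob(ngrams, histories, gram):
    """Maximum-likelihood estimate: count(gram) / count(history of gram)."""
    history = gram[:-1]
    if histories[history] == 0:
        return 0.0
    return ngrams[gram] / histories[history]

# Hypothetical toy corpus with sentence boundary markers.
toy_corpus = "<s> the cat sat </s> <s> the cat ran </s> <s> the dog sat </s>".split()

tri, tri_hist = train_ngrams(toy_corpus, 3)
bi, bi_hist = train_ngrams(toy_corpus, 2)

p_tri = ngram_prob(tri, tri_hist, ("the", "cat", "sat"))     # P(sat | the cat)
p_bi = ngram_prob(bi, bi_hist, ("the", "cat"))               # P(cat | the)
p_unseen = ngram_prob(tri, tri_hist, ("the", "dog", "ran"))  # trigram never observed
```

With pure counts, a trigram that never occurs in the training data receives probability zero even when its history was observed, which is why estimates for unseen n-grams need separate treatment.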
IBM TRIGRAM EXAMPLE:
Methods for estimating the probability of unseen n-grams:
• n-gram performance can be improved by clustering words
– Hard clustering puts a word into a single cluster
– Soft clustering allows a word to belong to multiple clusters
• Clusters can be created manually, or automatically
– Manually created clusters have worked well for small domains
– Automatic clusters have been created bottom-up or top-down
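The hard-clustering idea can be sketched with a class-based bigram, where P(w | w_prev) is approximated as P(class(w) | class(w_prev)) * P(w | class(w)). This factorization is one common formulation and is assumed here rather than taken from the slides. The manual word-to-class table, the toy corpus, and the function name `class_bigram_prob` are hypothetical.

```python
from collections import Counter

# Hypothetical manually created hard clustering: one class per word.
word_to_class = {
    "flights": "NOUN",
    "from": "FROM",
    "to": "TO",
    "boston": "CITY",
    "denver": "CITY",
    "dallas": "CITY",
}

corpus = "flights from boston to denver flights from denver to dallas".split()
classes = [word_to_class[w] for w in corpus]

word_count = Counter(corpus)
class_count = Counter(classes)
class_bigram = Counter(zip(classes, classes[1:]))
class_history = Counter(classes[:-1])  # classes that have a successor

def class_bigram_prob(prev_word, word):
    """Approximate P(word | prev_word) through the class sequence."""
    c_prev, c = word_to_class[prev_word], word_to_class[word]
    p_class = class_bigram[(c_prev, c)] / class_history[c_prev]  # P(c | c_prev)
    p_word = word_count[word] / class_count[c]                   # P(word | c)
    return p_class * p_word

seen_word_bigrams = set(zip(corpus, corpus[1:]))
p_from_dallas = class_bigram_prob("from", "dallas")
```

In this toy corpus the word pair "from dallas" never occurs, yet it still gets a non-zero probability because the class pair FROM, CITY was observed. That sharing of statistics across cluster members is the motivation for clustering.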
PERPLEXITY
Average number of options
Quantifying LM Complexity
• One LM is better than another if it can predict an n-word test corpus W with a higher probability
• For LMs representable by the chain rule, comparisons are usually based on the average per-word log probability, LP
• A more intuitive representation of LP is the perplexity, PP
(a uniform LM will have PP equal to the vocabulary size)
• PP is often interpreted as an average branching factor
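The quantities above can be computed directly. The sketch below assumes the usual definitions LP = -(1/n) * sum of log2 P(w_i | history) and PP = 2^LP. The base-2 logarithm and the sign convention are assumptions, chosen so that a uniform model yields PP equal to the vocabulary size. The probability lists are made-up inputs, not measurements.

```python
import math

def log_prob_per_word(word_probs):
    """Average per-word log probability: LP = -(1/n) * sum(log2 p_i)."""
    n = len(word_probs)
    return -sum(math.log2(p) for p in word_probs) / n

def perplexity(word_probs):
    """Perplexity PP = 2 ** LP."""
    return 2 ** log_prob_per_word(word_probs)

# Uniform LM over a hypothetical 1000-word vocabulary, scored on 50 words.
vocab_size = 1000
pp_uniform = perplexity([1 / vocab_size] * 50)

# A hypothetical LM that assigns these probabilities to a 4-word test corpus.
lp_sharper = log_prob_per_word([0.5, 0.25, 0.125, 0.125])
pp_sharper = perplexity([0.5, 0.25, 0.125, 0.125])
```

Here `pp_uniform` comes out at the vocabulary size up to floating-point error. The second model has LP = 2.25 bits per word, so its perplexity is 2^2.25, a little under 5, which reads as fewer than five equally likely choices per word on average.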
Perplexity Examples