What is a language model?
It is a method for defining the structure of a language, so that recognition is restricted to the most probable sequences of linguistic units.
Language models tend to be useful for applications with complex syntax and/or semantics.
A good LM should accept (with high probability) only correct sentences and reject (or assign a low probability to) incorrect word sequences.
CLASSIC MODELS:
- N-grams
- Stochastic Grammars.
Introduction: general scheme of a system
signal → parameter measurement → model comparison (using the acoustic and grammar models) → decision rule → text
Introduction: measuring task difficulty
Determined by the real flexibility of the admitted language
Perplexity: average number of options
There are finer measures that take into account the difficulty of the words or the acoustic models
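The idea of perplexity as an average number of options can be made concrete with a small sketch. It assumes the usual definition, 2 raised to the average negative log2 probability per word; the function name `perplexity` and the probabilities are hypothetical.

```python
import math

def perplexity(word_probs):
    """Perplexity of a word sequence, given the probability that the
    model assigned to each word (all values must be > 0)."""
    n = len(word_probs)
    # Average negative log2 probability per word (in bits).
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / n
    return 2 ** avg_neg_log2

# Hypothetical task: at every position the model hesitates between
# 10 equally likely words, so the perplexity comes out close to 10.
uniform_ten = [0.1] * 5
pp = perplexity(uniform_ten)
```

A task in which the model is more certain about the next word yields a lower value, which matches the reading of perplexity as an average branching factor.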
Speech recognizers seek the word sequence W which is most likely to be produced from acoustic evidence A
Speech recognition involves acoustic processing, acoustic modelling, language modelling, and search
Language models (LMs) assign a probability estimate P(W) to word sequences W = {w1,...,wn}, subject to Σ_W P(W) = 1
Language models help guide and constrain the search among alternative word hypotheses during recognition
Huge vocabularies: the acoustic models and the language model are integrated into a single hidden Markov macro-model covering the whole language.
Introduction: dimensions of problem difficulty
Connectivity, speakers, vocabulary and language complexity (+ noise, robustness)
Introduction: MODELS BASED ON GRAMMARS
* They represent language restrictions in a natural way
* They allow dependencies to be modelled over spans as long as required
* Defining these models is very difficult for tasks involving languages close to natural languages (pseudo-natural)
* Integration with the acoustic model is not very natural
Introduction: Kinds of grammars
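Consider a grammar G = (N, Σ, P, S), where N is the set of non-terminal symbols, Σ the set of terminal symbols, P the set of production rules and S the start symbol.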
Chomsky hierarchy
0. Unrestricted: no restrictions on the rules; too complex to be useful
1. Context-sensitive: too complex
2. Context-free: used in experimental systems
3. Regular or finite-state
Grammars and automata
Each kind of grammar is associated with a kind of automaton that recognizes it:
Kind 0 (unrestricted): Turing machine
Kind 1 (context-sensitive): linear bounded automaton
Kind 2 (context-free): pushdown automaton
Kind 3 (regular): finite-state automaton
Regular grammars
Languages generated by regular grammars = regular languages
A regular grammar is any right-linear or left-linear grammar
Examples:
Regular grammars generate regular languages
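The link between regular grammars and finite-state automata can be sketched as follows. The grammar below is a hypothetical right-linear grammar chosen only for illustration, and `accepts` is an assumed helper name; each non-terminal is treated as a state of the automaton.

```python
# Hypothetical right-linear grammar over the terminals {"a", "b"}:
#   S -> a S | b A
#   A -> b A | (empty)
# It generates the regular language a* b b*.
RULES = {
    "S": [("a", "S"), ("b", "A")],
    "A": [("b", "A"), ("", None)],   # ("", None) is the empty production
}

def accepts(symbols, start="S"):
    """Recognize a string by treating non-terminals as automaton states."""
    states = {start}
    for sym in symbols:
        # Follow every rule whose terminal matches the current symbol.
        states = {nxt for st in states
                  for (term, nxt) in RULES[st]
                  if term == sym and nxt is not None}
        if not states:
            return False
    # Accept if some reachable non-terminal can derive the empty string.
    return any(("", None) in RULES[st] for st in states)
```

For instance, `accepts("aab")` follows S, S, S, A and succeeds, while `accepts("ba")` fails because no rule of A starts with the terminal a.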
Search space
An example:
Stochastic grammars and languages
Add a probability to each of the production rules
A stochastic grammar is a pair (G, p)
where G is a grammar and p is a function p: P → [0,1] with the property that, for every non-terminal A, the probabilities p(A → α) of the rules in the set of grammar rules whose antecedent is A sum to 1.
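A minimal sketch of checking this kind of condition, assuming that the property required of p is the usual normalization (the probabilities of all rules sharing an antecedent sum to 1). The rule table and the name `is_proper` are hypothetical.

```python
from collections import defaultdict

# Hypothetical stochastic grammar: each rule (antecedent, consequent)
# is mapped to its probability p(rule).
RULE_PROBS = {
    ("S", ("a", "S")): 0.3,
    ("S", ("b", "A")): 0.7,
    ("A", ("b", "A")): 0.4,
    ("A", ()): 0.6,
}

def is_proper(rule_probs, tol=1e-9):
    """True if every probability lies in [0, 1] and the rules with the
    same antecedent have probabilities that sum to 1."""
    totals = defaultdict(float)
    for (antecedent, _), prob in rule_probs.items():
        if not 0.0 <= prob <= 1.0:
            return False
        totals[antecedent] += prob
    return all(abs(total - 1.0) <= tol for total in totals.values())
```

Changing any single probability in `RULE_PROBS` without compensating in another rule of the same antecedent makes the check fail.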
A stochastic language over an alphabet is a pair that fulfils the following conditions:
Example
P(W) can be decomposed as:
When n=2: bigrams
When n=3: trigrams
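The decomposition can be illustrated with a short sketch. It assumes the usual n-gram approximation, in which each word is conditioned on at most the n-1 preceding words; the function name `ngram_factors` is hypothetical.

```python
def ngram_factors(words, n):
    """List the (word, history) pairs whose probabilities P(word | history)
    are multiplied to approximate P(W) with an n-gram model of order n."""
    factors = []
    for i, word in enumerate(words):
        # Keep at most the n-1 previous words as history.
        history = tuple(words[max(0, i - (n - 1)):i])
        factors.append((word, history))
    return factors

bigram = ngram_factors(["the", "big", "dog"], 2)
trigram = ngram_factors(["the", "big", "dog"], 3)
```

With n=2 the last factor is P(dog | big); with n=3 it becomes P(dog | the big).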
N-gram models
Example: suppose that acoustic decoding assigns similar probabilities to the phrases "the pig dog" and "the big dog".
If:
* P(pig | the) = P(big | the), then the choice between the two phrases depends on the word dog.
* P(the pig dog) = P(the) · P(pig | the) · P(dog | the pig)
* P(the big dog) = P(the) · P(big | the) · P(dog | the big)
Since P(dog | the big) > P(dog | the pig), the model helps to decode the sentence correctly.
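The comparison above can be reproduced numerically. The probability values below are invented purely for illustration (they are not estimated from any corpus), and `sentence_prob` is an assumed helper name.

```python
# Hypothetical trigram-style probabilities, chosen only for illustration.
PROBS = {
    ("the", ()): 0.1,
    ("pig", ("the",)): 0.01,
    ("big", ("the",)): 0.01,          # P(pig | the) = P(big | the)
    ("dog", ("the", "pig")): 0.001,
    ("dog", ("the", "big")): 0.05,    # P(dog | the big) > P(dog | the pig)
}

def sentence_prob(words):
    """Multiply P(word | up to two previous words) over the sentence."""
    prob = 1.0
    for i, word in enumerate(words):
        history = tuple(words[max(0, i - 2):i])
        prob *= PROBS[(word, history)]
    return prob

p_pig = sentence_prob(["the", "pig", "dog"])
p_big = sentence_prob(["the", "big", "dog"])
# With equal acoustic scores, the language model decides.
best = "the big dog" if p_big > p_pig else "the pig dog"
```

Because the first two factors are identical for both phrases, the decision rests entirely on the last factor, as the example states.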
Problems:
A large number of training samples is needed:
unigram:
bigram:
trigram:
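One way to see why the need for training data grows with the order of the model: over a vocabulary of V words there are up to V^n distinct n-grams whose probabilities must be estimated. A minimal sketch, assuming a hypothetical vocabulary of 1000 words:

```python
def num_ngram_events(vocab_size, n):
    """Upper bound on the number of distinct n-grams: every possible
    combination of n words from the vocabulary."""
    return vocab_size ** n

VOCAB_SIZE = 1000  # hypothetical vocabulary size
counts = {n: num_ngram_events(VOCAB_SIZE, n) for n in (1, 2, 3)}
```

Here the unigram, bigram and trigram models have up to one thousand, one million and one billion events respectively, far more than most training samples can cover.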
Advantages:
• Probabilities are based on data
• Parameters determined automatically from corpora
• Incorporate local syntax, semantics, and pragmatics
• Many languages have a strong tendency toward standard word order and are thus substantially local
• Relatively easy to integrate into forward search methods such as Viterbi (bigram) or A∗
Disadvantages:
• Unable to incorporate long-distance constraints
• Not well suited for flexible word order languages
• Cannot easily accommodate
– New vocabulary items
– Alternative domains
– Dynamic changes (e.g., discourse)
• Not as good as humans at tasks of
– Identifying and correcting recognizer errors
– Predicting following words (or letters)
• Do not capture meaning for speech understanding
Estimation of the Probabilities
Let us suppose that the N-gram model has been represented as a finite automaton:
Unigram:
Bigram w1w2:
Trigram w1w2w3:
Suppose that we have a training sample on which an N-gram model, represented as a finite automaton, has been estimated.
q is a state of the automaton, and C(q) is the total number of events (N-grams) observed in the sample when the model is in state q.
C(w|q) is the number of times that the word w has been observed in the sample while the model is in state q.
P(w|q) is the probability of observing the word w conditioned on the state q.
The set of words observed in the sample when the model is in the state q.
The total vocabulary of the language to be modelled.
For example, in a bigram:
This approach assigns probability 0 to events that have not been seen in the sample, which causes coverage problems. The solution is to smooth the model. Smoothing methods include: flat, linear, non-linear, back-off and syntactic back-off.
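The zero-probability problem can be seen in a small sketch. It assumes the relative-frequency estimate P(w|q) = C(w|q) / C(q), with the state q of a bigram model taken to be the previous word; the training sample and the name `bigram_mle` are hypothetical.

```python
from collections import Counter

def bigram_mle(sentences):
    """Relative-frequency bigram estimates P(w|q) = C(w|q) / C(q),
    where the state q is the previous word."""
    pair_counts = Counter()   # C(w|q)
    state_counts = Counter()  # C(q)
    for sentence in sentences:
        for prev, word in zip(sentence, sentence[1:]):
            pair_counts[(prev, word)] += 1
            state_counts[prev] += 1

    def prob(word, prev):
        if state_counts[prev] == 0:
            return 0.0  # state never visited in the sample
        return pair_counts[(prev, word)] / state_counts[prev]

    return prob

# Hypothetical toy training sample.
sample = [["the", "big", "dog"], ["the", "big", "cat"], ["the", "pig"]]
p = bigram_mle(sample)
seen = p("big", "the")     # C(big|the) / C(the) = 2 / 3
unseen = p("dog", "the")   # "the dog" never occurs, so the estimate is 0
```

Any sentence containing "the dog" would receive probability 0 under this estimate, however plausible it is, which is the coverage problem that smoothing is meant to address.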