
Language Modelling
María Fernández Pajares
Verarbeitung gesprochener Sprache (Spoken Language Processing)

• Bigrams are easily incorporated into Viterbi search
• Trigrams have been used for large-vocabulary recognition since the mid-1970s and remain the dominant language model

IBM trigram example:
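As a generic illustration of what a trigram model estimates (this is not the IBM example named above), here is a minimal sketch of a maximum-likelihood trigram estimate, P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2). The function name `trigram_mle` and the toy sentence are assumptions made for this sketch; no smoothing is applied, so unseen trigrams get probability 0.

```python
from collections import Counter

def trigram_mle(tokens):
    """Return prob(w1, w2, w3), the unsmoothed estimate of P(w3 | w1, w2)."""
    # Count every trigram in the token sequence.
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count each two-word context that is followed by some third word.
    ctx = Counter()
    for (w1, w2, _), c in tri.items():
        ctx[(w1, w2)] += c

    def prob(w1, w2, w3):
        total = ctx[(w1, w2)]
        # Unseen contexts and unseen trigrams both yield 0.0 here.
        return tri[(w1, w2, w3)] / total if total else 0.0

    return prob

# Toy data (hypothetical): "the cat" is followed once by "sat", once by "mat".
p = trigram_mle("the cat sat on the cat mat".split())
print(p("the", "cat", "sat"))
```

The zero returned for any trigram missing from the training text is exactly the problem that methods for unseen n-grams are meant to address.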

Methods for estimating the probability of unseen n-grams:

• n-gram performance can be improved by clustering words
– Hard clustering puts a word into a single cluster
– Soft clustering allows a word to belong to multiple clusters
• Clusters can be created manually or automatically
– Manually created clusters have worked well for small domains
– Automatic clusters have been created bottom-up or top-down
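One common way to use hard clusters is a class-based bigram, which approximates P(w | w_prev) by P(c | c_prev) · P(w | c), where c is the single cluster of w. The sketch below assumes a manually created word-to-cluster table; the name `class_bigram`, the cluster labels, and the toy sentence are all hypothetical. It shows how a word pair never seen in training can still receive a non-zero probability.

```python
from collections import Counter

def class_bigram(tokens, word2class):
    """Return prob(prev, word) ~ P(class(word) | class(prev)) * P(word | class(word))."""
    classes = [word2class[w] for w in tokens]
    class_bi = Counter(zip(classes, classes[1:]))   # class-pair counts
    class_ctx = Counter(classes[:-1])               # classes that have a successor
    class_uni = Counter(classes)                    # class counts
    word_uni = Counter(tokens)                      # word counts

    def prob(prev, word):
        c_prev, c = word2class[prev], word2class[word]
        if class_ctx[c_prev] == 0 or class_uni[c] == 0:
            return 0.0
        p_class = class_bi[(c_prev, c)] / class_ctx[c_prev]
        p_word = word_uni[word] / class_uni[c]
        return p_class * p_word

    return prob

# Hypothetical hard clustering: each word belongs to exactly one cluster.
clusters = {"monday": "DAY", "tuesday": "DAY", "we": "PRON",
            "fly": "VERB", "drive": "VERB"}
p = class_bigram("monday we fly tuesday we drive".split(), clusters)
# "drive monday" never occurs as a word pair, but VERB -> DAY does.
print(p("drive", "monday"))
```

Soft clustering would replace the single lookup in `word2class` with a sum over the clusters a word may belong to.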

Perplexity: average number of options

Quantifying LM complexity
• One LM is better than another if it can predict an n-word test corpus W with a higher probability
• For LMs representable by the chain rule, comparisons are usually based on the average per-word log probability, LP
• A more intuitive representation of LP is the perplexity, PP (a uniform LM will have PP equal to the vocabulary size)
• PP is often interpreted as an average branching factor
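The relation between LP and PP can be sketched as follows, assuming the usual convention LP = -(1/n) · log2 P(W) and PP = 2^LP (the slides do not spell out the sign or the log base, so both are assumptions here). The function name `perplexity` and its input, a list of per-word conditional probabilities assigned by some LM, are hypothetical.

```python
import math

def perplexity(word_probs):
    """Perplexity of a test corpus given the LM probability of each word.

    word_probs[i] is the conditional probability the LM assigned to the
    i-th word given its history, so their product is P(W) by the chain rule.
    """
    n = len(word_probs)
    # Average negative log2 probability per word (LP).
    lp = -sum(math.log2(p) for p in word_probs) / n
    return 2.0 ** lp

# A uniform LM over a 1000-word vocabulary assigns 1/1000 to every word,
# so its perplexity comes out at (approximately) the vocabulary size.
print(perplexity([1 / 1000] * 5))
```

A model that concentrates probability on the words that actually occur yields a lower PP, which matches the reading of PP as an average branching factor.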

Perplexity Examples

Bibliography:

• P. Brown et al., Class-based n-gram models of natural language, Computational Linguistics, 1992.
• R. Lau, Adaptive Statistical Language Modelling, S.M. Thesis, MIT, 1994.
• M. McCandless, Automatic Acquisition of Language Models for Speech Recognition, S.M. Thesis, MIT, 1994.
• L. R. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
• Google


Name: 
languagemodelling
Author: 
M F P
Company: 
N/A
Description: 
language modelling María Fernández Pajares Verarbeitung gesprochener Sprache
Tags: 
model | languag | word | grammar | probabl | automat | regular | cluster
Created: 
12/16/2006 7:01:43 PM
Slides: 
25

