

CS460/626: Natural Language Processing / Speech, NLP and the Web (Lecture 2: Wordnet and Word Sense Disambiguation)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay
6th Jan, 2011

Perspectivising NLP: Areas of AI and their inter-dependencies

Search, Vision, Planning, Machine Learning, Knowledge Representation, Logic, Expert Systems, Robotics, NLP

Books etc.

Main Text(s):
    Natural Language Understanding: James Allen
    Speech and NLP: Jurafsky and Martin
    Foundations of Statistical NLP: Manning and Schütze
Other References:
    NLP, a Paninian Perspective: Bharati, Chaitanya and Sangal
    Statistical NLP: Charniak
Journals: Computational Linguistics, Natural Language Engineering, AI, AI Magazine, IEEE SMC
Conferences: ACL, EACL, COLING, MT Summit, EMNLP, IJCNLP, HLT, ICON, SIGIR, WWW, ICML, ECML

Wordnet

A lexical knowledge base built on conceptual lookup.
Organizes concepts in a semantic network.
Organizes lexical information in terms of word meaning rather than word form.
Wordnet can also be used as a thesaurus.
CFILT, IIT Bombay

Psycholinguistic Theory

Human lexical memory for nouns is organized as a hierarchy:
    Animal (can move, has skin)
        Bird (can fly)
            canary (can sing)
Can a canary sing? Pretty fast response.
Can a canary fly? Slower response.
Does a canary have skin? Slowest response.
The response slows as the property is stored farther up the hierarchy from canary.
Wordnet is a lexical reference system based on psycholinguistic theories of human lexical memory.
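The canary example can be sketched as a lookup that climbs the hierarchy until it finds the property. This is only an illustration under stated assumptions: the dictionaries and the function name `hops_to_property` are hypothetical, and the hop count is a stand-in for response time, not a measurement.

```python
# Toy noun hierarchy from the slide: canary -> bird -> animal.
# PARENT, PROPERTIES and hops_to_property are illustrative names, not from the lecture.
PARENT = {"canary": "bird", "bird": "animal", "animal": None}
PROPERTIES = {
    "canary": {"can sing"},
    "bird": {"can fly"},
    "animal": {"can move", "has skin"},
}


def hops_to_property(concept, prop):
    """Count the parent links climbed before prop is found; None if never found."""
    hops = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES[node]:
            return hops
        node = PARENT[node]
        hops += 1
    return None


for prop in ("can sing", "can fly", "has skin"):
    # More hops corresponds to the slower responses listed on the slide.
    print(prop, hops_to_property("canary", prop))
```

The three questions need 0, 1 and 2 hops respectively, which mirrors the fast, slower and slowest responses.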

Lexical Matrix


Wordnet

Wordnet is a network of words linked by lexical and semantic relations.
The first wordnet in the world, for English, was developed at Princeton over 15 years.
The Eurowordnet, a linked structure of European language wordnets, was built in 1998 over 3 years with funding from the EC as a mission mode project.
The wordnets for Hindi and Marathi being built at IIT Bombay are amongst the first Indian language (IL) wordnets.
All of these are proposed to be linked into the IndoWordnet, which will eventually be linked to the English and Euro wordnets.

Hindi Wordnet

[Figure: INDOWORDNET, linking the Hindi Wordnet with the Dravidian Language, North East Language, Marathi, Sanskrit, English, Bengali, Punjabi, Konkani and Urdu wordnets]

Fundamental Design Question

Syntagmatic vs. paradigmatic relations?
Psycholinguistics is the basis of the design.
When we hear a word, many other words come to mind by association.
For English, about half of the associated words are syntagmatically related and half are paradigmatically related.
For cat:
    animal, mammal: paradigmatic
    mew, purr, furry: syntagmatic

Stated Fundamental Application of Wordnet: Sense Disambiguation

Determination of the correct sense of a word.
"The crane ate the fish" vs. "The crane was used to lift the load": bird vs. machine.
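One simple way to picture the crane example is a word-overlap heuristic in the spirit of the simplified Lesk algorithm. The slide does not name a method, so this technique is swapped in purely for illustration. The sense signatures, stopword list and function name below are hand-written assumptions, not Wordnet data.

```python
# Hand-written context signatures for the two senses of "crane" (hypothetical data).
SIGNATURES = {
    "crane (bird)": {"bird", "ate", "eat", "fish", "fly", "wings", "nest"},
    "crane (machine)": {"machine", "lift", "load", "construction", "hoist"},
}
# A tiny stopword list, also an assumption made for this sketch.
STOPWORDS = {"the", "was", "used", "to", "a", "an"}


def disambiguate(sentence, signatures=SIGNATURES):
    """Pick the sense whose signature shares the most words with the sentence."""
    context = {w.strip(".,").lower() for w in sentence.split()} - STOPWORDS
    scores = {sense: len(context & words) for sense, words in signatures.items()}
    # On a tie, max() returns the first sense in insertion order.
    return max(scores, key=scores.get)


print(disambiguate("The crane ate the fish"))
print(disambiguate("The crane was used to lift the load"))
```

With these signatures the first sentence resolves to the bird sense and the second to the machine sense. A real system would derive the signatures from glosses, examples and related synsets rather than from a hand-made list.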

The problem of Sense tagging

Given a corpus, assign the correct sense to each word. This is sense tagging.
It needs Word Sense Disambiguation (WSD).
Highly important for Question Answering, Machine Translation and Text Mining tasks.

Classification of Words

Word
    Content Word: Verb, Noun, Adjective, Adverb
    Function Word: Preposition, Conjunction, Pronoun, Interjection

Example of sense marking: its need

एक_4187 नए शोध_1138 के अनुसार_3123 जिन लोगों_1189 का सामाजिक_43540 जीवन_125623 व्यस्त_48029 होता है उनके दिमाग_16168 के एक_4187 हिस्से_120425 में अधिक_42403 जगह_113368 होती है।
(According to a new study, people who have a busy social life have more space in one part of their brain.)
नेचर न्यूरोसाइंस में छपे एक_4187 शोध_1138 के अनुसार_3123 कई_4118 लोगों_1189 के दिमाग_16168 के स्कैन से पता_11431 चला कि दिमाग_16168 का एक_4187 हिस्सा_120425 एमिगडाला सामाजिक_43540 व्यस्तताओं_1438 के साथ_328602 सामंजस्य_166 के लिए थोड़ा_38861 बढ़_25368 जाता है।
(According to a study published in Nature Neuroscience, brain scans of many people showed that one part of the brain, the amygdala, grows slightly to keep pace with social engagements.)
यह शोध_1138 58 लोगों_1189 पर किया गया जिसमें उनकी उम्र_13159 और दिमाग_16168 की साइज़ के आँकड़े_128065 लिए गए।
(The study was carried out on 58 people, recording their age and brain size.)
अमरीकी_413405 टीम_14077 ने पाया_227806 कि जिन लोगों_1189 की सोशल नेटवर्किंग अधिक_42403 है उनके दिमाग_16168 का एमिगडाला वाला हिस्सा_120425 बाकी_130137 लोगों_1189 की तुलना_में_38220 अधिक_42403 बड़ा_426602 है।
(The American team found that in people who do more social networking, the amygdala region of the brain is larger than in other people.)
दिमाग_16168 का एमिगडाला वाला हिस्सा_120425 भावनाओं_1912 और मानसिक_42151 स्थिति_1652 से जुड़ा हुआ माना_212436 जाता है।
(The amygdala region of the brain is believed to be linked to emotions and mental state.)
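The sense-marked text above follows a `word_number` pattern, so it can be split mechanically. The sketch below assumes the numeric suffix is a sense identifier and that untagged tokens carry no sense. The function name `parse_sense_tags` and the regular expression are illustrative choices, not part of any Wordnet tool.

```python
import re

# A tagged token is "<word>_<digits>". The word itself may contain underscores,
# as in the multiword token तुलना_में_38220, so only the last "_digits" is split off.
TAGGED = re.compile(r"^(?P<word>.+)_(?P<sense_id>\d+)$")


def parse_sense_tags(text):
    """Return (word, sense_id) pairs; sense_id is None for untagged tokens."""
    pairs = []
    for token in text.split():
        token = token.rstrip("।.,")  # drop the danda and trailing punctuation
        if not token:
            continue
        match = TAGGED.match(token)
        if match:
            pairs.append((match.group("word"), int(match.group("sense_id"))))
        else:
            pairs.append((token, None))
    return pairs


sample = "एक_4187 नए शोध_1138 के अनुसार_3123"
for word, sense_id in parse_sense_tags(sample):
    print(word, sense_id)
```

In this sample, function words such as के and the untagged नए come back with `None`, while content words carry their numeric tag.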

Ambiguity of लोगों (People)

Sense 1: लोग, जन, लोक, जनमानस, पब्लिक - एक से अधिक व्यक्ति (more than one person)
    "लोगों के हित में काम करना चाहिए" (One should work in the interest of the people)
    (English synset) multitude, masses, mass, hoi_polloi, people, the_great_unwashed - the common people generally
    "separate the warriors from the mass"; "power to the people"
Sense 2: दुनिया, दुनियाँ, संसार, विश्व, जगत, जहाँ, जहान, ज़माना, जमाना, लोक, दुनियावाले, दुनियाँवाले, लोग - संसार में रहने वाले लोग (the people living in the world)
    "महात्मा गाँधी का सम्मान पूरी दुनिया करती है / मैं इस दुनिया की परवाह नहीं करता / आज की दुनिया पैसे के पीछे भाग रही है" (The whole world respects Mahatma Gandhi / I do not care about this world / Today's world is running after money)
    (English synset) populace, public, world - people in general considered as a whole
    "he is a hero in the eyes of the public"

Basic Principle

Words in natural languages are polysemous.
However, when synonymous words are put together, a unique meaning often emerges.
Use is made of Relational Semantics.
Componential Semantics, where each word is a bundle of semantic features (as in the Schankian Conceptual Dependency system or Lexical Componential Semantics), is to be examined as a viable alternative.
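The idea that synonyms put together narrow down the meaning can be sketched as an intersection over synset membership. The two synsets below are abbreviated from the लोग example earlier in the lecture. The sense labels and the function name `senses_shared_by` are hypothetical.

```python
# Two synsets that both contain लोग (member lists abbreviated; labels are made up).
SYNSETS = {
    "more_than_one_person": {"लोग", "जन", "लोक", "जनमानस", "पब्लिक"},
    "people_of_the_world": {"दुनिया", "संसार", "विश्व", "जगत", "लोक", "लोग"},
}


def senses_shared_by(words, synsets=SYNSETS):
    """Return the labels of all synsets that contain every one of the given words."""
    return sorted(
        label
        for label, members in synsets.items()
        if all(word in members for word in words)
    )


print(senses_shared_by(["लोग"]))         # ambiguous on its own: both senses match
print(senses_shared_by(["लोग", "जन"]))    # the added synonym leaves a single sense
print(senses_shared_by(["लोग", "लोक"]))   # these two words share both senses
```

The last call shows why the slide says a unique meaning "often" emerges rather than always: some word pairs co-occur in more than one synset.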

Componential Semantics

Consider cat and tiger. Decide on componential attributes.
         Furry   Carnivorous   Heavy   Domesticable
cat      Y       Y             N       Y
tiger    Y       Y             Y       N
Complete and correct attributes are difficult to design.
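The cat and tiger feature bundles can be written down directly as tuples over the four attributes. This is a minimal sketch. The names `ATTRIBUTES`, `FEATURES` and `differing_attributes` are illustrative, and the Y/N values are the ones given on the slide.

```python
# Attribute order follows the slide: Furry, Carnivorous, Heavy, Domesticable.
ATTRIBUTES = ("furry", "carnivorous", "heavy", "domesticable")
FEATURES = {
    "cat": (True, True, False, True),     # (Y, Y, N, Y)
    "tiger": (True, True, True, False),   # (Y, Y, Y, N)
}


def differing_attributes(word_a, word_b):
    """List the attributes on which the two words' feature bundles disagree."""
    return [
        attr
        for attr, a, b in zip(ATTRIBUTES, FEATURES[word_a], FEATURES[word_b])
        if a != b
    ]


print(differing_attributes("cat", "tiger"))  # ['heavy', 'domesticable']
```

Only two of the four attributes separate the words here. Adding a third animal could force a new attribute, which is one way to see why a complete and correct attribute set is hard to design.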

Semantic relations in wordnet

1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word); the rest are semantic (synset to synset).
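The lexical versus semantic split can be recorded as a small lookup table, which is one way a wordnet data model might decide whether a link connects words or synsets. The `Scope` enum and the `RELATIONS` table are assumptions made for this sketch, not an existing API.

```python
from enum import Enum


class Scope(Enum):
    LEXICAL = "word to word"
    SEMANTIC = "synset to synset"


# Relations in the slide's order; items 1, 3 and 5 are the lexical ones.
RELATIONS = {
    "synonymy": Scope.LEXICAL,
    "hypernymy/hyponymy": Scope.SEMANTIC,
    "antonymy": Scope.LEXICAL,
    "meronymy/holonymy": Scope.SEMANTIC,
    "gradation": Scope.LEXICAL,
    "entailment": Scope.SEMANTIC,
    "troponymy": Scope.SEMANTIC,
}


def relations_with_scope(scope):
    """Return the relation names that have the given scope, in slide order."""
    return [name for name, s in RELATIONS.items() if s is scope]


print(relations_with_scope(Scope.LEXICAL))   # ['synonymy', 'antonymy', 'gradation']
print(relations_with_scope(Scope.SEMANTIC))
```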

Synset: the foundation (house)

1. house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
2. house -- (an official assembly having legislative powers; "the legislature has two houses")
3. house -- (a building in which something is sheltered or located; "they had a large carriage house")
4. family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household"; "I waited until the whole house was asleep"; "the teacher asked how many people made up his home")
5. theater, theatre, house -- (a building where theatrical performances or motion-picture shows can be presented; "the house was full")
6. firm, house, business firm -- (members of a business organization that owns or operates one or more establishments; "he worked for a brokerage house")
7. house -- (aristocratic family line; "the House of York")
8. house -- (the members of a religious community living together)
9. house -- (the audience gathered together in a theatre or cinema; "the house applauded"; "he counted the house")
10. house -- (play in which children take the roles of father or mother or children and pretend to interact like adults; "the children were playing house")
11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
12. house -- (the management of a gambling house or casino; "the h

Creation of Synsets

Three principles:
    Minimality
    Coverage
    Replaceability

Synset creation (continued)

Home
    John’s home was decorated with lights on the occasion of Christmas.
    Having worked for many years abroad, John returned home.
House
    John’s house was decorated with lights on the occasion of Christmas.
    Mercury is situated in the eighth house of John’s horoscope.


