
Computational Lexical Semantics
Om Damani, IIT Bombay

Study of Word Meaning

Word Sense Disambiguation
Word Similarity
WordNet Relations
Do we really know the meaning of "meaning"? We will just take the dictionary definition as the meaning.

Word Sense Disambiguation (WSD)

WSD Applications: Search, _____, ______

Sense Inventory

WordNet, dictionary, etc.
Plant in English WordNet (#senses ??):
Noun senses:
plant, works, industrial plant (buildings for carrying on industrial labor) "they built a large plant to manufacture automobiles"
plant, flora, plant life ((botany) a living organism lacking the power of locomotion)
plant (an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience)
plant (something planted secretly for discovery by another) "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"

Sense Inventory (continued)

Plant (verb senses):
plant, set (put or set (seeds, seedlings, or plants) into the ground) "Let's plant flowers in the garden"
implant, engraft, embed, imbed, plant (fix or set securely or deeply) "He planted a knee in the back of his opponent"; "The dentist implanted a tooth in the gum"
establish, found, plant, constitute, institute (set up or lay the groundwork for) "establish a new department"
plant (place into a river) "plant fish"
plant (place something or someone in a certain position in order to secretly observe or deceive) "Plant a spy in Moscow"; "plant bugs in the dissident's apartment"
plant, implant (put firmly in the mind) "Plant a thought in the students' minds"

How many senses of सच्चा (sachcha: true, truthful)?

Noun:
सत्यवादी, सच्चा, सत्यभाषी, सत्यवक्ता - one who speaks the truth "Even in modern society there is no shortage of truthful people" / "Being a realist, Shyam has made many enemies"
Adjective (6):
सत्यवादी, सच्चा, सत्यभाषी, सत्यवक्ता - who speaks the truth "Yudhishthira was a truthful man"
ईमानदार, छलहीन, निष्कपट, निःकपट, रिजु, ऋजु, दयानतदार, सच्चा, अपैशुन, सत्यपर - having good intentions at heart; not given to theft or deceit "An honest person deserves respect"
वास्तविक, यथार्थ, सच्चा, सही, असली, वास्तव, अकाल्पनिक, अकल्पित, अकूट, प्रकृत - that which really exists or really happened, or is exactly right "I have just heard about an unbelievable but real incident"
सच्चा, असली - not false or fake "He is a true son of Mother India"
खरा, चोखा, सच्चा - based on honesty, fairness, justice, etc. "We should make a fair deal"
खरा, सच्चा, सीधा - without any pretext or compromise, i.e., straightforward "He is not as straightforward as he appears"
How do you know these are different senses? Hint: think translation.

How many senses of आदमी (aadmi: man)?

आदमी, पुरुष, मर्द, नर - a human being of the male sex "The physical structures of men and women are different"
मानव, आदमी, इंसान, इन्सान, इनसान, मनुष्य, मानुष, मानुस, मनुष, नर - the two-legged being that, by virtue of its intelligence, is superior to all other creatures, and which includes us, you, and everyone "Man is superior to all creatures because of his intelligence"
व्यक्ति, मानस, आदमी, शख़्स, शख्स, जन, बंदा, बन्दा - any one member of the human race or of a group "Only two people can sit in this car"
नौकर, सेवक, दास, अनुचर, ख़ादिम, मुलाज़िम, मुलाजिम, आदमी, टहलुआ, पार्षद, लौंडा, अनुग, अनुचारक, अनुचारी, अनुयायी, पाबंद, पाबन्द, नफर, अभिचर, भृत्य, गण, अभिसर, अभिसारी - one who serves "My servant has gone home for a week"
पति, मर्द, शौहर, घरवाला, मियाँ, आदमी, ख़सम, खसम, स्वामी, अधीश, नाथ, कांत, कंत, परिणेता, वारयिता, दयित - from a woman's point of view, the man she is married to "Sheela's husband supports the family by farming"
How do you know these are different senses? Hint: think translation.

WSD: Problem Statement

Given a string of words (a sentence, phrase, or set of keywords) and a set of senses for each word, decide the appropriate sense for each word.
Example: translate 'Where can I get spare parts for a textile plant?' into Hindi.

Solution Approaches

The solution depends on what resources you have:
Definition, gloss
Topic/category label for each sense definition
Selectional preference for each sense
Sense-marked corpora
Parallel sense-marked corpora

Combinatorial Explosion Problem

I saw a man who is 98 years old and can still walk and tell jokes.
Senses per word: see (26), man (11), year (4), old (8), can (5), still (4), walk (10), tell (8), joke (3)
26 × 11 × 4 × 8 × 5 × 4 × 10 × 8 × 3 = 43,929,600 sense combinations
Solution: Viterbi??
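The count above is just the product of the per-word sense counts. If we assume (the slide only asks the question) that the score of a joint assignment splits into a per-word term plus a term for each pair of adjacent words, then Viterbi-style dynamic programming finds the best assignment without enumerating every combination. The sketch below is written under that assumption. The scoring functions `local` and `pair` and the toy sense labels are hypothetical.

```python
from math import prod

# Sense counts per content word, as listed on the slide.
SENSE_COUNTS = {"see": 26, "man": 11, "year": 4, "old": 8, "can": 5,
                "still": 4, "walk": 10, "tell": 8, "joke": 3}


def num_combinations(counts):
    """Joint sense assignments that brute-force search would have to score."""
    return prod(counts.values())


def viterbi_senses(words, senses, local, pair):
    """Best sense sequence, ASSUMING score = sum of local(word, sense)
    plus pair(prev_sense, sense) over adjacent words.
    Cost is O(n * k^2) for n words with at most k senses each."""
    # best[s] = (score of best partial sequence ending in sense s, that sequence)
    best = {s: (local(words[0], s), [s]) for s in senses[words[0]]}
    for w in words[1:]:
        new_best = {}
        for s in senses[w]:
            # Best predecessor sense for s under the transition score.
            prev = max(best, key=lambda p: best[p][0] + pair(p, s))
            score = best[prev][0] + pair(prev, s) + local(w, s)
            new_best[s] = (score, best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]


# Toy usage with hypothetical sense labels and scores.
toy_senses = {"bank": ["bank/river", "bank/finance"], "loan": ["loan/money"]}


def local(word, sense):
    return 0.0  # no per-word preference in this toy example


def pair(prev, sense):
    return 1.0 if prev == "bank/finance" and sense == "loan/money" else 0.0


print(num_combinations(SENSE_COUNTS))  # 43929600
print(viterbi_senses(["bank", "loan"], toy_senses, local, pair))
```

Real WSD scores usually depend on more than adjacent pairs. Under those conditions this decomposition is only an approximation.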

Dictionary-Based WSD


Sentence: The bank did not give loan to him though he offered to mortgage his boat.
bank (sense 1)
Gloss: the slope beside a body of water
Examples: "they pulled the boat up on the bank", "he watched the currents from the river bank"
bank (sense 2)
Gloss: a financial institution that accepts deposits and gives loan
Examples: "he cashed a check at the bank", "that bank holds the mortgage on my home"
Ignoring function words and bank itself, sense 2 shares loan and mortgage with the sentence, while sense 1 shares only boat, so sense 2 wins.

How to improve LESK further

Give an example where the algorithm fails, say for bank:
"The bank did not give loan to him though he offered his boat as collateral."
Problem: collateral is related to the bank, but the relation does not come out clearly. With mortgage gone, the financial sense matches only loan, while the river sense still matches boat.
Solution: see if the definition of bank and the definition of collateral share a term.
Collateral: security pledged for loan repayment (shares loan with the financial sense of bank)
Problem: can you give an example where the new algorithm fails too?
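Here is a minimal sketch of the proposed fix. It assumes whitespace tokenization, a small hand-picked stopword list, and gloss-plus-example signatures taken from the bank slide. The `GLOSSES` dictionary and the scoring rule are illustrative assumptions, not the slide's exact method. The rule is plain overlap, plus the overlap with the gloss of each context word.

```python
# Small hand-picked stopword list (an assumption, for illustration only).
STOPWORDS = {"the", "a", "an", "to", "him", "he", "his", "did", "not", "though",
             "they", "up", "on", "at", "that", "my", "of", "and", "from", "as", "for"}


def tokens(text):
    """Lower-cased content words of text, surrounding punctuation stripped."""
    return {w.strip('.,;:"').lower() for w in text.split()} - STOPWORDS - {""}


def extended_overlap(signature, sentence, target, extra_glosses):
    """Overlap of a sense signature with the sentence, plus its overlap with
    the gloss of every context word that has an entry in extra_glosses."""
    sig = tokens(signature) - {target}
    context = tokens(sentence) - {target}
    score = len(sig & context)
    for word in context:
        score += len(sig & tokens(extra_glosses.get(word, "")))
    return score


RIVER = ("the slope beside a body of water they pulled the boat up on the bank "
         "he watched the currents from the river bank")
FINANCE = ("a financial institution that accepts deposits and gives loan "
           "he cashed a check at the bank that bank holds the mortgage on my home")
SENTENCE = "The bank did not give loan to him though he offered his boat as collateral."
GLOSSES = {"collateral": "security pledged for loan repayment"}

for name, sig in [("river", RIVER), ("finance", FINANCE)]:
    plain = extended_overlap(sig, SENTENCE, "bank", {})
    extended = extended_overlap(sig, SENTENCE, "bank", GLOSSES)
    print(name, plain, extended)  # river 1 1, then finance 1 2
```

Plain overlap ties at 1 for both senses (boat versus loan). Adding the gloss of collateral contributes one more loan match to the financial sense, which breaks the tie.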

LESK Algorithm

Function Lesk(word, sentence) returns best sense of word
    context := set of words in sentence;
    for each sense in senses of word do
        sense.signature := GetSignature(sense);
        sense.relevance := ComputeRelevance(sense.signature, context);
    end
    best-sense := MaxRelevantSense();
    if (best-sense.relevance == 0)
        best-sense := GetDefaultSense(word);
    return best-sense;

GetSignature(sense): get all words in the example and gloss of sense
ComputeRelevance(signature, context): number of common words
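This is a runnable Python sketch of the pseudocode above, applied to the two bank senses from the earlier slide. Several choices here are assumptions, not part of the slides:
- whitespace tokenization;
- a small hand-picked stopword list;
- excluding the target word itself from both signature and context;
- passing the default sense in as a value rather than calling GetDefaultSense.

The function names mirror the pseudocode but are otherwise hypothetical.

```python
# Hand-picked stopword list; an assumption, since the slide does not filter words.
STOPWORDS = {"the", "a", "an", "to", "him", "he", "his", "did", "not", "though",
             "they", "up", "on", "at", "that", "my", "of", "and", "from", "i"}


def tokens(text):
    """Lower-cased content words of text, surrounding punctuation stripped."""
    return {w.strip('.,;:"').lower() for w in text.split()} - STOPWORDS - {""}


def get_signature(gloss, examples):
    """GetSignature: all words in the gloss and examples of a sense."""
    return tokens(" ".join([gloss, *examples]))


def compute_relevance(signature, context):
    """ComputeRelevance: number of common words."""
    return len(signature & context)


def lesk(word, sentence, senses, default_sense):
    """senses maps a sense id to (gloss, examples).
    Among senses with positive relevance, ties go to the first one listed."""
    context = tokens(sentence) - {word}  # the target word itself is not evidence
    best_sense, best_relevance = None, 0
    for sense_id, (gloss, examples) in senses.items():
        relevance = compute_relevance(get_signature(gloss, examples) - {word}, context)
        if relevance > best_relevance:
            best_sense, best_relevance = sense_id, relevance
    # No overlap at all: fall back to the default sense.
    return best_sense if best_relevance > 0 else default_sense


BANK_SENSES = {
    "bank/slope": ("the slope beside a body of water",
                   ["they pulled the boat up on the bank",
                    "he watched the currents from the river bank"]),
    "bank/finance": ("a financial institution that accepts deposits and gives loan",
                     ["he cashed a check at the bank",
                      "that bank holds the mortgage on my home"]),
}

print(lesk("bank", "The bank did not give loan to him though he offered to mortgage his boat.",
           BANK_SENSES, default_sense="bank/finance"))  # bank/finance
```

There is no stemming, so "gives" in the gloss does not match "give" in the sentence. A fuller version would lemmatize both sides before intersecting.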

GetSignature(sense)

Options:
All words in the example and gloss of the sense
All words in the gloss of the sense
All words in the glosses of all words in the gloss of the given sense
All words in the glosses of all words in the glosses of all words in the gloss ...
Problem: including the right sense of each word in the gloss needs WSD itself, while including all senses of all words in the gloss leads to sense drift.
Possible solution: all words that occur in the context of this sense in a sense-marked corpus.
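To see sense drift concretely, consider a small sketch of the "glosses of glosses" expansion. It follows every sense of each gloss word, which is the no-WSD option on this slide. The toy dictionary `TOY_GLOSSES` and its two glosses for "deposits" are invented for illustration, and the stopword list is an assumption.

```python
# Hand-picked stopword list (an assumption for this illustration).
STOPWORDS = {"a", "an", "the", "that", "and", "of", "in", "by", "for"}


def tokens(text):
    """Lower-cased content words of text, surrounding punctuation stripped."""
    return {w.strip('.,;:"').lower() for w in text.split()} - STOPWORDS - {""}


def expand_signature(gloss, glosses_of, depth):
    """Gloss words plus, for `depth` more levels, the words in the glosses of
    newly added words. Every sense of a word is followed (no WSD on the gloss),
    which is what lets unrelated senses leak in."""
    signature = tokens(gloss)
    frontier = set(signature)
    for _ in range(depth):
        reached = set()
        for word in frontier:
            for g in glosses_of.get(word, []):
                reached |= tokens(g)
        frontier = reached - signature  # only expand words not seen before
        signature |= reached
    return signature


# Hypothetical dictionary: "deposits" has a money sense and a geology sense.
TOY_GLOSSES = {
    "deposits": ["money placed in a bank account", "layers of sediment left by water"],
}
FINANCE_GLOSS = "a financial institution that accepts deposits and gives loan"

print(sorted(expand_signature(FINANCE_GLOSS, TOY_GLOSSES, depth=0)))
print(sorted(expand_signature(FINANCE_GLOSS, TOY_GLOSSES, depth=1)))
```

At depth 1, the financial signature picks up "water" from the geology sense of "deposits". It now overlaps the river sense's gloss ("the slope beside a body of water"), which is the drift described above.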

Ideal Signature

For each word, get a vector over all the words in the language.
Work with a |V| × |V| matrix.
Iterate over it till it converges.

ComputeRelevance({signature}, {context})

Function ComputeRelevance({signature}, {context})
    relevance := 0;
    for each sig-word in signature do
        for each con-word in context do
            wordRelevance := WordRelevance(sig-word, con-word);
            relevance += wordRelevance * weight(sig-word);
        end
    end
    relevance /= Normalize(signature, context);
    return relevance;

Signature 1: a financial institution that accepts deposits and gives loan
Signature 2: the slope beside a body of water
Context: The bank did not give loan to him though he offered to mortgage his boat
Number of common words, |signature ∩ context|, favors longer signatures.
Normalized alternative: |signature ∩ context| / |signature ∪ context|
Define the relevance between two words:
Synonyms
Specialization and generalization have to be accounted for: canoe, boat
Even more general: credit, money
Sum the relevance over all word pairs.
Weigh different terms differently, maybe based on their TF-IDF scores.
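The variants above can be compared side by side in a short sketch. The slide leaves `Normalize` unspecified, so dividing by |signature| · |context| in `pairwise_relevance` is an assumption. The `RELATED` table and its partial score of 0.5 are hypothetical. A real system might take related words from WordNet hypernyms and take `weight` from IDF scores.

```python
def overlap(signature, context):
    """|signature ∩ context|: raw count of shared words."""
    return len(signature & context)


def jaccard(signature, context):
    """|signature ∩ context| / |signature ∪ context|: penalizes long signatures."""
    union = signature | context
    return len(signature & context) / len(union) if union else 0.0


def pairwise_relevance(signature, context, word_relevance, weight):
    """Sum of word_relevance * weight over all word pairs, divided by
    |signature| * |context| (one possible choice for Normalize)."""
    if not signature or not context:
        return 0.0
    total = sum(word_relevance(s, c) * weight(s) for s in signature for c in context)
    return total / (len(signature) * len(context))


# Hypothetical related-word table: exact match scores 1.0, a listed relation 0.5.
RELATED = {frozenset({"canoe", "boat"}), frozenset({"credit", "money"})}


def word_relevance(a, b):
    if a == b:
        return 1.0
    return 0.5 if frozenset({a, b}) in RELATED else 0.0


context = {"loan", "mortgage", "boat"}
short_sig = {"loan", "deposit"}
long_sig = {"loan", "mortgage"} | {f"filler{i}" for i in range(8)}  # 10 words

print(overlap(short_sig, context), overlap(long_sig, context))  # 1 2
print(jaccard(short_sig, context), jaccard(long_sig, context))  # 0.25 vs about 0.18
print(pairwise_relevance({"canoe"}, {"boat"}, word_relevance, lambda w: 1.0))  # 0.5
```

Raw overlap prefers the long signature, and the Jaccard normalization reverses that preference. The pairwise form gives partial credit to canoe against boat even though the two words never match exactly.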

GetDefaultSense(word)

The most frequent sense
The most frequent sense in a given domain
The most frequent sense as per the topic of the document
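The first two options reduce to counting in a sense-tagged corpus, as a small sketch shows. The corpus format of (word, sense, domain) triples and the toy counts are invented for illustration. For the third option, one could pass the document's detected topic as `domain`.

```python
from collections import Counter


def most_frequent_sense(word, tagged_corpus, domain=None):
    """Most frequent sense of word in a sense-tagged corpus, optionally
    restricted to one domain. tagged_corpus holds (word, sense, domain)
    triples, a hypothetical format. Returns None if the word never occurs."""
    counts = Counter(sense for w, sense, d in tagged_corpus
                     if w == word and (domain is None or d == domain))
    return counts.most_common(1)[0][0] if counts else None


# Toy sense-tagged corpus (invented counts, for illustration only).
CORPUS = ([("bank", "bank/finance", "finance")] * 3
          + [("bank", "bank/slope", "geology")] * 2)

print(most_frequent_sense("bank", CORPUS))                    # bank/finance
print(most_frequent_sense("bank", CORPUS, domain="geology"))  # bank/slope
```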

Power of the LESK Schema

Signature can even be a topic/domain code: finance, travel, geology, physics, civics.
All variations of the ComputeRelevance function are still applicable:
Defn: the slope beside a body of water
Signature: geology, physics, geology
Defn: a financial institution that accepts deposits and gives loan
Signature: finance, civics, finance, finance
Sentence: The bank did not give loan to him though he offered to mortgage his boat
Context: finance, finance, travel
Problem: various senses of a word have different topics.
Solution: ??
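A short sketch with the topic codes from this slide follows. Treating signatures and context as bags of codes is one choice among the ComputeRelevance variants. Scoring them by multiset overlap (for each code, the smaller of its two counts) is an assumption.

```python
from collections import Counter


def topic_relevance(signature_codes, context_codes):
    """Multiset overlap: for each topic code, min(count in signature, count in context)."""
    return sum((Counter(signature_codes) & Counter(context_codes)).values())


# Topic codes copied from the slide.
SENSES = {
    "bank/slope": ["geology", "physics", "geology"],
    "bank/finance": ["finance", "civics", "finance", "finance"],
}
CONTEXT = ["finance", "finance", "travel"]

scores = {sense: topic_relevance(codes, CONTEXT) for sense, codes in SENSES.items()}
print(scores)  # {'bank/slope': 0, 'bank/finance': 2}
```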

Possible Improvements

LESK gives equal weight to all senses; the 'right' sense should be given more weight
Iterative fashion: one word at a time, most certain first
PageRank-like algorithm
Give more weight to the gloss than to the example in computing relevance
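One way to read the "one at a time, most certain first" idea is sketched below. "Most certain" is taken to mean the largest margin between a word's best and second-best overlap scores, and once a word is resolved, only its chosen sense's signature joins the context. Both readings are assumptions, and the toy signatures are hypothetical. The PageRank-like variant is not shown.

```python
def rank(word, senses, context):
    """Return (margin, best_sense): how far the best sense's overlap with
    the context exceeds the runner-up's."""
    scored = sorted(((len(sig & context), s) for s, sig in senses[word].items()),
                    reverse=True)
    best_score, best_sense = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    return best_score - runner_up, best_sense


def iterative_wsd(words, senses, context):
    """Disambiguate one word at a time, most certain (largest margin) first.
    Only the chosen sense's signature is added to the context, so later
    decisions see the 'right' sense rather than all senses equally."""
    context, chosen, remaining = set(context), {}, list(words)
    while remaining:
        word = max(remaining, key=lambda w: rank(w, senses, context)[0])
        chosen[word] = rank(word, senses, context)[1]
        context |= senses[word][chosen[word]]
        remaining.remove(word)
    return chosen


# Toy signatures (hypothetical): "interest" cannot be resolved until "bank" is.
SENSES = {
    "bank": {"bank/finance": {"money", "loan"}, "bank/slope": {"river", "water"}},
    "interest": {"interest/money": {"money", "rate"},
                 "interest/curiosity": {"attention", "curiosity"}},
}
print(iterative_wsd(["interest", "bank"], SENSES, {"loan"}))
```

With only "loan" in the context, both senses of "interest" score 0. Resolving "bank" first adds "money" to the context, which then decides "interest".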


Name: SemanticsWSD
Created: 2/26/2009 10:35:00 AM
Slides: 53