Computational Lexical Semantics
Om Damani, IIT Bombay
Study of Word Meaning
Word Sense Disambiguation
Word Similarity
WordNet Relations
Do we really know the meaning of meaning?
We will just take the dictionary definition as meaning
Word Sense Disambiguation (WSD)
WSD Applications: Search, _____, ______
Sense Inventory
WordNet, dictionary, etc.
Plant in English WordNet (#senses ??):
Noun Senses:
plant, works, industrial plant (buildings for carrying on industrial labor) "they built a large plant to manufacture automobiles"
plant, flora, plant life ((botany) a living organism lacking the power of locomotion)
plant (an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience)
plant (something planted secretly for discovery by another) "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"
Sense Inventory (continued)
Plant (Verb Senses):
plant, set (put or set (seeds, seedlings, or plants) into the ground) "Let's plant flowers in the garden"
implant, engraft, embed, imbed, plant (fix or set securely or deeply) "He planted a knee in the back of his opponent"; "The dentist implanted a tooth in the gum"
establish, found, plant, constitute, institute (set up or lay the groundwork for) "establish a new department"
plant (place into a river) "plant fish"
plant (place something or someone in a certain position in order to secretly observe or deceive) "Plant a spy in Moscow"; "plant bugs in the dissident's apartment"
plant, implant (put firmly in the mind) "Plant a thought in the students' minds"
How many senses of सच्चा?
Noun: सत्यवादी, सच्चा, सत्यभाषी, सत्यवक्ता - वह जो सत्य बोलता हो "आधुनिक समाज में भी सत्यवादियों की कमी नहीं है / यथार्थवादी होने के कारण कई लोग श्याम के दुश्मन बन गए हैं"
Adjective(6)
सत्यवादी, सच्चा, सत्यभाषी, सत्यवक्ता - जो सत्य बोलता हो "युधिष्ठिर एक सत्यवादी व्यक्ति थे"
ईमानदार, छलहीन, निष्कपट, निःकपट, रिजु, ऋजु, दयानतदार, सच्चा, अपैशुन, सत्यपर - चित्त में सद्वृत्ति या अच्छी नीयत रखनेवाला, चोरी या छल-कपट न करनेवाला "ईमानदार व्यक्ति सम्मान का पात्र होता है"
वास्तविक, यथार्थ, सच्चा, सही, असली, वास्तव, अकाल्पनिक, अकल्पित, अकूट, प्रकृत - जो वास्तव में हो या हुआ हो या बिल्कुल ठीक "मैंने अभी-अभी एक अविश्वसनीय पर वास्तविक घटना सुनी है"
सच्चा, असली - जो झूठा या बनावटी न हो "वह भारत माँ का सच्चा सपूत है"
खरा, चोखा, सच्चा - जो ईमानदारी, निष्पक्षता, न्याय आदि के आधार पर हो "हमें खरा सौदा करना चाहिए"
खरा, सच्चा, सीधा - बिना किसी बहाने या समझौता के यानि सीधा "वह इतना खरा नहीं है जितना दिखाता है"
How do you know these are different senses?
Hint: think translation
How many senses of आदमी?
आदमी, पुरुष, मर्द, नर - नर जाति का मनुष्य "आदमी और औरत की शारीरिक संरचनाएँ भिन्न होती हैं"
मानव, आदमी, इंसान, इन्सान, इनसान, मनुष्य, मानुष, मानुस, मनुष, नर - वह द्विपद प्राणी जो अपने बुद्धिबल के कारण सब प्राणियों में श्रेष्ठ है और जिसके अंतर्गत हम, आप और सब लोग हैं "आदमी अपनी बुद्धि के कारण सभी प्राणियों में श्रेष्ठ है"
व्यक्ति, मानस, आदमी, शख़्स, शख्स, जन, बंदा, बन्दा - मनुष्य जाति या समूह में से कोई एक "इस कार में दो ही आदमी बैठ सकते हैं"
नौकर, सेवक, दास, अनुचर, ख़ादिम, मुलाज़िम, मुलाजिम, आदमी, टहलुआ, पार्षद, लौंडा, अनुग, अनुचारक, अनुचारी, अनुयायी, पाबंद, पाबन्द, नफर, अभिचर, भृत्य, गण, अभिसर, अभिसारी - वह जो सेवा करता हो "मेरा आदमी एक हफ्ते के लिए घर गया है"
पति, मर्द, शौहर, घरवाला, मियाँ, आदमी, ख़सम, खसम, स्वामी, अधीश, नाथ, कांत, कंत, परिणेता, वारयिता, दयित - स्त्री की दृष्टि से उसका विवाहित पुरुष "शीला का आदमी किसानी करके परिवार का पालन-पोषण करता है"
How do you know these are different senses?
Hint: think translation
WSD: Problem Statement
Given a string of words (sentence, phrase, set of key-words), and a set of senses for each word, decide the appropriate sense for each word.
Example: Translate ‘Where can I get spare parts for textile plant?’ to Hindi
Solution Approaches
The solution depends on what resources you have:
Definition, Gloss
Topic/Category label for each sense definition
Selectional preference for each sense
Sense-Marked Corpora
Parallel Sense-Marked Corpora
Combinatorial Explosion Problem
I saw a man who is 98 years old and can still walk and tell jokes
Senses per word: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
26 × 11 × 4 × 8 × 5 × 4 × 10 × 8 × 3 = 43,929,600 sense combinations (4,39,29,600 in Indian digit grouping)
Solution: Viterbi ??
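The count above is simply the product of the per-word sense counts. A minimal sketch that reproduces it, using only the numbers listed on the slide:

```python
from math import prod

# Number of senses per word, as listed on the slide.
sense_counts = {
    "see": 26, "man": 11, "year": 4, "old": 8, "can": 5,
    "still": 4, "walk": 10, "tell": 8, "joke": 3,
}

# Every word can independently take any of its senses,
# so the number of joint assignments is the product.
total = prod(sense_counts.values())
print(f"{total:,}")  # 43,929,600
```

Each additional ambiguous word multiplies the total, which is why enumerating all joint assignments quickly becomes impractical.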
Dictionary-Based WSD
Context: The bank did not give loan to him though he offered to mortgage his boat.
bank, sense 1
Gloss: a financial institution that accepts deposits and gives loan
Example: “he cashed a check at the bank”, “that bank holds the mortgage on my home”
bank, sense 2
Gloss: the slope beside a body of water
Example: “they pulled the boat up on the bank”, “he watched the currents from the river bank”
How to improve LESK further?
Give an example where the algorithm fails, say for bank
“The bank did not give loan to him though he offered his boat as collateral.”
Problem: collateral is related to the financial sense of bank, but the relation does not show up as a shared word
Solution: See if the definition of bank and the definition of collateral share a term:
Collateral: security pledged for loan repayment
Problem: Can you give an example where the new algorithm fails too?
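The collateral example can be sketched in a few lines. This is only an illustration under stated assumptions: the signature is gloss plus examples for the two bank senses shown on the slides, the stopword list is hand-picked, and `CONTEXT_GLOSSES` is a hypothetical one-entry dictionary holding the collateral definition given above.

```python
import re

# Assumption: a small hand-picked stopword list, not part of the slides.
STOPWORDS = {"a", "the", "that", "and", "of", "to", "on", "at", "my", "his",
             "he", "him", "they", "up", "from", "did", "not", "though", "as", "for"}

def content_words(text):
    """Lowercased words of a text, minus stopwords, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

# Signature = gloss + examples of each sense of "bank", taken from the slides.
SIGNATURES = {
    "financial": content_words(
        "a financial institution that accepts deposits and gives loan "
        "he cashed a check at the bank that bank holds the mortgage on my home"),
    "river": content_words(
        "the slope beside a body of water they pulled the boat up on the bank "
        "he watched the currents from the river bank"),
}

# Hypothetical mini-dictionary for context words (only the slide's entry).
CONTEXT_GLOSSES = {"collateral": "security pledged for loan repayment"}

def scores(sentence, target="bank", use_context_glosses=False):
    """Overlap score per sense; optionally also match glosses of context words."""
    context = content_words(sentence) - {target}
    result = {}
    for sense, signature in SIGNATURES.items():
        sig = signature - {target}
        score = len(sig & context)
        if use_context_glosses:
            for word in context:
                gloss_words = content_words(CONTEXT_GLOSSES.get(word, ""))
                score += len(sig & gloss_words)
        result[sense] = score
    return result

sentence = "The bank did not give loan to him though he offered his boat as collateral."
print(scores(sentence))                            # {'financial': 1, 'river': 1}
print(scores(sentence, use_context_glosses=True))  # {'financial': 2, 'river': 1}
```

With plain overlap the two senses tie ("loan" against "boat"); once the definition of collateral is consulted, the shared term "loan" tips the score toward the financial sense.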
LESK Algorithm
Function Lesk (word, sentence) returns best sense of word
    context := set of words in sentence;
    for each sense in senses of word do
        sense.signature := GetSignature (sense);
        sense.relevance := ComputeRelevance (sense.signature, context);
    end
    best-sense := MaxRelevantSense ();
    if (best-sense.relevance == 0)
        best-sense := GetDefaultSense (word);
    return best-sense;
GetSignature ( sense ): Get all words in example and gloss of sense
ComputeRelevance ( signature, context ): number of common words
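The pseudocode above might be rendered in Python roughly as follows. This is a sketch, not a reference implementation: the sense inventory holds only the two bank senses from the slides, the stopword list is an assumption, and `get_default_sense` simply returns the first listed sense as a stand-in for the most frequent one.

```python
import re

# Assumption: a tiny hand-picked stopword list, not part of the slides.
STOPWORDS = {"a", "the", "that", "and", "of", "to", "on", "at", "my", "his",
             "he", "him", "they", "up", "from", "did", "not", "though"}

# Sense inventory for "bank", copied from the glosses and examples on the slides.
SENSES = {
    "bank": [
        {"name": "financial institution",
         "gloss": "a financial institution that accepts deposits and gives loan",
         "examples": ["he cashed a check at the bank",
                      "that bank holds the mortgage on my home"]},
        {"name": "river bank",
         "gloss": "the slope beside a body of water",
         "examples": ["they pulled the boat up on the bank",
                      "he watched the currents from the river bank"]},
    ]
}

def words(text):
    """Lowercased words of a text, minus stopwords, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def get_signature(sense):
    """All words in the gloss and examples of a sense."""
    return words(" ".join([sense["gloss"], *sense["examples"]]))

def compute_relevance(signature, context):
    """Number of common words."""
    return len(signature & context)

def get_default_sense(word):
    # Assumption: the first listed sense stands in for the most frequent one.
    return SENSES[word][0]

def lesk(word, sentence):
    context = words(sentence) - {word}  # the target word itself is not evidence
    best_sense, best_relevance = None, 0
    for sense in SENSES[word]:
        relevance = compute_relevance(get_signature(sense) - {word}, context)
        if relevance > best_relevance:
            best_sense, best_relevance = sense, relevance
    if best_sense is None:              # no sense overlaps the context at all
        best_sense = get_default_sense(word)
    return best_sense

sentence = "The bank did not give loan to him though he offered to mortgage his boat"
print(lesk("bank", sentence)["name"])  # financial institution
```

Here the financial sense matches "loan" and "mortgage" while the river sense matches only "boat", so the financial sense is returned.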
GetSignature ( sense )
All words in example and gloss of sense
All words in gloss of sense
All words in gloss of all words in the gloss of the given sense
All words in gloss of all words in gloss of all words in gloss
…..
Problem:
Including only the right sense of each gloss word itself requires WSD
Including all senses of all gloss words leads to sense drift
Possible solution: all context words of the sense in a sense-marked corpus
Ideal Signature
For each word, get a vector over all the words in the language
Work with a |V| × |V| matrix
Iterate over it until it converges
Function ComputeRelevance ( {signature}, {context} )
    relevance := 0;
    for each sig-word in signature do
        for each con-word in context do
            wordRelevance := WordRelevance (sig-word, con-word);
            relevance += wordRelevance * weight (sig-word);
        end
    end
    relevance /= Normalize (signature, context);
    return relevance;
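A possible Python rendering of this function is sketched below. The slides leave `WordRelevance`, `weight` and `Normalize` open, so the defaults here are assumptions: exact match for word relevance, a uniform weight of 1, and normalization by the number of word pairs.

```python
def exact_match(sig_word, con_word):
    """Assumed default WordRelevance: 1 for identical words, else 0."""
    return 1.0 if sig_word == con_word else 0.0

def compute_relevance(signature, context,
                      word_relevance=exact_match,
                      weight=lambda word: 1.0):
    """Weighted sum of pairwise word relevance, normalized by the pair count."""
    signature, context = list(signature), list(context)
    if not signature or not context:
        return 0.0
    relevance = 0.0
    for sig_word in signature:
        for con_word in context:
            relevance += word_relevance(sig_word, con_word) * weight(sig_word)
    # Assumption: Normalize(signature, context) is the number of word pairs.
    return relevance / (len(signature) * len(context))

signature = ["financial", "institution", "accepts", "deposits", "gives", "loan"]
context = ["bank", "give", "loan", "offered", "mortgage", "boat"]
print(compute_relevance(signature, context))  # one matching pair out of 36
```

Passing a different `word_relevance` (for instance one that scores related words above zero) or a different `weight` changes the behavior without touching the loop.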
Signature1: a financial institution that accepts deposits and gives loan
Signature2: the slope beside a body of water
Context: The bank did not give loan to him though he offered to mortgage his boat
Number of common words:
| signature ∩ context |
Favors longer signatures
Normalized alternative: | signature ∩ context | / | signature ∪ context |
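Both measures can be computed directly on the two signatures and the context above. This is a minimal sketch; the stopword list is an assumption added here so that function words do not count as matches.

```python
# Assumption: a small hand-picked stopword list, not part of the slides.
STOPWORDS = {"a", "the", "that", "and", "of", "to",
             "did", "not", "him", "though", "he", "his"}

def content_words(text):
    """Lowercased whitespace tokens, minus stopwords, as a set."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def overlap(signature, context):
    """| signature ∩ context |"""
    return len(signature & context)

def jaccard(signature, context):
    """| signature ∩ context | / | signature ∪ context |"""
    union = signature | context
    return len(signature & context) / len(union) if union else 0.0

signature1 = content_words("a financial institution that accepts deposits and gives loan")
signature2 = content_words("the slope beside a body of water")
context = content_words(
    "The bank did not give loan to him though he offered to mortgage his boat")

print(overlap(signature1, context), overlap(signature2, context))  # 1 0
print(jaccard(signature1, context), jaccard(signature2, context))
```

Without the stopword filter, "the" would count as a match for Signature2, which is one reason function words are usually dropped before counting.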
Define Relevance between two words
Synonyms
Specialization and generalization have to be accounted for, e.g. canoe and boat
Even more general: credit, money
Sum of relevance between all word pairs
Weigh different terms differently, maybe based on TF-IDF score
GetDefaultSense ( word )
The most frequent sense
The most frequent sense in a given domain
The most frequent sense as per the topic of the document
Power of the LESK Schema
Signature can even be a topic/domain code: finance, travel, geology, physics, civics
All variations of ComputeRelevance function are still applicable:
Defn: the slope beside a body of water
Signature: geology, physics, geology
Defn: a financial institution that accepts deposits and gives loan
Signature: finance, civics, finance, finance
Sentence: The bank did not give loan to him though he offered to mortgage his boat
Context: finance, finance, travel
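The topic-code example above can be scored with the same overlap idea, counting repeated codes. A small sketch using exactly the codes from the slide; treating the signature and context as multisets is an assumption about how repeats should count.

```python
from collections import Counter

def topic_overlap(signature, context):
    """Multiset overlap: each code counts as often as it appears in both lists."""
    common = Counter(signature) & Counter(context)
    return sum(common.values())

river_signature = ["geology", "physics", "geology"]
financial_signature = ["finance", "civics", "finance", "finance"]
context = ["finance", "finance", "travel"]

print(topic_overlap(river_signature, context))      # 0
print(topic_overlap(financial_signature, context))  # 2
```

The financial sense wins because its signature shares the code finance with the context twice, while the river sense shares none.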
Problem: Various senses of a word have different topics
Solution: ??
Possible Improvements
LESK gives equal weightage to all senses - ‘right’ sense should be given more weight
Disambiguate iteratively, one word at a time, most certain first
PageRank-like algorithm
Give more weightage to Gloss than to Example in computing relevance