Outline
Motivation
Information overload in a scientific congress scenario
Conference Participant Advisor Service
Profile-driven paper recommending
User Profiles as Bayesian Text Classifiers
User Profiles learned from documents semantically indexed through a WSD procedure [*]
Empirical Evaluation
Conclusions and Future Work
[*] Combining Learning and Word Sense Disambiguation for Intelligent User Profiling - IJCAI 2007
Motivation
Information overload in the scientific congress scenario
Web Personalization
Personalized systems adapt their behavior to individual users by learning user profiles
Structured model of the user interests
Exploitable for providing personalized content and services
Personalization usually done automatically based on the user profile and possibly the profiles of other users with similar interests (collaborative approach)
How can personalization be used in the scientific congress scenario?
Web Personalization in the scientific congress scenario
Learn research interests of participants from papers they rated
Store research interests in personal profiles
Use profiles to build personalized programs delivered to participants
Learning User Profiles as a Text Categorization problem
OUR STRATEGY
content-based recommendations by learning from TEXT and USER FEEDBACK on items
Keyword-based profiles: problems
Example documents
doc1: AI is a branch of computer science
doc2: the 2007 International Joint Conference on Artificial Intelligence will be held in India
doc3: apple launches a new product…
USER PROFILE (keyword weights)
artificial 0.02
intelligence 0.01
apple 0.13
AI 0.15
…
Problems highlighted on this example
MULTI-WORD CONCEPTS
SYNONYMY
POLYSEMY
ITem Recommender (ITR)
Advanced NLP techniques used to represent documents
Naïve Bayes text classification to assign a score (level of interest) to items according to the user preferences
Result: semantic user profile - as a binary text classifier (user-likes and user-dislikes) - containing the probabilistic model of user preferences
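A minimal sketch of how a binary naive Bayes profile could turn user ratings into a level-of-interest score. It assumes a multinomial model with Laplace smoothing, which the slides do not specify; the function names (`train_profile`, `score`) and the toy ratings are hypothetical. The tokens could equally be synset identifiers instead of keywords.

```python
import math
from collections import Counter

def train_profile(rated_docs):
    """Build a two-class profile from (tokens, label) pairs.

    label is "likes" or "dislikes" (the user-likes / user-dislikes classes).
    """
    class_docs = Counter()
    token_counts = {"likes": Counter(), "dislikes": Counter()}
    for tokens, label in rated_docs:
        class_docs[label] += 1
        token_counts[label].update(tokens)
    vocab = set(token_counts["likes"]) | set(token_counts["dislikes"])
    return {"class_docs": class_docs, "token_counts": token_counts, "vocab": vocab}

def score(profile, tokens):
    """Return P(likes | tokens), assuming multinomial naive Bayes
    with Laplace (add-one) smoothing."""
    total_docs = sum(profile["class_docs"].values())
    vocab_size = len(profile["vocab"])
    log_post = {}
    for label in ("likes", "dislikes"):
        counts = profile["token_counts"][label]
        total = sum(counts.values())
        # Smoothed class prior
        lp = math.log((profile["class_docs"][label] + 1) / (total_docs + 2))
        for t in tokens:
            if t in profile["vocab"]:  # tokens never seen in training are ignored
                lp += math.log((counts[t] + 1) / (total + vocab_size))
        log_post[label] = lp
    # Normalize in log space to avoid underflow
    m = max(log_post.values())
    exp = {k: math.exp(v - m) for k, v in log_post.items()}
    return exp["likes"] / (exp["likes"] + exp["dislikes"])

# Hypothetical ratings given by one participant
ratings = [
    (["user", "profile", "learning"], "likes"),
    (["word", "sense", "disambiguation"], "likes"),
    (["protein", "folding", "simulation"], "dislikes"),
]
profile = train_profile(ratings)
liked = score(profile, ["user", "profile", "disambiguation"])
disliked = score(profile, ["protein", "simulation"])
```

Papers can then be ranked by this score, with the highest-scoring ones recommended to the participant.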
Word Sense Disambiguation (WSD)
Process of deciding which sense of a word is used in a specific context
WordNet as sense inventory
Nouns, verbs, adverbs and adjectives organized into SYNonym SETs (synsets), each one representing an underlying lexical concept
Text representation changes from a vector (bag) of words (BOW) to a vector (bag) of synsets (BOS)
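The change of representation can be illustrated with a small sketch. The sense assignments below stand in for the output of a WSD step, and the synset identifiers are made up for illustration; they are not WordNet identifiers.

```python
from collections import Counter

# Hypothetical word -> synset assignments, as if produced by a WSD step.
# The identifiers are illustrative placeholders, not real WordNet synsets.
SENSE_OF = {
    "ai": "syn:artificial_intelligence",
    "artificial_intelligence": "syn:artificial_intelligence",
    "branch": "syn:branch_division",
    "computer_science": "syn:computer_science",
}

def bag_of_words(tokens):
    """BOW: one dimension per distinct token."""
    return Counter(tokens)

def bag_of_synsets(tokens, sense_of=SENSE_OF):
    """BOS: one dimension per distinct synset; tokens without a sense are dropped."""
    return Counter(sense_of[t] for t in tokens if t in sense_of)

tokens = ["ai", "branch", "computer_science", "artificial_intelligence"]
bow = bag_of_words(tokens)
bos = bag_of_synsets(tokens)
```

In the BOW, "ai" and "artificial_intelligence" are two unrelated features; in the BOS they fall on the same synset, so their counts add up.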
JIGSAW WSD algorithm
Three different strategies to disambiguate nouns, verbs, adjectives and adverbs
Effectiveness of WSD strongly influenced by the POS tag of the target word
Input: document d = {w1, w2, …, wh}
Output: X = {s1, s2, …, sk} (k ≤ h)
Each si obtained by disambiguating wi based on the context of each word
Some words not recognized by WordNet
Groups of words recognized as a single concept
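The two reasons above explain why the output can be shorter than the input. The sketch below illustrates only that length relation: it uses a hypothetical mini-lexicon with one sense per entry, so no actual disambiguation takes place, and the longest-match lookup is an assumption rather than the procedure used by JIGSAW.

```python
# Hypothetical mini-lexicon standing in for WordNet; keys are word tuples
# so that multi-word concepts can be looked up as a single entry.
LEXICON = {
    ("artificial", "intelligence"): "syn:artificial_intelligence",
    ("conference",): "syn:conference",
    ("india",): "syn:india",
}
MAX_LEN = 2  # longest multi-word entry in the toy lexicon

def to_synsets(words):
    """Map a word list to a synset list, merging multi-word concepts
    and skipping words the lexicon does not recognize."""
    synsets, i = [], 0
    while i < len(words):
        for n in range(MAX_LEN, 0, -1):  # try the longest match first
            key = tuple(words[i:i + n])
            if len(key) == n and key in LEXICON:
                synsets.append(LEXICON[key])
                i += n
                break
        else:
            i += 1  # word not recognized: no synset is emitted
    return synsets

d = ["conference", "on", "artificial", "intelligence", "in", "india"]
X = to_synsets(d)
```

Here h = 6 and k = 3: "on" and "in" are not in the lexicon, and "artificial intelligence" collapses into one concept.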
JIGSAWnouns: The idea
Adaptation of the Resnik algorithm
Semantic similarity between synsets inversely proportional to their distance in the WordNet IS-A hierarchy
Path length similarity between synsets used to assign scores to the candidate synsets of a polysemous word
Example: “The white cat is hunting the mouse”
w = cat
C = {mouse}
Wcat = {02037721, 00847815}
02037721: feline mammal…
00847815: computerized axial tomography…
T = {02244530, 03651364}
02244530: any of numerous small rodents…
03651364: a hand-operated electronic device …
JIGSAWnouns
w = cat, C = {mouse}, “The white cat is hunting the mouse”
Wcat = {02037721, 00847815}, T = {02244530, 03651364}
[Figure: scores computed for the candidate synsets; values shown: 0.107, 0.0, 0.0, 0.806, 0.806, 0.806]
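The cat/mouse example can be sketched with a toy IS-A hierarchy. Everything here is an assumption made for illustration: the hierarchy is hypothetical and far smaller than WordNet, the node names are stand-ins for the synsets on the slide, and the similarity 1 / (1 + path length) is a simple inverse-distance measure, not necessarily the formula JIGSAW uses, so the scores will not match the figure.

```python
# Toy IS-A hierarchy (child -> parent). Hypothetical; not the real WordNet graph.
PARENT = {
    "cat#feline": "mammal",
    "mouse#rodent": "mammal",
    "mammal": "animal",
    "animal": "entity",
    "mouse#device": "device",
    "device": "artifact",
    "artifact": "entity",
    "cat#scan": "imaging",
    "imaging": "procedure",
    "procedure": "entity",
}

def ancestors(node):
    """Return {ancestor: number of IS-A steps up from node}, including node itself."""
    depth, result = 0, {}
    while node is not None:
        result[node] = depth
        node = PARENT.get(node)
        depth += 1
    return result

def path_length(a, b):
    """Shortest IS-A path between a and b through a common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = set(up_a) & set(up_b)
    return min(up_a[c] + up_b[c] for c in common)

def similarity(a, b):
    """Assumed measure: decreases as the path between the synsets grows."""
    return 1.0 / (1.0 + path_length(a, b))

def score_candidates(candidates, context_synsets):
    """Score each candidate sense by its best similarity to any context synset."""
    return {c: max(similarity(c, t) for t in context_synsets) for c in candidates}

W_cat = ["cat#feline", "cat#scan"]      # candidate senses of w = cat
T = ["mouse#rodent", "mouse#device"]    # senses of the context word "mouse"
scores = score_candidates(W_cat, T)
best = max(scores, key=scores.get)
```

The feline sense wins because it sits two IS-A steps from the rodent sense of "mouse", while every other pairing only meets at the root.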
JIGSAWverbs: synset description
Description of synset si = gloss + example phrases in WordNet for si