
The Cornetto Database
Piek Vossen, Isa Maks, Willy Martin, Hennie van der Vliet => Vrije Universiteit Amsterdam, Faculteit der Letteren
Katja Hofmann => Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Hetty van Zutphen => Irion Technologies
CLIN-17, 12 January 2007, Leuven

Overview

Project background information
Alignment of lexical resources
Database design

Cornetto background

Stevin tender project to develop a lexical semantic database for Dutch:
  40K entries
  Generic and central part of the language
Data:
  Combination of WordNet and FrameNet
  Vertical and horizontal semantic relations
  Combinatorial lexical constraints
  Aligned with the English Wordnet
  Extended with an ontology
Automatic acquisition toolkit
Consortium: Vrije Universiteit Amsterdam, Universiteit van Amsterdam, Universiteit Leuven, Irion Technologies
Started April 2006, ends March 2008
Licensed from TST-centrale, Nederlandse Taalunie
http://www.let.vu.nl/onderzoek/projectsites/cornetto/start.htm

Horizontal & vertical semantic relations
[Diagram: semantic network around zieke, patiënt (patient)]
Nodes:
  zieke, patiënt (patient)
  chronisch zieke (chronic patient), langdurig zieke (long-term patient), psychisch/geestelijk zieke (mental patient)
  behandelen (treat)
  genezen (cure)
  arts (doctor)
  kinderarts (child doctor)
  kind (child)
  ziekte, stoornis (illness, disorder)
  maagaandoening (stomach disorder), nieraandoening (kidney disorder), keelpijn (sore throat)
  fysiotherapie (physiotherapy), medicijnen (medicine), etc.
  ziekenhuis (hospital), etc.
Relation labels: ISA, STATE, ρ-PROCEDURE, ρ-LOCATION, ρ-CAUSE, ρ-AGENT, ρ-PATIENT, co-ρ-AGENT-PATIENT

Combinatorics

slots     fillers (lex/conc)   fillers (coll)
action    behandelen           iem. behandelen (someone treat)
theme     patiënt              een patiënt behandelen (a patient treat)
state     ziekte               iem. behandelen voor een ziekte (someone treat for a disease)
                               iem. aan zijn verwondingen behandelen (someone at his injuries treat)
                               een ziekte behandelen (a disease treat)

Project overview

[Diagram: project overview]
Input resources: Dutch Wordnet (DWN), Referentie Bestand (RBN), English Wordnet (PWN), SUMO (KIF), DOLCE (KIF), WN-DOMAINS
Steps: Align/Merge (macro alignment, micro alignment), Editing, Evaluation
Cornetto entry: LU/Synset, Pos, DWN, RBN, SUMO-pointer, PWN-pointer, Domain
Ontology: Dolce, Sumo
Acquisition Toolkit, Corpus

Alignment of lexical resources

Alignment

DWN meanings:
  koffie-dwn1 (bonen)
  koffie-dwn2 (poeder)
  koffie-dwn3 (drank)
  koffie-dwn4 (heester)
RBN meanings:
  koffie-rbn1 (poeder)
  koffie-rbn2 (drank)
Generate all weighted combinations.
Produce merged output with mappings above probability threshold.
New structure of word meanings:
  koffie-cbn1 (bonen) (source dwn1)
  koffie-cbn2 (poeder) (source dwn2, rbn1)
  koffie-cbn3 (drank) (source dwn3, rbn2)
  koffie-cbn4 (heester) (source dwn4)
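The koffie example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's alignment code: the weighting function gloss_score (1.0 for identical glosses, 0.0 otherwise), the default threshold of 0.5 and the greedy one-to-one linking are all hypothetical stand-ins for the weighted strategies used in Cornetto.

```python
from itertools import product


def gloss_score(gloss_a, gloss_b):
    # Hypothetical stand-in weight: 1.0 for identical glosses, else 0.0.
    return 1.0 if gloss_a == gloss_b else 0.0


def merge_senses(dwn, rbn, threshold=0.5, score=gloss_score):
    """Merge two sense inventories (dict: sense id -> gloss) for one word."""
    # Generate all weighted combinations of a DWN sense with an RBN sense.
    candidates = [
        (score(d_gloss, r_gloss), d_id, r_id)
        for (d_id, d_gloss), (r_id, r_gloss) in product(dwn.items(), rbn.items())
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)

    # Keep the best mappings above the threshold, one link per sense (assumption).
    linked, used_rbn = {}, set()
    for weight, d_id, r_id in candidates:
        if weight < threshold:
            break
        if d_id not in linked and r_id not in used_rbn:
            linked[d_id] = r_id
            used_rbn.add(r_id)

    # Merged output: linked pairs become one meaning, the rest stay on their own.
    merged = [[d, linked[d]] if d in linked else [d] for d in dwn]
    merged += [[r] for r in rbn if r not in used_rbn]
    return merged


dwn = {"koffie-dwn1": "bonen", "koffie-dwn2": "poeder",
       "koffie-dwn3": "drank", "koffie-dwn4": "heester"}
rbn = {"koffie-rbn1": "poeder", "koffie-rbn2": "drank"}

merged = merge_senses(dwn, rbn)
for n, sources in enumerate(merged, start=1):
    print(f"koffie-cbn{n}", sources)
```

With these toy inputs the four merged meanings carry the same sources as on the slide: dwn1 alone, dwn2 with rbn1, dwn3 with rbn2, and dwn4 alone.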

Strategies for the macro-alignment

Strategy                                 Conf.   Factor   LINKS    %       Dev.
1: 1 RBN & 1 DWN meaning, no synonyms    97.1    3         9936    8.1%    4.9
2: 1 RBN & 1 DWN meaning                 88.5    3        25366   20.8%    8.6
3: 1 RBN & >1 DWN meaning                53.9    1        22892   18.7%    8.1
4: >1 RBN & 1 DWN meaning                68.2    1         1357    1.1%   17.2
5: overlapping hyperonym word            85.3    2         7305    6.0%   23.3
6: overlapping hyponyms                  74.6    2        21691   17.7%   22.1
7: overlapping domain-clusters           70.2    2        11008    9.0%   15.5
8: overlapping definition words          91.6    3        22664   18.5%    7.8

Evaluation:
  8 reviewers
  100 random links per strategy
  nouns, verbs, adjectives, adverbs
Single confidence score per link, based on all weighted strategies
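How the weighted strategies are combined into one score is not spelled out on the slide. As a rough sketch only, one possible combination is a factor-weighted mean of the confidences of the strategies that proposed a link. The function link_confidence and the averaging rule are assumptions for illustration; only the Conf. and Factor values come from the table.

```python
# (confidence, factor) per strategy number, copied from the table above.
STRATEGIES = {
    1: (97.1, 3), 2: (88.5, 3), 3: (53.9, 1), 4: (68.2, 1),
    5: (85.3, 2), 6: (74.6, 2), 7: (70.2, 2), 8: (91.6, 3),
}


def link_confidence(fired):
    """Factor-weighted mean confidence of the strategies that fired.

    This averaging rule is an assumption, not the formula used in Cornetto.
    """
    total_factor = sum(STRATEGIES[s][1] for s in fired)
    if total_factor == 0:
        return 0.0
    weighted = sum(STRATEGIES[s][0] * STRATEGIES[s][1] for s in fired)
    return weighted / total_factor


# A link proposed by strategy 3 and by strategy 8.
print(round(link_confidence({3, 8}), 3))
```

Under this rule a link found only by the weak strategy 3 keeps its low score, while support from a high-factor strategy such as 8 pulls the score up.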

Results of the macro-alignment

         LUS        LINKED           NOT-LINKED
RBN      66,024     47,250 (72%)     18,774
VLIS     106,504    46,924 (44%)     59,580

RBN-VLIS LINKS: 58,053

Database design

Lexical Unit & Synsets

Lexical Unit = form-meaning relation, such that:
  form = abstract representation of certain realizations;
  part-of-speech is the same;
  meaning is the same, where meaning is defined by a reference to a unique Synset.
Synset = set of synonyms (LUs) that refer to the same entities in most contexts.
  Defined by lexical semantic relations;
  Defined by reference to ontology Terms or KIF expressions involving Terms from the ontology.

Data structure overview

Collections:
  Lexical units (LU): mainly derived from RBN
  Synsets (SY): mainly derived from DWN
  Terms (TE): based on SUMO/MILO, linked to PWN
  Domains (DM): based on Wordnet domains
Mappings:
  LU <-> SY
  SY <-> SY (within Dutch and from Dutch to English)
  SY <-> TE
  SY <-> DM

Cornetto Database (CDB)

Collection of Lexical Units:
  LU C_lu_id=5345 C_form=band C_seq_nr=1
    Combinatorics:
    - de band speelt
    - een band vormen
    - een band treedt op
    - optreden van een band
  LU C_lu_id=4265 C_form=band C_seq_nr=2
    Combinatorics:
    - lekke band
    - een band oppompen
    - de band loopt leeg
    - volle band

Cornetto Identifiers:
  CID C_form=band C_seq_nr=1 C_lu_id=5345 C_syn_id=9884 R_lu_id=4234 R_seq_nr=1 D_lu_id=7366 D_syn_id=2456 D_seq_nr=3

Collection of Synsets:
  SYNSET C_syn_id=9884
    synonym
    - C_form=band
    - C_seq_nr=1
    relations
    + muziekgezelschap
    - popgroep; jazzband

Collection of Terms & Axioms (SUMO, MILO):
  Term MusicGroup

Source records:
  Referentie Bestand Nederlands (RBN): R_lu_id=4234 R_seq_nr=1
  Dutch Wordnet (DWN): D_lu_id=7366 D_syn_id=2456 D_seq_nr=3

Linked resources: Princeton Wordnet, Wordnet Domains, Spanish Wordnet, Czech Wordnet, German Wordnet, French Wordnet, Korean Wordnet, Arabic Wordnet
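The collections and the identifier record for band can be mirrored in a small Python sketch. The class names, field layout and the helper synset_for_lu are hypothetical; only the field names and id values are taken from the slide. Attaching the Term MusicGroup to synset 9884 is an assumption read off the diagram.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexicalUnit:
    c_lu_id: int
    c_form: str
    c_seq_nr: int
    combinatorics: list = field(default_factory=list)


@dataclass
class Synset:
    c_syn_id: int
    synonyms: list               # (C_form, C_seq_nr) pairs
    term: Optional[str] = None   # SY <-> TE mapping (assumed from the diagram)


@dataclass
class CornettoId:
    # One record ties a Cornetto LU and synset to their RBN and DWN sources.
    c_form: str
    c_seq_nr: int
    c_lu_id: int
    c_syn_id: int
    r_lu_id: int
    r_seq_nr: int
    d_lu_id: int
    d_syn_id: int
    d_seq_nr: int


lexical_units = {
    5345: LexicalUnit(5345, "band", 1, ["de band speelt", "een band vormen"]),
    4265: LexicalUnit(4265, "band", 2, ["lekke band", "een band oppompen"]),
}
synsets = {9884: Synset(9884, [("band", 1)], term="MusicGroup")}
cids = [CornettoId("band", 1, 5345, 9884, 4234, 1, 7366, 2456, 3)]


def synset_for_lu(c_lu_id):
    """Follow the LU <-> SY mapping through the identifier records."""
    for cid in cids:
        if cid.c_lu_id == c_lu_id:
            return synsets.get(cid.c_syn_id)
    return None


print(synset_for_lu(5345).term)   # the music-group sense
print(synset_for_lu(4265))        # no identifier record is shown for this LU
```

Keeping the mapping in a separate identifier record, rather than inside the LU or the synset, lets each collection be edited on its own while the RBN and DWN source ids remain traceable.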

[Diagram: semantic network around the meanings of band]

band#1 (music group): jazzband, popgroep, muziekgezelschap, gezelschap, groep, muzikant, muziek, artiest, musiceren
  Combinatoriek: in een band spelen; een band oprichten; de band speelt

band#2 (tyre): ring, voorwerp, fietsband, buitenband, binnenband, autoband, zwemband
  Combinatoriek: de band oppompen; een band plakken; een lekke band; de band springt

band#3/geluidsband (tape): cassettebandje, geluidsdrager, informatiedrager, schrijven, lezen, middel
  Combinatoriek: de band starten; op de band opnemen; de band afspelen

band#5 (bond): verhouding, relatie, toestand, bloedband, familieband, moederband
  Combinatoriek: een goede/sterke band; de banden verbreken; een band hebben met iemand

Semantics for frame structures

Event structure for verbs from RBN:
  E: behandelen action
  A1: pers
  A2: pers
  C3: prep
  iemand aan [zijn verwondingen] behandelen
  een patiënt voor [een nieraandoening/puistje/keelpijn] behandelen
  iemand met [fysiotherapie/medicijnen]Instrument behandelen
DWN:
  [causes] [v] genezen:2, beteren:1, herstellen:1
  [involved_agent] [n] arts:1; dokter:1
  [involved_patient] [n] zieke:1; patiënt:1
  [involved_instrument] [n] hart-longmachine:1
  [involved_instrument] [n] mitella:1, draagdoek:1
  [involved_instrument] [n] geneesmiddel:1; medicijn:1
  etc.

Ontologize Cornetto

Identity criteria, OntoClean (Guarino & Welty 2002):
  rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.
  essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.
  unicity: what represents a whole and what entities are parts of these wholes? An ocean is a whole but the water it contains is not.
Hyponyms of hond (dog) in DWN:
  bokser; corgi; loboor; mopshond; pekinees; pointer; spaniël; pup; reu; teef
  bastaard; straathond; blindengeleidehond; bullebijter; diensthond; gashond; jachthond (hunting dog); lawinehond; schoothondje (lap dog); waakhond (watch dog)

Identity criteria applied to DWN

(Semi-)rigid type hierarchy in the ontology:
  Canine => PoodleDog; NewfoundlandDog; DalmatianDog, etc.
Wordnet consists of names for (semi-)rigid dog-types and other words for dogs with roles:
  poedel = PoodleDog
  jachthond (?CAN) =>
    (exists (?CAN ?EV)
      (and (instance ?CAN Canine)
           (instance ?EV Hunting)
           (agent ?CAN ?EV)))
Type hierarchy remains compact and pure
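The split between type names and role words can be sketched as follows. This is an illustrative helper only: the function role_expression and the ONTOLOGY_LINKS dictionary are hypothetical names, and the sketch simply reproduces the KIF pattern shown for jachthond without checking it against SUMO.

```python
def role_expression(var, type_term, event_var, event_term, role="agent"):
    """Build a KIF-style expression for a role noun (pattern from the slide)."""
    return (
        f"(exists ({var} {event_var}) "
        f"(and (instance {var} {type_term}) "
        f"(instance {event_var} {event_term}) "
        f"({role} {var} {event_var})))"
    )


# Rigid words point to a Term; role words point to an expression over Terms.
ONTOLOGY_LINKS = {
    "poedel": "PoodleDog",
    "jachthond": role_expression("?CAN", "Canine", "?EV", "Hunting"),
}

for word, link in ONTOLOGY_LINKS.items():
    print(word, "=>", link)
```

Because jachthond is described by an expression rather than by a new Term, the type hierarchy under Canine does not grow when role words are added.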

Created: 
1/22/2007 3:27:54 PM
Slides: 
32