The Cornetto Database
Piek Vossen, Isa Maks, Willy Martin, Hennie van der Vliet
=> Vrije Universiteit Amsterdam, Faculteit der Letteren
Katja Hofmann
=> Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Hetty van Zutphen
=> Irion Technologies
CLIN-17, 12 January 2007, Leuven
Overview
Project background information
Alignment of lexical resources
Database design
Cornetto background
Stevin tender project to develop a lexical semantic database for Dutch:
40K Entries
Generic and central part of the language
Data:
Combination of WordNet and FrameNet
Vertical and horizontal semantic relations
Combinatorial lexical constraints
Aligned with the English Wordnet
Extended with an ontology
Automatic acquisition toolkit
Consortium: Vrije Universiteit Amsterdam, Universiteit van Amsterdam, Universiteit Leuven, Irion Technologies
Started April 2006, ends March 2008
Licensed from TST-centrale, Nederlandse Taalunie
http://www.let.vu.nl/onderzoek/projectsites/cornetto/start.htm
Horizontal & vertical semantic relations
[Figure: semantic network around zieke, patiënt (patient)]
Nodes:
- zieke, patiënt (patient)
- chronisch zieke (chronic patient), langdurig zieke (long-term patient), psychisch/geestelijk zieke (mental patient)
- behandelen (treat), genezen (cure)
- arts (doctor), kinderarts (pediatrician), kind (child)
- ziekte, stoornis (illness, disorder)
- maagaandoening (stomach disorder), nieraandoening (kidney disorder), keelpijn (sore throat)
- fysiotherapie (physiotherapy), medicijnen (medicine), etc.
- ziekenhuis (hospital), etc.
Relation labels: ISA, STATE, ρ-PROCEDURE, ρ-LOCATION, ρ-CAUSE, ρ-AGENT, ρ-PATIENT, co-ρ-AGENT-PATIENT
Combinatorics
slots    fillers (lex/conc)   fillers (coll)
action   behandelen           iem. behandelen (treat someone)
theme    patiënt              een patiënt behandelen (treat a patient)
state    ziekte               iem. behandelen voor een ziekte (treat someone for a disease)
                              iem. aan zijn verwondingen behandelen (treat someone for his injuries)
                              een ziekte behandelen (treat a disease)
Project overview
[Figure: project architecture]
Source lexicons: Dutch Wordnet (DWN), Referentie Bestand Nederlands (RBN)
Linked resources: English Wordnet (PWN), SUMO (KIF), DOLCE (KIF), WN-DOMAINS
Processes: Align/Merge (macro alignment, micro alignment), Editing, Evaluation
Tools and data: Acquisition Toolkit, Corpus
Ontology: Dolce, Sumo
Cornetto entry fields:
- LU/Synset
- Pos
- DWN
- RBN
- SUMO-pointer
- PWN-pointer
- Domain
Alignment of lexical resources
Alignment
DWN senses: koffie-dwn1 (bonen), koffie-dwn2 (poeder), koffie-dwn3 (drank), koffie-dwn4 (heester)
RBN senses: koffie-rbn1 (poeder), koffie-rbn2 (drank)
Generate all weighted combinations:
Produce merged output with mappings above probability threshold:
New structure of word meanings
koffie-cbn1 (bonen) (source dwn1)
koffie-cbn2 (poeder) (source dwn2, rbn1)
koffie-cbn3 (drank) (source dwn3, rbn2)
koffie-cbn4 (heester) (source dwn4)
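The merge step for koffie can be sketched roughly as follows. This is a minimal illustration, not the project's alignment code: the `score` function is a toy stand-in (an assumption) for the weighted strategies, and the function names and the default threshold are hypothetical.

```python
from itertools import product

# Sense inventories from the koffie example: source sense -> gloss.
dwn = {"dwn1": "bonen", "dwn2": "poeder", "dwn3": "drank", "dwn4": "heester"}
rbn = {"rbn1": "poeder", "rbn2": "drank"}


def score(dwn_gloss, rbn_gloss):
    """Toy weight (assumption): 1.0 for identical glosses, otherwise 0.0."""
    return 1.0 if dwn_gloss == rbn_gloss else 0.0


def align(lemma, dwn_senses, rbn_senses, threshold=0.5):
    """Merge two sense inventories into one list of (id, gloss, sources)."""
    # Generate all weighted DWN x RBN combinations.
    weighted = {
        (d, r): score(d_gloss, r_gloss)
        for (d, d_gloss), (r, r_gloss) in product(
            dwn_senses.items(), rbn_senses.items()
        )
    }
    # Keep only the mappings above the threshold.
    links = {}
    for (d, r), weight in weighted.items():
        if weight > threshold:
            links.setdefault(d, []).append(r)

    # Every DWN sense yields a merged sense; linked RBN senses are attached.
    merged = []
    for d, gloss in dwn_senses.items():
        merged.append((f"{lemma}-cbn{len(merged) + 1}", gloss, [d] + links.get(d, [])))

    # RBN senses that were linked to nothing become merged senses of their own.
    linked_rbn = {r for rs in links.values() for r in rs}
    for r, gloss in rbn_senses.items():
        if r not in linked_rbn:
            merged.append((f"{lemma}-cbn{len(merged) + 1}", gloss, [r]))
    return merged


for sense in align("koffie", dwn, rbn):
    print(sense)
```

With the toy score, the two RBN senses attach to the DWN senses with the same gloss, which reproduces the four merged senses listed above.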
Strategies for the macro-alignment
Strategy                                 Conf.   Dev.   Factor   LINKS
1: 1 RBN & 1 DWN meaning, no synonyms    97.1     4.9   3         9936 (8.1%)
2: 1 RBN & 1 DWN meaning                 88.5     8.6   3        25366 (20.8%)
3: 1 RBN & >1 DWN meaning                53.9     8.1   1        22892 (18.7%)
4: >1 RBN & 1 DWN meaning                68.2    17.2   1         1357 (1.1%)
5: overlapping hyperonym word            85.3    23.3   2         7305 (6.0%)
6: overlapping hyponyms                  74.6    22.1   2        21691 (17.7%)
7: overlapping domain-clusters           70.2    15.5   2        11008 (9.0%)
8: overlapping definition words          91.6     7.8   3        22664 (18.5%)
8 reviewers
100 random links per strategy
nouns, verbs, adjectives, adverbs
single confidence score per link based on all weighted strategies
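The slides do not give the formula behind the single confidence score. One possible reading, shown here only as a sketch, is a factor-weighted mean of the reviewer confidences of the strategies that proposed a link. The combination rule and the name `link_confidence` are assumptions; the Factor and Conf. values come from the strategy table.

```python
# Strategy number -> (Factor, Conf. in %), taken from the strategy table.
STRATEGIES = {
    1: (3, 97.1),
    2: (3, 88.5),
    3: (1, 53.9),
    4: (1, 68.2),
    5: (2, 85.3),
    6: (2, 74.6),
    7: (2, 70.2),
    8: (3, 91.6),
}


def link_confidence(fired):
    """Factor-weighted mean confidence over the strategies that fired.

    Assumption: the actual Cornetto scoring rule may differ; this only
    illustrates how several weighted strategies can yield one score.
    """
    if not fired:
        return 0.0
    total_weight = sum(STRATEGIES[s][0] for s in fired)
    weighted_sum = sum(STRATEGIES[s][0] * STRATEGIES[s][1] for s in fired)
    return weighted_sum / total_weight


# A link proposed by strategy 3 and confirmed by overlapping definition words.
print(round(link_confidence([3, 8]), 3))
```

Under this reading, a link found only by a weak strategy gains confidence when a high-factor strategy also proposes it.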
Lexical Unit = form-meaning relation, such that:
form = abstract representation of certain realizations;
part-of-speech is the same;
meaning is the same, where meaning is defined by a reference to a unique Synset;
Synset = Set of synonyms (LUs) that refer to the same entities in most contexts.
Defined by lexical semantic relations;
Defined by reference to ontology Terms or KIF expressions involving Terms from the ontology;
Data structure overview
Collections:
Lexical units (LU): -> mainly derived from RBN
Synsets (SY): -> mainly derived from DWN
Terms (TE): -> based on SUMO/MILO, linked to PWN
Domains (DM): -> based on Wordnet domains
Mappings:
LU <-> SY
SY <-> SY (within Dutch and from Dutch to English)
SY <-> TE
SY <-> DM
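The collections and mappings above could be modelled along these lines. This is a hedged sketch, not the actual Cornetto schema: the class and field names, the relation label `has_hyperonym`, and the domain value `music` are illustrative assumptions, while the identifiers 5345 and 9884 and the term MusicGroup come from the band example in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:  # LU collection, mainly derived from RBN
    lu_id: int
    form: str
    seq_nr: int
    syn_id: int  # LU <-> SY mapping


@dataclass
class Synset:  # SY collection, mainly derived from DWN
    syn_id: int
    relations: list = field(default_factory=list)  # SY <-> SY: (relation, target)
    terms: list = field(default_factory=list)  # SY <-> TE
    domains: list = field(default_factory=list)  # SY <-> DM


lexical_units = {5345: LexicalUnit(5345, "band", 1, 9884)}
synsets = {
    9884: Synset(
        9884,
        relations=[("has_hyperonym", "muziekgezelschap")],  # hypothetical label
        terms=["MusicGroup"],
        domains=["music"],  # hypothetical domain value
    )
}


def terms_for_form(form):
    """Follow LU -> SY -> TE for every lexical unit with the given form."""
    return sorted(
        {
            term
            for lu in lexical_units.values()
            if lu.form == form
            for term in synsets[lu.syn_id].terms
        }
    )


print(terms_for_form("band"))
```

Keeping the mappings as identifiers rather than nested objects mirrors the idea that each collection can be edited on its own.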
[Figure: Cornetto Database (CDB) structure]
Components:
- Collection of Lexical Units
- Collection of Synsets
- Collection of Terms & Axioms
- Cornetto Identifiers
Linked resources: Princeton Wordnet, Wordnet Domains, SUMO/MILO
Other wordnets linked through Princeton Wordnet: Spanish Wordnet, Czech Wordnet, German Wordnet, French Wordnet, Korean Wordnet, Arabic Wordnet
LU
C_lu_id=5345
C_form=band
C_seq_nr=1
Combinatorics
- de band speelt
- een band vormen
- een band treedt op
- optreden van een band
LU
C_lu_id=4265
C_form=band
C_seq_nr=2
Combinatorics
- lekke band
- een band oppompen
- de band loopt leeg
- volle band
CID
C_form=band
C_seq_nr=1
C_lu_id=5345
C_syn_id=9884
R_lu_id=4234
R_seq_nr=1
D_lu_id=7366
D_syn_id=2456
D_seq_nr=3
SYNSET
C_syn_id=9884
synonym
- C_form=band
- C_seq_nr=1
relations
+ muziekgezelschap
- popgroep; jazzband
Referentie Bestand Nederlands (RBN)
R_lu_id=4234
R_seq_nr=1
Dutch Wordnet (DWN)
D_lu_id=7366
D_syn_id=2456
D_seq_nr=3
Term
MusicGroup
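The CID record for band ties one Cornetto lexical unit and synset back to its RBN and DWN sources. A small sketch of how such records might be indexed is given below; the class `CornettoId`, the helper `build_index`, and the choice to make the source fields optional are assumptions for illustration, and only the field values are taken from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CornettoId:
    c_form: str
    c_seq_nr: int
    c_lu_id: int
    c_syn_id: int
    # Source pointers; optional on the assumption that some entries
    # originate from only one of the two source lexicons.
    r_lu_id: Optional[int] = None
    r_seq_nr: Optional[int] = None
    d_lu_id: Optional[int] = None
    d_syn_id: Optional[int] = None
    d_seq_nr: Optional[int] = None


CIDS = [CornettoId("band", 1, 5345, 9884, 4234, 1, 7366, 2456, 3)]


def build_index(cids, key):
    """Index CID records by one of their fields, skipping missing values."""
    return {getattr(c, key): c for c in cids if getattr(c, key) is not None}


by_rbn = build_index(CIDS, "r_lu_id")
by_dwn = build_index(CIDS, "d_lu_id")

# From an RBN or DWN identifier back to the Cornetto LU and synset.
print(by_rbn[4234].c_lu_id, by_dwn[7366].c_syn_id)
```

Indexing in both directions keeps the original resources traceable after the merge.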
[Figure: semantic network for the senses of band]
Sense nodes: band#1, band#2, band#3/geluidsband, band#5
Related words in the network:
- cassettebandje, geluidsdrager, informatiedrager, middel, schrijven, lezen
- bloedband, familieband, moederband, verhouding, relatie, toestand
- jazzband, popgroep, muziekgezelschap, gezelschap, groep, muzikant, muziek, artiest, musiceren
- fietsband, buitenband, binnenband, autoband, zwemband, ring, voorwerp
Combinatorics per sense:
- de band starten; op de band opnemen; de band afspelen
- een goede/sterke band; de banden verbreken; een band hebben met iemand
- in een band spelen; een band oprichten; de band speelt
- de band oppompen; een band plakken; een lekke band; de band springt
Semantics for frame structures
Event structure for verbs from RBN:
E: behandelen action
A1: pers
A2: pers
C3: prep
iemand aan [zijn verwondingen] behandelen
een patiënt voor [een nieraandoening/puistje/keelpijn] behandelen
iemand met [fysiotherapie/medicijnen]Instrument behandelen
DWN:
[causes] [v] genezen:2, beteren:1, herstellen:1
[involved_agent] [n] arts:1; dokter:1
[involved_patient] [n] zieke:1; patiënt:1
[involved_instrument] [n] hart-longmachine:1
[involved_instrument] [n] mitella:1, draagdoek:1
[involved_instrument] [n] geneesmiddel:1; medicijn:1
etc…
Ontologize Cornetto
Identity criteria from OntoClean (Guarino & Welty 2002):
rigidity: to what extent are properties true for entities in all worlds? You are always a human, but you can be a student for a short while.
essence: what properties are essential for an entity? Shape is essential for a statue but not for the clay it is made of.
unity: what represents a whole and what entities are parts of these wholes? An ocean is a whole but the water it contains is not.
Hyponyms of hond (dog) in DWN:
bokser; corgi; loboor; mopshond; pekinees; pointer; spaniël;
pup; reu; teef
bastaard; straathond; blindengeleidehond; bullebijter; diensthond; gashond; jachthond (hunting dog); lawinehond; schoothondje (lap dog); waakhond (watch dog)
Identity criteria applied to DWN
(Semi-)rigid type hierarchy in the ontology:
Canine => PoodleDog; NewfoundlandDog; DalmatianDog, etc.
Wordnet consists of names for (semi-)rigid dog-types and other words for dogs with roles:
poedel = PoodleDog
jachthond (?CAN)
=> (exists (?CAN ?EV)
     (and
       (instance ?CAN Canine)
       (instance ?EV Hunting)
       (agent ?CAN ?EV)))
Type hierarchy remains compact and pure
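The split between rigid type names and role words can be illustrated with a small generator. This is a sketch under stated assumptions: the two dictionaries and the function `ontology_mapping` are hypothetical, the arrow is written as `=>`, and the argument order of the generated expression simply follows the jachthond example.

```python
# Words naming (semi-)rigid types map directly onto an ontology Term.
RIGID_TYPES = {"poedel": "PoodleDog"}

# Role words: (rigid type, process class, relation), values as in the example.
ROLE_WORDS = {"jachthond": ("Canine", "Hunting", "agent")}


def ontology_mapping(word, var="?CAN", event="?EV"):
    """Return a direct Term mapping or a KIF-style role expression."""
    if word in RIGID_TYPES:
        return f"{word} = {RIGID_TYPES[word]}"
    rigid_type, process, relation = ROLE_WORDS[word]
    return "\n".join(
        [
            f"{word} ({var})",
            f"=> (exists ({var} {event})",
            "     (and",
            f"       (instance {var} {rigid_type})",
            f"       (instance {event} {process})",
            f"       ({relation} {var} {event})))",
        ]
    )


print(ontology_mapping("poedel"))
print(ontology_mapping("jachthond"))
```

Adding another role word such as waakhond would then only need a new entry in the role dictionary, not a new type in the hierarchy.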
Comments