The Cornetto Database
Piek Vossen, Isa Maks, Willy Martin, Hennie van der Vliet (Vrije Universiteit Amsterdam, Faculteit der Letteren)
Katja Hofmann (Universiteit van Amsterdam, Faculteit der Natuurwetenschappen, Wiskunde en Informatica)
Hetty van Zutphen (Irion Technologies)
CLIN-17, 12 January 2007, Leuven
Next steps
Done:
Macro alignment & database
In progress:
Editing
Revising critical alignments
Defining ontology constraints
Revising word meanings based on ontology distinctions
Revising ontology assignment
Micro-level alignment
Automatic acquisition
Task-based evaluation
The end
Consortium
Vrije Universiteit Amsterdam, Faculteit der Letteren, General Linguistics Department, Onderzoeksgroep Lexicologie/Terminologie:
Willy Martin, Isa Maks, Hennie van der Vliet, Roxane Segers, Piek Vossen
Universiteit van Amsterdam, Instituut voor Informatica:
Maarten de Rijke, Erik Tjong Kim Sang, Katja Hofmann
Katholieke Universiteit Leuven, Interdisciplinair Centrum voor Recht en Informatica (ICRI):
Sien Moens, Jan de Beer
Irion Technologies BV:
Joop van Gent, Hetty van Zutphen, Piek Vossen
Other partners
User-group:
Polderland
Knowledge Concepts
LibRT
Irion Technologies
Van Dale Lexicografie
Larcier-De Boeck
Rik Schutz
Ontology-group:
Dr. W. Ceusters, Office Line Engineering nv
Prof. F. van Harmelen, Vrije Universiteit Amsterdam
Dr. P. Buitelaar, DFKI
Dr. P. Monachesi, Universiteit van Utrecht
Approach
Combine the information from two existing Dutch lexical resources:
The Dutch wordnet: synsets and lexical semantic relations
The Referentiebestand Nederlands: morpho-syntactic information, semantic information, pragmatic information, frame structures, lexical functions and combinatorics
Macro level alignment
Micro level alignment
Populate with an ontology
Global planning
Two-year project:
Months 1-6: design and database
Months 1-6: automatically aligned data
Months 7-10: ontology assignment
Months 7-22: editing
Months 7-15: acquisition
Months 16-17 and 23-24: task-based evaluation
Alignment
Macro level alignment:
Lemma+pos
Word meanings
Micro level alignment:
For each word meaning:
Co-index DWN and RBN information
Derive a new fused structure
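The macro-level step can be illustrated with a minimal sketch. It assumes (this is not stated on the slide) that candidate alignments are generated by pairing every RBN word meaning with every DWN word meaning that shares the same lemma and part of speech. The function name `candidate_pairs`, the tuple layout, and the identifiers are hypothetical.

```python
from collections import defaultdict
from itertools import product


def candidate_pairs(rbn_entries, dwn_entries):
    """Pair RBN and DWN word meanings that share lemma + pos.

    Each entry is a (lemma, pos, meaning_id) tuple (hypothetical layout).
    Returns {(lemma, pos): [(rbn_meaning_id, dwn_meaning_id), ...]}.
    """
    rbn_index = defaultdict(list)
    dwn_index = defaultdict(list)
    for lemma, pos, meaning_id in rbn_entries:
        rbn_index[(lemma, pos)].append(meaning_id)
    for lemma, pos, meaning_id in dwn_entries:
        dwn_index[(lemma, pos)].append(meaning_id)

    # Only lemma+pos keys present in both resources yield candidates.
    shared = rbn_index.keys() & dwn_index.keys()
    return {
        key: list(product(rbn_index[key], dwn_index[key]))
        for key in sorted(shared)
    }


# Invented example data, for illustration only.
rbn = [("bank", "noun", "r1"), ("bank", "noun", "r2"), ("lopen", "verb", "r3")]
dwn = [("bank", "noun", "d1"), ("fiets", "noun", "d2")]
pairs = candidate_pairs(rbn, dwn)
```

Scoring the candidate pairs and deciding which one to keep is a separate step that this sketch does not cover.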
Cornetto Mapping Record
CID: unique pointer to bind them all, assigned by IRION
C_LU_ID: LU id to be assigned to each LU in CDB
C_SY_ID: SYNSET id to be assigned to each synset in CDB
C_FORM: lexical form
C_SEQ_NR: sequence number in CDB
R_LU_ID: LU id currently used in RBN
R_SEQ_NR: sequence number currently used in RBN
D_LU_ID: LU id currently used in DWN (original Vlis ID)
D_SEQ_NR: sequence number currently used in DWN
D_SY_ID: synset id currently used in DWN
Score: confidence score assigned by algorithm
Status: manually confirmed
Name: editor
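As an illustration, the mapping record could be modelled as a small data class. The field names follow the slide; the Python types, the use of `None` for a missing RBN or DWN side, and all example values are assumptions, not part of the Cornetto specification.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class MappingRecord:
    cid: str                  # CID: unique pointer binding the record together
    c_lu_id: Optional[str]    # C_LU_ID: LU id in CDB
    c_sy_id: Optional[str]    # C_SY_ID: synset id in CDB
    c_form: str               # C_FORM: lexical form
    c_seq_nr: Optional[int]   # C_SEQ_NR: sequence number in CDB
    r_lu_id: Optional[str]    # R_LU_ID: LU id in RBN
    r_seq_nr: Optional[int]   # R_SEQ_NR: sequence number in RBN
    d_lu_id: Optional[str]    # D_LU_ID: LU id in DWN
    d_seq_nr: Optional[int]   # D_SEQ_NR: sequence number in DWN
    d_sy_id: Optional[str]    # D_SY_ID: synset id in DWN
    score: float = 0.0        # Score: confidence assigned by the algorithm
    status: str = ""          # Status: e.g. manually confirmed
    name: str = ""            # Name: editor


# Invented example values, for illustration only.
record = MappingRecord(
    cid="cid-0001", c_lu_id="c_lu-1", c_sy_id="c_sy-1", c_form="bank",
    c_seq_nr=1, r_lu_id="r-42", r_seq_nr=1, d_lu_id="d-17", d_seq_nr=2,
    d_sy_id="d_sy-5", score=0.87, status="manually confirmed", name="editor-1",
)
row = asdict(record)  # plain dict, e.g. for writing to a table
```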
Creation of Cornetto LUs and Synsets
No mapping from an LU in RBN to a synonym in DWN:
create a unique LU in Cornetto based on the RBN LU; no synset is created for this LU in Cornetto.
No mapping from a synonym in DWN to an LU in RBN:
create a unique synonym in a unique synset in Cornetto;
create the corresponding Cornetto LU with the information from DWN.
A best-scoring mapping exists between an LU in RBN and a synonym in DWN:
create a single unique LU and a single unique synonym in Cornetto that point to each other and to both RBN and DWN.
All remaining mappings:
do not create LUs or synsets;
store them as additional mappings (weighted alternatives).
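The four creation rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: "best scoring" is read here as a greedy one-to-one selection by descending score, any LU or synonym left unmatched is treated as unmapped, and one Cornetto synset is created per DWN synset. The function `build_cornetto`, the generated ids, and the example data are hypothetical.

```python
def build_cornetto(rbn_lus, dwn_synonyms, candidates):
    """Apply the creation rules to candidate mappings.

    rbn_lus: list of RBN LU ids
    dwn_synonyms: dict mapping DWN LU id -> DWN synset id
    candidates: list of (r_lu_id, d_lu_id, score) tuples
    """
    lus = []           # created Cornetto LUs
    synsets = {}       # Cornetto synset id -> list of Cornetto LU ids
    alternatives = []  # remaining mappings kept as weighted alternatives
    used_r, used_d = set(), set()

    def new_lu(r_lu_id, d_lu_id):
        c_lu_id = f"c_lu_{len(lus) + 1}"
        lus.append({"c_lu_id": c_lu_id, "r_lu_id": r_lu_id, "d_lu_id": d_lu_id})
        if d_lu_id is not None:
            # Assumption: one Cornetto synset per DWN synset.
            c_sy_id = "c_sy_" + dwn_synonyms[d_lu_id]
            synsets.setdefault(c_sy_id, []).append(c_lu_id)

    # Best-scoring mappings: one LU linked to both RBN and DWN.
    for r, d, score in sorted(candidates, key=lambda m: m[2], reverse=True):
        if r not in used_r and d not in used_d:
            used_r.add(r)
            used_d.add(d)
            new_lu(r, d)
        else:
            # Remaining mappings create nothing; keep them as alternatives.
            alternatives.append((r, d, score))

    # RBN LUs without a selected mapping: LU only, no synset.
    for r in rbn_lus:
        if r not in used_r:
            new_lu(r, None)
    # DWN synonyms without a selected mapping: synonym in a synset plus LU.
    for d in dwn_synonyms:
        if d not in used_d:
            new_lu(None, d)
    return lus, synsets, alternatives


# Invented example data, for illustration only.
lus, synsets, alternatives = build_cornetto(
    ["r1", "r2", "r3"],
    {"d1": "s1", "d2": "s1", "d3": "s2"},
    [("r1", "d1", 0.9), ("r1", "d2", 0.4), ("r2", "d1", 0.3)],
)
```

In this example only the mapping between r1 and d1 is selected; the two lower-scoring candidates are kept as alternatives, and the unmatched RBN LUs get LUs without synsets.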
Comments