Outline of JCDL11
Acceptance ratio: 23.5% (57 / 243)
  Submitted papers: 243
  Accepted papers: 57 (28 full papers and 29 short papers)
Future JCDL
  2012: Washington DC, US
  2013: Indianapolis, US
  2014: ???
    Candidates: Argentina, Italy, UK
    Joint with TPDL (Theory and Practice of Digital Libraries)
    “ECDL” -> “TPDL” (since 2011)
Research Topics in JCDL
  Collaborative and participatory information environments
  Cyberinfrastructure architectures, applications, and deployments
  Data mining/extraction of structure from networked information
  Digital library and Web Science curriculum development
  Distributed information systems
  Evaluation of online information environments
  Impact and evaluation of digital libraries and information in education
  Information and knowledge systems
  Information policy and copyright law
  Information visualization
  Interfaces to information for novices and experts
  Personal digital information management
  Retrieval and browsing
  Scientific data curation, citation and scholarly publication
  Social networks, virtual organizations and networked information
  Social-technical perspectives of digital information
  Studies of human factors in networked information
  Systems, algorithms, and models for data preservation
  Theoretical models of information interaction and organization
  User behavior and modeling
  Visualization of large-scale information environments
(Cited from “http://jcdl2011.org/ExtendedCallForPapers”)
Although the full list of research topics is on the Web site, this report picks up only some of the topics presented this year.
Presented Research Topics
  Content analysis (18 papers): information extraction, plagiarism detection, topic coherence, etc.
    Picked up here: “Measuring Historical Word Sense Variation”
  Education
  Information policy, rights
  Infrastructure
  Interfaces
  Metadata, annotation (8 papers)
  Mobile applications
  Preservation, archive
  User’s information needs (8 papers)
  Visualization
  WWW
Content Analysis
“Measuring Historical Word Sense Variation”
David Bamman and Gregory Crane (Tufts University, US)
Outline
  Automatically classify the senses of Latin words
  Track the historical variation of these senses over a span of more than 2,000 years
Example: senses of “radical” in the “Oxford English Dictionary”
  (1) Political meaning (1783 - ): “advocating thorough or far-reaching political or social reform”
  (2) Slang term (1964 - ): “Excellent, fantastic”
Dataset
  83,892 words from the aligned parallel corpus
  Manually annotated sample of 525 words
Proposed Approach
  Constructing a Latin corpus
  Inducing Latin senses from their English translations
  Word sense disambiguation
  Tracking sense variation over 2,000 years
Constructing the Latin Corpus
  Collect Latin books from the Internet Archive (http://www.archive.org/)
  7,055 books, 389 million words
  [Figure annotated with: Renaissance, invention of the printing press, “Industrial Revolution”] (Cited from D. Bamman and G. Crane: “Measuring Historical Word Sense Variation,” JCDL2011)
Inducing Latin Senses
  Parallel Latin-English corpus: 129 translation book pairs, 40,323 sentence pairs
  Word alignment with MGIZA (a multi-threaded version of GIZA++), at the level of individual words
  Clean alignments for 504,857 words
  Aggregate the English translations for each Latin lemma
  -> 109,432 Latin-English translation pairs (training set for word sense disambiguation)
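The aggregation step can be sketched in Python as below. This is a minimal sketch, assuming the word alignments have already been extracted from the MGIZA output and lemmatized into (Latin lemma, English word) pairs; that input format, the helper names `aggregate_translations` and `candidate_senses`, and the `min_count` cutoff are hypothetical and not the authors' code. The English words aligned to a lemma act as its candidate senses.

```python
from collections import Counter, defaultdict


def aggregate_translations(aligned_pairs):
    """Count the English translations aligned to each Latin lemma.

    aligned_pairs: iterable of (latin_lemma, english_word) tuples, one per
    word-level alignment (hypothetical input format).
    Returns {latin_lemma: Counter({english_word: count})}.
    """
    table = defaultdict(Counter)
    for lemma, english in aligned_pairs:
        table[lemma][english.lower()] += 1  # case-fold English side
    return dict(table)


def candidate_senses(table, lemma, min_count=2):
    """English translations seen at least min_count times, most frequent first."""
    counts = table.get(lemma, Counter())
    return [word for word, c in counts.most_common() if c >= min_count]
```

For example, toy alignments for the Latin "vir" such as ("vir", "man"), ("vir", "Man"), ("vir", "husband"), ("vir", "husband"), ("vir", "hero") give the candidate senses ["man", "husband"], with the single "hero" alignment filtered out as noise. Each aligned occurrence can then serve as a labeled training example for the disambiguation step.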
Word Sense Disambiguation
  Classifiers
    Language model classifier: trained on uni-grams, bi-grams, 5-grams, and 6-grams
    Naïve Bayes: trained on uni-grams
    TF-IDF: uni-grams
    K-nearest neighbor
  Features: 20 words around each target word
  Baseline: simply select the most frequent sense from the lexicon
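Of the classifiers listed, Naïve Bayes over context uni-grams is the easiest to show compactly, so the sketch below uses it; it is not the 6-gram language model classifier the authors applied to the full corpus. It assumes the "20 words around each target word" are split as 10 on each side, uses add-one smoothing, and implements the baseline as picking the highest-frequency sense from a given frequency table. All names (`context_window`, `NaiveBayesWSD`, `most_frequent_sense`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_window(tokens, i, size=10):
    """Return up to `size` words on each side of tokens[i], excluding the target.

    size=10 yields up to 20 context words; the even split is an assumption.
    """
    return tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size]


class NaiveBayesWSD:
    """Multinomial Naive Bayes over unigram context words, add-one smoothing."""

    def fit(self, examples):
        # examples: list of (context_words, sense_label) pairs
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, context):
        # Choose the sense maximizing log P(sense) + sum of log P(word | sense)
        best, best_score = None, -math.inf
        v = len(self.vocab)
        for sense, n in self.sense_counts.items():
            score = math.log(n / self.total)
            denom = sum(self.word_counts[sense].values()) + v
            for w in context:
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best


def most_frequent_sense(sense_frequencies):
    """Baseline: always return the most frequent sense, ignoring context."""
    return max(sense_frequencies, key=sense_frequencies.get)
```

Comparing `NaiveBayesWSD.predict` against `most_frequent_sense` on held-out annotated words mirrors the classifier-versus-baseline comparison on the slide; the baseline is hard to beat when one sense dominates.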
Tracking Sense Variation over 2,000 Years
  Apply the 6-gram language model classifier to the 389 million words of the corpus
  [Figure: sense shares of one Latin word in two periods: “prayer” 16.7% / “speech” 83.3%, and “prayer” 80.0% / “speech” 20.0%]
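Once every occurrence has a predicted sense, the sense shares per period are simple relative frequencies. The sketch below assumes each prediction carries the publication year of its book and buckets years by century; the bucketing, the names `by_century` and `sense_distribution_by_period`, and the input format are hypothetical.

```python
from collections import Counter, defaultdict


def by_century(year):
    """Hypothetical bucketing: century index by floor division (-50 -> -1, 400 -> 4)."""
    return year // 100


def sense_distribution_by_period(predictions, period_of):
    """Share of each predicted sense per period.

    predictions: iterable of (year, predicted_sense) for one Latin lemma.
    period_of: function mapping a year to a period label.
    Returns {period: {sense: share}}; shares within a period sum to 1.
    """
    counts = defaultdict(Counter)
    for year, sense in predictions:
        counts[period_of(year)][sense] += 1
    return {
        period: {sense: n / sum(c.values()) for sense, n in c.items()}
        for period, c in counts.items()
    }
```

With invented toy input of six occurrences dated to 50 BCE (five "speech", one "prayer"), the shares for that century come out at about 0.833 and 0.167. Finer or coarser periods only require a different `period_of` function.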
Doctoral Consortium
  10 students: Germany (1), Norway (2), Portugal (1), UK (1), US (5)
  Topics: metadata, IR and NLP on semi-structured data, applications, information discovery
  Travel support for students
Comments