Components: Attribution span labeler
28 Label the attribution spans for Explicit, Implicit, and AltLex relations. Consists of two steps: Step 1: split the text into clauses; Step 2: decide which clauses are attribution spans. Features from the curr, prev, and next clauses: unigrams of curr; lowercased and lemmatized verbs in curr; first term of curr, last term of curr, last term of prev, first term of next; last term of prev + first term of curr, last term of curr + first term of next; position of curr in the sentence.
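The clause features listed on this slide can be sketched as below. This is only an illustration under assumptions: the function name `clause_features`, the feature-string format, the `<none>` placeholder, and the start/middle/end/whole position scheme are hypothetical, and verb lemmas are assumed to come from an external tagger and lemmatizer.

```python
def clause_features(clauses, i, verb_lemmas=()):
    """Features for clause i (curr) of one sentence.

    clauses: list of token lists, one per clause (output of Step 1).
    verb_lemmas: lemmas of the verbs in curr, assumed to be supplied
    by an external POS tagger and lemmatizer (not implemented here).
    """
    none = "<none>"  # hypothetical placeholder for a missing neighbor
    curr = clauses[i]
    prev = clauses[i - 1] if i > 0 else []
    nxt = clauses[i + 1] if i + 1 < len(clauses) else []

    def first(clause):
        return clause[0] if clause else none

    def last(clause):
        return clause[-1] if clause else none

    feats = set()
    feats.update("uni=" + tok for tok in curr)            # unigrams of curr
    feats.update("verb=" + v.lower() for v in verb_lemmas)
    feats.add("curr_first=" + first(curr))
    feats.add("curr_last=" + last(curr))
    feats.add("prev_last=" + last(prev))
    feats.add("next_first=" + first(nxt))
    feats.add("prev_last+curr_first=" + last(prev) + "_" + first(curr))
    feats.add("curr_last+next_first=" + last(curr) + "_" + first(nxt))

    # Position of curr in the sentence (assumed four-way scheme).
    if len(clauses) == 1:
        position = "whole"
    elif i == 0:
        position = "start"
    elif i == len(clauses) - 1:
        position = "end"
    else:
        position = "middle"
    feats.add("position=" + position)
    return feats


example = clause_features([["he", "said"], ["the", "market", "fell"]], 0, ["say"])
```

A classifier for Step 2 would then take one such feature set per clause and decide whether the clause is an attribution span.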
Discourse Parsing in the Penn Discourse Treebank: Using Discourse Structures to Model Coherence and Improve User Tasks
Ziheng Lin Ph.D. Thesis Proposal Advisors: Prof Min-Yen Kan and Prof Hwee Tou Ng
Introduction
2 A text is usually understood through its discourse structure. Discourse parsing is the process of identifying discourse relations and constructing the internal discourse structure. A number of discourse frameworks have been proposed: Mann & Thompson (1988), Lascarides & Asher (1993), Webber (2004), …
Introduction
3 The Penn Discourse Treebank (PDTB) is a large-scale discourse-level annotation that follows Webber’s framework. Understanding a text’s discourse structure is useful: discourse structure and textual coherence are strongly connected, so discourse parsing is useful in modeling coherence. Discourse parsing also helps downstream NLP applications: Contrast and Restatement relations for summarization, Cause relations for QA.
Introduction
4 Research goals: design an end-to-end PDTB-styled discourse parser; propose a coherence model based on discourse structures; show that discourse parsing improves a downstream NLP application.
Outline
5 Introduction; Literature review (Discourse parsing, Coherence modeling); Recognizing implicit discourse relations; A PDTB-styled end-to-end discourse parser; Modeling coherence using discourse relations; Proposed work and timeline; Conclusion
Discourse parsing
6 Recognize the discourse relations between two text spans, and organize these relations into a discourse structure. Two main classes of relations in the PDTB: Explicit relations, which have an explicit discourse connective such as however or because; Implicit relations, which have no discourse connective and are harder to recognize. Parsing implicit relations is a hard task.
Discourse parsing
7 Marcu & Echihabi (2002): word pairs extracted from two text spans; implicit relations collected by removing connectives. Wellner et al. (2006): connectives, distance between text spans, and event-based features; Discourse Graphbank, explicit and implicit relations. Soricut & Marcu (2003): probabilistic models for sentence-level segmentation and parsing; RST Discourse Treebank (RST-DT). duVerle & Prendinger (2009): SVM to identify discourse structure and label relation types; RST-DT. Also: Wellner & Pustejovsky (2007), Elwell & Baldridge (2008), Wellner (2009).
Coherence modeling
8 Barzilay & Lapata (2008), local coherence: the distribution of discourse entities exhibits certain regularities in sentence-to-sentence transitions; coherence is modeled using an entity grid. Barzilay & Lee (2004), global coherence: newswire reports follow certain patterns of topic shift; a domain-specific HMM is used to capture topic shift in a text.
Outline
9 Introduction; Literature review; Recognizing implicit discourse relations (Methodology, Experiments); A PDTB-styled end-to-end discourse parser; Modeling coherence using discourse relations; Proposed work and timeline; Conclusion
Methodology
10 Supervised learning with a maximum entropy classifier. Four feature classes: contextual features, constituent parse features, dependency parse features, lexical features.
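A maximum entropy classifier over binary features is equivalent to multinomial logistic regression. The sketch below trains one with plain stochastic gradient ascent on the log-likelihood. It is a toy illustration, not the toolkit or training procedure used in this work (the slides do not name one); the function names, the learning rate, the epoch count, and the toy feature strings are all assumptions.

```python
import math


def predict_proba(weights, feats, labels):
    """P(label | feats) under a maxent model with binary features."""
    scores = {y: sum(weights.get((y, f), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(data, labels, epochs=200, lr=0.5):
    """data: list of (set of active feature names, gold label).

    Each update follows the log-likelihood gradient for one instance:
    (1 if y is the gold label else 0) - P(y | feats), per active feature.
    """
    weights = {}  # (label, feature) -> weight
    for _ in range(epochs):
        for feats, gold in data:
            probs = predict_proba(weights, feats, labels)
            for y in labels:
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(y, f)] = weights.get((y, f), 0.0) + lr * grad
    return weights


# Toy data with made-up feature names, for illustration only.
LABELS = ["Contrast", "Cause"]
toy = [({"pair=good|fail"}, "Contrast"), ({"pair=rain|wet"}, "Cause")]
model = train_maxent(toy, LABELS)
probs = predict_proba(model, {"pair=good|fail"}, LABELS)
```

In practice the four feature classes would each contribute feature names to the active set of an instance, and a regularized trainer would replace this unregularized loop.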
Methodology: Contextual features
11 Dependencies between two adjacent discourse relations r1 and r2: independent; fully embedded argument; shared argument; properly contained argument; pure crossing; partially overlapping argument. Fully embedded argument and shared argument are the most common in the PDTB.
Methodology: Contextual features
12 For an implicit relation curr that we want to classify, look at the two surrounding relations, prev and next. Six binary features:
Methodology: Constituent parse features
13 Collect all production rules, e.g. S → NP VP, NP → PRP, PRP → “We”, …… For each rule, three binary features check whether it appears in Arg1, in Arg2, and in both.
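Production rule extraction and the three binary features per rule can be sketched as follows. The nested-tuple tree encoding, the function names, the `@Arg1`/`@Arg2`/`@Both` feature suffixes, and the two toy trees are assumptions made for illustration; a real system would read trees from a constituent parser.

```python
def production_rules(tree):
    """Collect rules 'PARENT -> CHILD ...' from a nested-tuple parse tree.

    A tree is (label, child, child, ...); a leaf child is a plain string.
    """
    label, *children = tree
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    rules = {label + " -> " + " ".join(rhs)}
    for child in children:
        if not isinstance(child, str):
            rules |= production_rules(child)
    return rules


def rule_features(arg1_rules, arg2_rules, vocabulary):
    """Three binary features per rule: in Arg1, in Arg2, and in both."""
    feats = {}
    for rule in vocabulary:
        in1, in2 = rule in arg1_rules, rule in arg2_rules
        feats[rule + "@Arg1"] = in1
        feats[rule + "@Arg2"] = in2
        feats[rule + "@Both"] = in1 and in2
    return feats


# Toy trees (hypothetical), echoing the rules shown on the slide.
arg1 = ("S", ("NP", ("PRP", "We")), ("VP", ("VBD", "had")))
arg2 = ("S", ("NP", ("PRP", "They")), ("VP", ("VBD", "left")))
features = rule_features(production_rules(arg1), production_rules(arg2),
                         ["S -> NP VP", "PRP -> We"])
```

Here `S -> NP VP` fires in both arguments, while `PRP -> We` fires only in Arg1.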
Methodology: Dependency parse features
14 Encode additional information at the word level. Collect all words with the dependency types from their dependents, e.g. “had” ← nsubj dobj, “problems” ← det nn advmod, “at” ← dep. For each rule, three binary features check whether it appears in Arg1, in Arg2, and in both.
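A dependency rule pairs a word with the dependency types of its dependents. The sketch below builds such rules from a list of edges. The edge format `(head_index, type, dependent_index)`, the function name, and the example sentence with its edges are assumptions chosen so that the output matches the three rules on the slide.

```python
from collections import defaultdict


def dependency_rules(words, edges):
    """Build rules 'head <- type type ...' from dependency edges.

    words: tokens of one argument.
    edges: (head_index, dependency_type, dependent_index) triples, assumed
    to come from a dependency parser. Heads are keyed by token index so
    that repeated words stay distinct while the rules are being built.
    """
    types_by_head = defaultdict(list)
    for head_idx, dep_type, _dep_idx in edges:
        types_by_head[head_idx].append(dep_type)
    return {words[h] + " <- " + " ".join(types)
            for h, types in types_by_head.items()}


# Hypothetical sentence and parse, for illustration only.
words = ["We", "had", "no", "operating", "problems", "at", "all"]
edges = [(1, "nsubj", 0), (1, "dobj", 4),
         (4, "det", 2), (4, "nn", 3), (4, "advmod", 5),
         (5, "dep", 6)]
rules = dependency_rules(words, edges)
```

The resulting rule sets for Arg1 and Arg2 can be turned into binary features in the same way as the production rules: one feature each for presence in Arg1, in Arg2, and in both.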
Methodology: Lexical features
15 Marcu & Echihabi (2002) show that word pairs are a good signal for classifying discourse relations. Arg1: John is good in math and sciences. Arg2: Paul fails almost every class he takes. The pair (good, fails) is a good indicator of a contrast relation. Stem and collect all word pairs from Arg1 and Arg2 as features.
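Word-pair extraction can be sketched as the cross product of the stemmed tokens of Arg1 and Arg2. The `crude_stem` function is a hypothetical stand-in for a real stemmer (the slides do not name one) and only strips a few common suffixes; the function names are likewise assumptions.

```python
from itertools import product


def crude_stem(word):
    """Very rough suffix stripper; a placeholder for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def word_pairs(arg1_tokens, arg2_tokens):
    """All (Arg1 stem, Arg2 stem) pairs, used as binary features."""
    return {(crude_stem(a), crude_stem(b))
            for a, b in product(arg1_tokens, arg2_tokens)}


arg1 = "John is good in math and sciences".split()
arg2 = "Paul fails almost every class he takes".split()
pairs = word_pairs(arg1, arg2)
has_contrast_cue = ("good", "fail") in pairs
```

Because every Arg1 word is paired with every Arg2 word, the feature space grows quickly, which is one reason feature selection matters for this feature class.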
Outline
16 Introduction; Literature review; Recognizing implicit discourse relations (Methodology, Experiments); A PDTB-styled end-to-end discourse parser; Modeling coherence using discourse relations; Proposed work and timeline; Conclusion
Experiments
17 With feature selection: employed mutual information (MI) to select the top 100 rules and the top 500 word pairs (as word pairs are more sparse). Production rules, dependency rules, and word pairs all gave significant improvement with p < 0.01. Applying all feature classes yields the highest accuracy of 40.2%. Results show the predictiveness of the feature classes: production rules > word pairs > dependency rules > context features.
Feature class       w/o feature selection      w/ feature selection
                    count      accuracy        count      accuracy
Production Rules    11,113     36.7%           100        38.4%
Dependency Rules    5,031      26.0%           100        32.4%
Word Pairs          105,783    30.3%           500        32.9%
Context             Yes        28.5%           Yes        28.5%
All                 -          35.0%           -          40.2%
Baseline accuracy: 26.1%
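MI-based feature selection can be sketched as scoring each binary feature by its mutual information with the relation label and keeping the top k. The exact MI variant used in the experiments is not stated on the slide; the formulation below (MI between feature presence and class, in bits), the function names, and the toy instances are assumptions.

```python
import math
from collections import Counter


def mutual_information(instances, feature):
    """MI (in bits) between presence of `feature` and the class label.

    instances: list of (set of active features, label).
    """
    n = len(instances)
    joint = Counter((feature in feats, label) for feats, label in instances)
    f_count, c_count = Counter(), Counter()
    for (present, label), k in joint.items():
        f_count[present] += k
        c_count[label] += k
    # Sum p(f, c) * log2(p(f, c) / (p(f) * p(c))) over observed cells.
    return sum((k / n) * math.log2(k * n / (f_count[present] * c_count[label]))
               for (present, label), k in joint.items())


def select_top_k(instances, k):
    """Keep the k features with the highest MI (ties broken by name)."""
    vocab = set().union(*(feats for feats, _ in instances))
    ranked = sorted(vocab,
                    key=lambda f: (-mutual_information(instances, f), f))
    return ranked[:k]


# Toy instances (hypothetical): 'a' and 'b' predict the label, 'x' does not.
toy = [({"a", "x"}, "Contrast"), ({"a"}, "Contrast"),
       ({"b", "x"}, "Cause"), ({"b"}, "Cause")]
selected = select_top_k(toy, 2)
```

With the settings reported on the slide, k would be 100 for production rules and dependency rules and 500 for word pairs.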
Experiments
18 Question: can any of these feature classes be omitted while achieving the same level of performance? Add in the feature classes in the order of their predictiveness: production rules > word pairs > dependency rules > context features. The results confirm that each additional feature class contributes a marginal performance improvement, and that all feature classes are needed for optimal performance.
Production Rules    Dependency Rules    Word Pairs    Context    Acc.
100                 100                 500           Yes        40.2%
100                 100                 500           -          39.0%
100                 -                   500           -          38.9%
100                 -                   -             -          38.4%
Conclusion
19 Implemented an implicit discourse relation classifier. Features include: modeling of the context of the relations; features extracted from constituent and dependency trees; word pairs. Achieved an accuracy of 40.2%, an improvement of 14.1 percentage points over the 26.1% baseline. With a component that handles implicit relations in place, the next step is to design a full parser.
Outline
20 Introduction; Literature review; Recognizing implicit discourse relations; A PDTB-styled end-to-end discourse parser (System overview, Components, Experiments); Modeling coherence using discourse relations; Proposed work and timeline; Conclusion