Parsing_English - SlideFinder - PowerPoint search engine with thumbnail results
Amirkabir University of Technology
Computer Engineering Faculty
AILAB
Parsing
Ahmad Abdollahzadeh Barfouroush
Aban 1381 Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Grammar and Parsing To Examine how the syntatic structure of a sentence can be
computed, you must consider two things:
Grammar: A formal specification of the allowable structures
in the language.
Parsing: The method of analysing a sentence to determine its
structure according to the grammar.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Parsing Algorithm Parsing Algorithm
A parsing algorithm is a procedure that searches through various ways
of combining grammatical rules to find a combination that generates
a tree that could be the structure of the input sentence.
For simplification, the parser simply returns a yes or no answer as to
whether a certain sentence is accepted by grammar or not.
Top-Down Parser
A top-down parser starts with the S symbol and attempts to rewrite it
into a sequence of terminal symbols that matches the classes of the words
in the input sentence.
Bottom-up Parser
A bottom-up parser Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Parsing Techniques Top-Down Parsing
A top-down parser starts with the S symbol and attempts to rewrite it
into a sequence of terminal symbols that matches the classes of the words
in the input sentence.
Bottom-up Parsing
A bottom-up parser starts with the terminal symbols in the input sentence
The parser successively rewrite a sequence of terminal symbols and/or
terminal symbols with the left hand side of a grammar rule.
(always a single non-terminal in CFG) Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Top-down Parser Symbol list: State of parse
The state of the parser at any given time that is represented as a list
of symbols that are the result of operations applied so far and a
number indicating the current position in the sentence.
Example:
Parser starts with ((S)1).
Rule Current State
S NP VP ((NP VP) 1)
NP ART N ((ART N VP) 1)
Lexicon
Efficiently stores the possible categories for each word.
With lexicon the grammar need not contain any lexical rules.
Example:
cried: V dogs: N,V the: ART Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
State of Parser The number indicates the current position in the sentence. Positions fall
between the words, with 1 being the position before the first word.
Example:
1 The 2 dogs 3 cried 4
Parse state: ((N VP) 2)
The parser needs to find an N followed by a VP, starting at position 2.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
New State Generation a) If the first symbol is a lexical symbol and the next word can belong
that lexical category, then update the state by removing the first symbol
And position number.
Example:
dogs is N in the lexicon
Next parser state: ((VP) 3) : The parser needs to find a VP starting
at position 3.
b) If the first symbol is a non-terminal then it is rewritten using a rule
from the grammar.
Example:
New state by applying VP V is ((V) 3) or
by applying V V NP is ((V NP) 3)
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Backtracking Guarantee for finding a parse: Systematically explore every possible
new state.
Backtracking is simple technique for exploring every possible new
state.
In backtracking all possible new states all generated from the current
state.
One of these is picked up as the next state and the rest are saved as
backup states.
If the current state can not lead to a solution, a new current state is
picked up from the list of backup states.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
A Top-down Parsing Algorithm (1/2) Possibilities list: a list of possible states.
current state backup states
Example: (((N) 2) ((NAME) 1) ((ADJ N) 1))
Current state: The first element of possibilities list in form of symbol
list
Backup states: Remaining elements of search states. An alternate
symbol list-word position pair.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
A Top-down Parsing Algorithm (2/2) 1- Select the current state: Take the first state off possibilities list list and
call it C. If the possibilities list is empty, then the algorithm fails.
2- If C consists of an empty symbol list and the word position at the end
of the sentence, then the algorithm succeeds.
3- Otherwise, generate the next possible states.
3.1 If the first symbol is a lexical symbol and the next word can
belong that lexical category, then update the state by removing
the first symbol and position number and add it to the possibilities
list.
3.2 Otherwise, If the first symbol on the symbol list of C is a
non-terminal then it is rewritten for each rule of the grammar
that can rewrite that non-terminal symbol and add them all to the
possibilities list.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
An Example Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381. S NP VP 4. VP V
NP ART N 5. VP V NP
NP ART ADJ N]
Sentence: The dogs cried
Step Current State Backup State
1. ((S) 1)
2. ((NP VP) 1)
3. ((ART N VP) 1)
((ART ADJ N VP) 1)
4. ((N VP)) 2)
((ART ADJ N VP) 1)
5. ((VP) 3)
((ART ADJ N VP) 1
((V) 3)
((V NP) 3)
V is matched with cried ((ART ADJ N VP) 1
An Example (Contd.) Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Parsing as Search Procedure We can think parsing as a special case of a search problem.
A top-down parser can be described in terms of the following generalized
search procedure. The possibilities list is initially set to the start state
of the parse. Then you repeat the following steps until you have success
or failure:
1. Select the first state from the possibilities list (and remove it from the list).
2. Generate the new states by trying every possible option from the selected state
(there may be none if we are on a bad path).
3. Add the states generated in step 2 to the possibilities list.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Parse Strategies
In a depth-first strategy, the possibilities list is a stack. In other words,
step 1 always takes the first element off the list, and step 3 always puts
the new states on the front of the list, yielding a last-in first-out (LIFO)
strategy.
In a breadth-first strategy the possibilities list is manipulated
as a queue. Step 3 adds the new positions onto the end of the list, rather
than the beginning, yielding a first-in first-out (FIFO) strategy.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Search Tree for Two Parse Strategies Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Difference Between Parse Strategies With the depth-first strategy, oneinterpretation is considered and expanded
until it fails; only then is thesecond one considered.
With the breadth-first strategy, both interpretationsare considered alternately,
each being expanded one step at a time.
Both depth-first and breadth-first searches found the solution but searched
the space in a different order.
A depth-first search often moves quickly to a solution but in other cases
may spend considerable time pursuing futile paths.
The breadth-first strategy explores each possible solution to a certain
depth before moving on.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Bottom-Up Parsing The main difference between top-down and bottom-up parsers is the way the
grammar rules are used.
For example, consider the rule
NP -> ART ADJ N
In a bottom-up parser you use the rule to take a sequence ART ADJ N that
you have found and identify it as an NP.
The basic operation in bottom-up parsing then is to take a sequence of
symbols and match it to the right-hand side of the rules.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Bottom-Up Parser You could build a bottom-up parser simply by formulating matching
process as a search process.
The state would simply consist of a symbol list, starting with the words in
the sentence. Successor states could be generated by exploring all
possible ways to
rewrite a word by its possible lexical categories
replace a sequence of symbols that matches the right-hand side of a
grammar rule by its left-hand side symbol
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Motivation for Chart Parsers
Such a simple implementation would be prohibitively expensive, as the
parser would tend to try the same matches again and again, thus duplicating
much of its work unnecessarily.
To avoid this problem, a data structure called a chart is introduced that
Allows the parser to store the partial results of the matching it has done so
far so that the work need not be reduplicated.
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Transition Network Grammars
Context-free rewrite rules as a formalism for representing grammars.
Transition network grammars are another formalism for representing
grammars.
This formalism is based on the notion of a transition network consisting
of nodes and labeled arcs.
Recursive Transition Network (RTN)
Augmented Transition Network (ATN)
Natural Language Processing Course, Parsing, Ahmad Abdollahzadeh, Computer Engineering Faculty, Amirkabir University of Technology, 1381.
Comments