Newest Viewed Downloaded

A Simple Lexical AnalyzerConvenient utility subprograms: getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme lookup - determines whether the string in lexeme is a reserved word (returns a code)

Programming Language (程式語言) Chapter 4 Lexical and Syntax Analysis (詞彙與語法分析)

Topics

Introduction Lexical Analysis (詞彙分析) The Parsing Problem (剖析問題) Recursive-Descent Parsing (遞迴下降剖析) Bottom-Up Parsing (由下往上剖析)

Syntax Analyzers

Syntax analyzers, or parsers (剖析器), are nearly based on a formal description (正規化表示) of the syntax of programs The most commonly used syntax-description formalism is context-free grammars or BNF

Using BNF to Describe Syntax

Provides a clear and concise (清楚與簡潔) syntax description The parser can be based directly (直接) on the BNF Parsers based on BNF are easy to maintain (維護)

Syntax Analysis

The syntax analysis portion of a language processor nearly always consists of two parts: The lexical analyzer (詞彙分析器) deals with small-scale language constructs, such as names (名稱) and numeric literals (數字字母) The syntax analyzer (語法分析器) deal with the large-scale language constructs, such as expressions (表示式), statements (敘述), and program units (程式單元)

Reasons to Separate Lexical and Syntax Analysis (分別詞彙與語法分析的理由)

Simplicity (簡單) Less complex approaches can be used for lexical analysis; separating them simplifies (簡化) the parser Efficiency (效率) Separation allows optimization (最佳化) of the lexical analyzer Portability (可攜) Parts of the lexical analyzer is somewhat platform-dependent (平台依賴), but the syntax analyzer can be platform-independent (平台獨立)

Lexical Analysis

A lexical analyzer is a pattern matcher (樣式匹配器) for character strings A lexical analyzer is a “front-end” for the parser Identifies (識別) substrings of the source program that belong together - lexemes (詞彙) Lexemes match a character pattern, which is associated with a lexical category (詞彙的類型) called a token (符號、符記,字彙分析– 基本單位 ) value is a lexeme; its token may be IDENT

The Lexemes and Tokens

result = oldsum – value / 100; Lexemes (詞彙) result = oldsum - value / 100 ; Tokens (符號、符記) IDENT ASSIGN_OP IDENT SUBTRACT_OP IDENT DIVISION_OP INT_LIT SEMICOLON

Lexical Analysis

The lexical analyzer is usually a function that is called by the parser when it needs the next token Three approaches to building a lexical analyzer: Write a formal description (正規描述) of the tokens and use a software tool that automatically generates a lexical analyzer Design a state transition diagram (狀態轉變圖) that describes the token patterns (符號樣式) and write a program that implements the state diagram Design a state transition diagram that describes the token patterns and hand-construct a table-driven (表格驅動) implementation of the state diagram

Lexical Analysis

A lexical analyzer that recognizes only program names (名稱), reserved word (保留字), and integer literals (數字字母) Reserved words (保留字) and identifiers (識別符) can be recognized together (rather than having a part of the diagram for each reserved word) Use a table lookup to determine whether a possible identifier is in fact a reserved word

State Diagram

A state diagram to recognize names, reserved words, and integer literals

A Simple Lexical Analyzer

Implementation (by C language) int lex() { getChar(); switch (charClass) { case LETTER: addChar(); getChar(); while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); } return lookup(lexeme); break; …

A Simple Lexical Analyzer

… case DIGIT: addChar(); getChar(); while (charClass == DIGIT) { addChar(); getChar(); } return INT_LIT; break; } /* End of switch */ } /* End of function lex */

A Simple Lexical Analyzer

Convenient utility subprograms: getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme lookup - determines whether the string in lexeme is a reserved word (returns a code)

The Parsing Problem

Goals of the parser Check the input program to determine whether it is syntactically (在語句構造上) correct, and produce a diagnostic (診斷) message and recover (修復) when an error is found Produce either a complete parse tree, or least trace the structure of the complete parse tree, for syntactically correct input

Two Categories of Parsers

Top-down Produce the parse tree, beginning at the root Order is that of a leftmost derivation (最左邊衍生) Traces or builds the parse tree in preorder (前序) Bottom-up Produce the parse tree, beginning at the leaves Order is that of the reverse (相反) of a rightmost derivation (最右邊衍生) Parsers look only one token ahead (先前) in the input

Top-Down Parsers

Top-down Parsers Given a sentential form, xA , the parser must choose the correct A-rule to get the next sentential (句子的) form in the leftmost derivation, using only the first token produced by A The most common top-down parsing algorithms Recursive descent (遞迴下降) - a coded implementation (BNF) LL algorithms - table driven implementation

Bottom-Up Parsers

Bottom-up parsers Given a right sentential form, , determine what substring of  is the right-hand side (RHS) of the rule in the grammar that must be reduced to produce the previous sentential form in the right derivation The most common bottom-up parsing algorithms The LR family

The Complexity (複雜度) of Parsing

Parsers that work for any unambiguous (不含糊的, 清楚的) grammar are complex (複雜) and inefficient (無效率) ( O(n3), where n is the length of the input ) Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time ( O(n), where n is the length of the input )

Recursive-Descent (遞迴下降) Parsing

A recursive-descent parser has a subprogram for each nonterminal in the grammar, which can parse sentences that can be generated by that nonterminal EBNF is ideally (觀念上) suited (符合) for being the basis for a recursive-descent parser, because EBNF minimizes the number of nonterminals

Showing 1 - 20 of 43 items Details

Name: 
ch04_V1.3
Author: 
徐碩利
Company: 
N/A
Description: 
A Simple Lexical AnalyzerConvenient utility subprograms: getChar - gets the next character of input, puts it in nextChar, determines its class and puts the class in charClass addChar - puts the character from nextChar into the place the lexeme is being accumulated, lexeme lookup - determines whether the string in lexeme is a reserved word (returns a code)
Tags: 
the | term | parsing | factor | and | expr | lexical | token
Created: 
9/28/2006 4:35:44 PM
Slides: 
43
Views: 
2
Downloads: 
1
Rating: 
0


> Comment



Share this presentation
|

Comments

Share this presentation:

|
Sitemap