- Lexical Analyzer:
- Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one
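As a rough illustration of this pre-pass, the Python sketch below deletes comments and compacts whitespace before any tokens are formed. It assumes C-style `//` and `/* ... */` comments. The function name `scan` is hypothetical. It also ignores comment markers that appear inside string literals, which a real scanner would have to respect.

```python
import re

def scan(source: str) -> str:
    """Pre-pass sketch: strip comments, then compact whitespace runs.

    Assumes C-style '//' line comments and '/* ... */' block comments;
    comment markers inside string literals are NOT handled.
    """
    # Remove block comments (non-greedy, may span lines), then line comments.
    no_comments = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)
    no_comments = re.sub(r"//[^\n]*", " ", no_comments)
    # Collapse every run of whitespace characters into a single space.
    return re.sub(r"\s+", " ", no_comments).strip()

print(scan("x  =  1;   // set x\n/* block\n comment */  y = 2;"))  # x = 1; y = 2;
```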
-
A token is a pair consisting of a token name (written in boldface) and an optional attribute value.
- The token name is an abstract symbol representing a kind of lexical unit (keyword or identifier…)
- A pattern is a description of the form that the lexemes of a token may take
- A lexeme is a sequence of characters in the source program that matches the pattern for a token
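To make the three terms concrete, here is a small Python sketch in which each token name maps to a regular-expression pattern, and `classify` checks which pattern a given lexeme matches. The token names (`if`, `relop`, `id`, `number`), the patterns, and the use of the lexeme itself as the attribute value are illustrative assumptions, not a fixed convention.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    name: str                 # abstract token name, e.g. "id" or "relop"
    attribute: Optional[str]  # optional attribute value (here: the lexeme)

# Hypothetical patterns: one regular expression per token name.
# Dict order matters: the keyword "if" is tried before the identifier pattern.
PATTERNS = {
    "if": r"if",
    "relop": r"<=|>=|==|!=|<|>",
    "id": r"[A-Za-z_][A-Za-z0-9_]*",
    "number": r"\d+(?:\.\d+)?",
}

def classify(lexeme: str) -> Token:
    """Return the token whose pattern the whole lexeme matches."""
    for name, pattern in PATTERNS.items():
        if re.fullmatch(pattern, lexeme):
            # A keyword needs no attribute; other tokens carry their lexeme.
            return Token(name, None if name == "if" else lexeme)
    raise ValueError(f"no pattern matches {lexeme!r}")
```

For example, `classify("<=")` yields `Token("relop", "<=")`: the token name is `relop`, the pattern is the `relop` regex, and `<=` is the lexeme.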
-
Token Classes:
- Keywords
- Comparison operators
- Identifiers
- Constants
- Punctuation symbols
- Lexical errors: the simplest recovery strategy is “panic mode” recovery, which deletes successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left
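The token classes and panic-mode recovery can be combined in one small regex-driven tokenizer. This is a sketch under assumptions: the keyword list, the exact pattern for each class, and the name `tokenize` are hypothetical. Python's `re` tries alternatives left to right (first match, not longest match), so keywords and two-character comparisons are listed before the patterns that would otherwise shadow them.

```python
import re

# Hypothetical token set: one named group per token class.
TOKEN_SPEC = [
    ("keyword", r"\b(?:if|else|while|return)\b"),   # \b keeps "ifx" an identifier
    ("comparison", r"<=|>=|==|!=|<|>"),            # before "=" in punctuation
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("constant", r'\d+(?:\.\d+)?|"[^"\n]*"'),      # numbers and string literals
    ("punctuation", r"[(){};,=]"),
    ("ws", r"\s+"),                                # skipped, never emitted
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text: str):
    """Return (tokens, errors); errors are the runs deleted in panic mode."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Panic mode: delete characters until a token can start again.
            start = pos
            while pos < len(text) and MASTER.match(text, pos) is None:
                pos += 1
            errors.append(text[start:pos])
            continue
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors
```

With this sketch, `tokenize("if (x1 <= 42) y = @#3;")` returns ten tokens and reports `"@#"` as the run of characters deleted before the constant `3`.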
-
Error Recovery Techniques:
- Delete one character from the remaining input
- Insert a missing character into the remaining input
- Replace a character by another character
- Transpose two adjacent characters
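One way to apply these four actions is to generate every string reachable from a bad lexeme by exactly one of them and keep the candidates that are valid. The sketch below checks candidates only against a hypothetical keyword set and assumes a lowercase alphabet for insertions and replacements. A fuller lexer would instead test whether a prefix of the remaining input becomes a valid lexeme after one transformation.

```python
import string

KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set
ALPHABET = string.ascii_lowercase             # assumed alphabet for edits

def one_edit_candidates(lexeme: str) -> set[str]:
    """All strings reachable from lexeme by exactly one of the four actions."""
    splits = [(lexeme[:i], lexeme[i:]) for i in range(len(lexeme) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}                      # delete one char
    inserts = {a + c + b for a, b in splits for c in ALPHABET}         # insert one char
    replaces = {a + c + b[1:] for a, b in splits if b
                for c in ALPHABET if c != b[0]}                        # replace one char
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits
                  if len(b) > 1}                                       # swap neighbours
    return deletes | inserts | replaces | transposes

def repair(lexeme: str) -> set[str]:
    """Keywords that a single transformation of lexeme would produce."""
    return one_edit_candidates(lexeme) & KEYWORDS
```

For instance, `repair("fi")` finds `{"if"}` by transposition and `repair("whle")` finds `{"while"}` by insertion.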
-
Input Buffering:
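- Reading the source program efficiently: because the lexical analyzer often has to look one or more characters beyond the next lexeme before it can decide which token it has found, the input is read into buffers rather than character by character from the file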
-
Two Pointers into the Input:
- Pointer LexemeBegin: marks the beginning of the current lexeme
- Pointer Forward: scans ahead until a pattern match is found
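The two-pointer scheme can be sketched in Python over an in-memory string: `lexeme_begin` stays at the start of the current lexeme while `forward` advances, and the lexeme is the slice between them. The three lexeme kinds (`id`, `number`, one-character `op`) and the name `split_lexemes` are hypothetical simplifications. Stopping `forward` at the first character that no longer fits plays the role of reading one character too far and retracting.

```python
def split_lexemes(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, lexeme) pairs using two pointers."""
    tokens = []
    lexeme_begin = 0
    while lexeme_begin < len(text):
        ch = text[lexeme_begin]
        if ch.isspace():
            lexeme_begin += 1              # whitespace never starts a lexeme
            continue
        forward = lexeme_begin + 1         # forward scans ahead of lexeme_begin
        if ch.isalpha() or ch == "_":
            kind = "id"
            while forward < len(text) and (text[forward].isalnum() or text[forward] == "_"):
                forward += 1
        elif ch.isdigit():
            kind = "number"
            while forward < len(text) and text[forward].isdigit():
                forward += 1
        else:
            kind = "op"                    # any other char is a one-char lexeme
        # The lexeme is exactly the span between the two pointers.
        tokens.append((kind, text[lexeme_begin:forward]))
        lexeme_begin = forward             # next lexeme starts where this ended
    return tokens
```

For example, `split_lexemes("rate1*60 + x")` gives `[("id", "rate1"), ("op", "*"), ("number", "60"), ("op", "+"), ("id", "x")]`.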
-
Sentinel Char:
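- A special character that cannot be part of the source program, placed at the end of each buffer; the natural choice is eof, which still also marks the end of the whole source program
- With a sentinel, each advance of Forward needs only one test (is this the sentinel?) instead of two (is this the end of the buffer? which character is it?)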
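The buffer-pair scheme with sentinels can be sketched as follows. Several assumptions are worth flagging. The sentinel is `"\0"`, standing in for eof and assumed never to occur in the source. Each half holds only `n = 4` characters so the reloads are easy to trace; real buffers are usually the size of a disk block. `BufferPair` and `read_all` are hypothetical names, and the sketch tracks only Forward; it does not prevent a reload from overwriting the half that LexemeBegin still points into. The common case costs one comparison per character. Only on seeing the sentinel does the code check whether it sits at the end of a half (reload the other half) or inside one (end of input).

```python
import io

SENTINEL = "\0"  # assumption: never occurs in the source program

class BufferPair:
    """Two-half input buffer with a sentinel after each half."""

    def __init__(self, stream, n: int = 4):
        self.stream = stream
        self.n = n
        # Layout: [half 0: n chars][sentinel][half 1: n chars][sentinel]
        self.buf = [SENTINEL] * (2 * (n + 1))
        self._load(0)
        self.forward = -1  # index of the last character returned

    def _load(self, half: int) -> None:
        """Fill one half from the stream; a short read leaves an early sentinel."""
        start = half * (self.n + 1)
        chunk = self.stream.read(self.n)
        for i in range(self.n + 1):
            self.buf[start + i] = chunk[i] if i < len(chunk) else SENTINEL

    def next_char(self):
        """Advance forward; return the next character, or None at end of input."""
        self.forward += 1
        ch = self.buf[self.forward]
        if ch != SENTINEL:
            return ch                          # common case: a single test
        if self.forward == self.n:             # sentinel ending half 0
            self._load(1)
            self.forward = self.n + 1
        elif self.forward == 2 * self.n + 1:   # sentinel ending half 1
            self._load(0)
            self.forward = 0
        else:
            return None                        # sentinel inside a half: eof
        ch = self.buf[self.forward]
        return None if ch == SENTINEL else ch

def read_all(text: str, n: int = 4) -> str:
    """Usage helper: pull every character through the buffer pair."""
    bp = BufferPair(io.StringIO(text), n)
    out = []
    while (ch := bp.next_char()) is not None:
        out.append(ch)
    return "".join(out)
```

Reading `"abcdefghij"` with `n = 4` exercises both reload branches: half 0 holds `abcd`, half 1 holds `efgh`, and half 0 is then reloaded with `ij` followed by an early sentinel that signals end of input.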
-
Strings and Languages:
- Alphabet: any finite set of symbols, such as letters, digits, and punctuation
- String over an alphabet: a finite sequence of symbols drawn from that alphabet
- Language: any countable set of strings over some fixed alphabet
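These definitions map naturally onto Python sets of strings. The sketch below adds the two standard language operations that regular expressions build on, concatenation and Kleene closure (union is plain set union). The helper names are hypothetical, and `closure` is cut off at `max_len` because L* is infinite whenever L contains a non-empty string.

```python
def concat(L: set[str], M: set[str]) -> set[str]:
    """LM = { st | s in L and t in M }."""
    return {s + t for s in L for t in M}

def closure(L: set[str], max_len: int) -> set[str]:
    """The strings of L* with at most max_len symbols (L* is infinite in general)."""
    result = {""}      # L^0 = { epsilon }
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more member of L, within the bound.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result
```

With the alphabet L = {"a", "b"}, `concat(L, L)` is every string of length 2 over that alphabet, `{"aa", "ab", "ba", "bb"}`, and `closure({"ab"}, 4)` is `{"", "ab", "abab"}`.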
-
Regular Expression Precedence:
- From highest to lowest: * (Kleene closure), concatenation, | (union); all three are left-associative
- Extensions for regular expressions: + (one or more instances), ? (zero or one instance), character classes such as [a-z]
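Python's `re` module follows the same precedence and supports the same extensions, so it can be used to check them on a handful of strings. The helper `same_language` and the sample list are illustrative assumptions; agreement on finitely many samples is evidence of equivalence, not a proof.

```python
import re

def same_language(p: str, q: str, samples: list[str]) -> bool:
    """True if regexes p and q accept exactly the same strings among samples."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples)

SAMPLES = ["", "a", "b", "c", "aa", "aaa", "ac", "acc", "bc", "bcc", "bcbc"]

# Precedence: a|bc* groups as a|(b(c*)), not as (a|b)c*.
print(same_language(r"a|bc*", r"a|(?:b(?:c*))", SAMPLES))  # True
print(same_language(r"a|bc*", r"(?:a|b)c*", SAMPLES))      # False ("ac" differs)
# The extensions are shorthands for the basic operators.
print(same_language(r"a+", r"aa*", SAMPLES))               # True
print(same_language(r"a?", r"(?:a|)", SAMPLES))            # True (empty branch = epsilon)
print(same_language(r"[a-c]", r"a|b|c", SAMPLES))          # True
```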