Introduction to Compilers Theory

  • Compiler is a program that can read a program in one language – the source language – and translate it into an equivalent program in another language – the target language
  • Interpreter directly executes source code operations on input supplied by user
  • Compiler is faster but interpreter give better error diagnostics
  • Hybrid Compiler:


  • Developing EXE File:
  • Source program ->            -> modified source program ->      -> Target Assembly Program ->          -> Relocatable Machine Code ->          -> Target Machine Code
  • Structure of Compiler:
    • Analysis Part (Front End):
      • Break up source program into consistent pieces
      • Then, imposes a grammatical structure on them
      • After that, it uses this structure to create an intermediate representation of the source program
      • Here also we create symbol table
    • Synthesis Part (Back End):
      • Constructs the desired target program from the intermediate representation and the information in the symbol table
  • Compiler Phases:


  • Lexical Analysis:
    • The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes
    • For each lexeme, the lexical analyzer produces as output a token of the form <Token Name-Value>
  • Syntax Analyzer (Parser):
    • Creates a tree-like intermediate representation that depicts the grammatical structure of the token stream
  • Semantic Analyzer:
    • Check the source program for semantic consistency with the language definition
      • Example: operator’s type checking
  • Intermediate Code Generator:
    • Generates an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine
    • Properties:
      • Easy to produce
      • Easy to translate into the target machine
  • Code Optimization:


  • Compiler Construction Tools:
    • Parser Generator: requires grammatical description of the language
    • Scanner Generator: requires RE description of the tokens of the language
    • Syntax Directed Translation Engine:
      • Produce collections of routines for walking a parse tree and generating intermediate code
    • Code Generator Generators:
      • Produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine
    • Data-Flow Analysis Engines:
      • Facilitates the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization
    • Compiler-Construction Toolkits:
      • Provides an integrated set of routines for constructing various phases of a compiler
  • Finite-state machines and regular expressions models useful for describing the lexical units of programs (keywords, identifiers, ..) and for describing the algorithms used by the compiler to recognize those units
  • Context-free grammars, used to describe the syntactic structure of programming languages such as the nesting of parentheses or control constructs
  • Trees for representing the structure of programs and their translation into object code
  • Compiler optimizations must meet the following design objectives:
    • The optimization must be correct, that is, preserve the meaning of the compiled program
    • The optimization must improve the performance
    • The compilation time must be kept reasonable
    • The engineering effort required must be manageable
  • Data-flow optimizations, has been developed to analyze the flow of data through the program and removes redundancies across these constructs
  • Optimizations for Computer Architectures:
    • Parallelism
      • Hardware scheduler can change the instruction ordering to increase the parallelism in the program
      • Try to use the microprocessor mechanisms of parallelism (as processors able to work on vectors in parallel way)
    • Memory Hierarchies:
      • It has been found that cache-management policies implemented by hardware are not effective in some cases, especially in scientific code that has large data structures
      • It is possible to improve the effectiveness of the memory hierarchy
        • Changing the layout of the data
        • Change the layout of code
        • Changing the order of instructions accessing the data
  • Program Translations:
    • Binary Translation
    • Hardware Synthesis
    • Database Query Interpreters
    • Compiled Simulation
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s