Introduction to Lexical Analyzer

  • Lexical Analyzer: the first phase of a compiler; it reads the characters of the source program and produces a sequence of tokens, which the parser consumes on demand (figure omitted)
  • Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one
  • A token is a pair consisting of a token name and an optional attribute value.
    • The token name is an abstract symbol representing a kind of lexical unit (a keyword, an identifier, …)
  • A pattern is a description of the form that the lexemes of a token may take
  • A lexeme is a sequence of characters in the source program that matches the pattern for a token
  • Token Classes:
    • Keywords
    • Comparison operators
    • Identifiers
    • Constants
    • Punctuation symbols
  • The simplest recovery strategy is “panic mode” recovery: we delete successive characters from the remaining input until the lexical analyzer can find a well-formed token at the beginning of what input is left (see the sketch at the end of the next list)
  • Error Recovery Techniques:
    • Delete one character from the remaining input
    • Insert a missing character into the remaining input
    • Replace a character by another character
    • Transpose two adjacent characters
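    • A minimal sketch of the panic-mode (deletion) strategy in Python; token_at is a hypothetical stand-in for the lexer's pattern matcher, not something from the source:

        def panic_mode(text, pos, token_at):
            # Delete successive characters until a well-formed token
            # can be found at the beginning of the remaining input.
            while pos < len(text) and token_at(text, pos) is None:
                pos += 1          # discard one character and retry
            return pos            # position where scanning can resume

        # Illustrative matcher: a "token" here is any alphabetic character
        token_at = lambda s, i: s[i] if s[i].isalpha() else None
        print(panic_mode("@@abc", 0, token_at))  # -> 2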
  • Input Buffering:
    • It’s the process of reading the source program file, typically in large blocks, to reduce per-character I/O overhead
    • Two pointers into the input are maintained:
      • Pointer LexemeBegin: marks the beginning of the current lexeme
      • Pointer Forward: scans ahead until a pattern match is found
    • Sentinel Char:
      • Special character that marks the end of the source program (i.e. eof)
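    • A minimal sketch of the two-pointer scheme with an eof sentinel; the run-of-digits pattern is only an illustration, not from the source:

        EOF = "\0"                           # sentinel appended to the buffer

        def next_lexeme(buf, lexeme_begin):
            # Because buf ends with the sentinel, forward can advance
            # without a separate end-of-buffer test on every character.
            forward = lexeme_begin
            while buf[forward].isdigit():    # illustrative pattern: digits
                forward += 1
            return buf[lexeme_begin:forward]

        buf = "2023" + EOF
        print(next_lexeme(buf, 0))           # -> '2023'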
  • Alphabet: any finite set of symbols, such as letters, digits, and punctuation
  • String over an alphabet: a finite sequence of symbols drawn from that alphabet
  • Language: any countable set of strings over some fixed alphabet
  • Regular Expressions Precedence (highest to lowest):
    • * (Kleene star), then concatenation, then | (union)
  • Extensions for Regular Expressions: ? (zero or one), + (one or more), character classes []
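  • A minimal sketch of these operators using Python's re module to recognize a few of the token classes above; the token names and patterns are illustrative, not from the source:

      import re

      # [] character classes, + one-or-more, ? zero-or-one, | union
      token_spec = [
          ("KEYWORD",    r"if|else|while"),
          ("COMPARISON", r"<=|>=|==|!=|<|>"),
          ("ID",         r"[A-Za-z_][A-Za-z0-9_]*"),
          ("NUM",        r"[0-9]+(?:\.[0-9]+)?"),  # optional fraction
          ("WS",         r"[ \t]+"),
      ]
      pattern = re.compile("|".join(f"(?P<{n}>{rx})" for n, rx in token_spec))

      for m in pattern.finditer("while x1 <= 3.14"):
          if m.lastgroup != "WS":                  # skip whitespace
              print(m.lastgroup, m.group())        # e.g. KEYWORD while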

Introduction to Compilers Theory

  • A compiler is a program that can read a program in one language – the source language – and translate it into an equivalent program in another language – the target language
  • An interpreter directly executes the operations of the source program on inputs supplied by the user
  • A compiled program is usually faster, but an interpreter gives better error diagnostics because it executes the source program statement by statement
  • Hybrid Compiler: combines compilation and interpretation; e.g., Java source is compiled into intermediate bytecodes, which are then interpreted by a virtual machine (figure omitted)
  • Developing EXE File:
  • Source program -> [Preprocessor] -> modified source program -> [Compiler] -> target assembly program -> [Assembler] -> relocatable machine code -> [Linker/Loader] -> target machine code
  • Structure of Compiler:
    • Analysis Part (Front End):
      • Breaks up the source program into constituent pieces
      • Then, imposes a grammatical structure on them
      • After that, it uses this structure to create an intermediate representation of the source program
      • It also collects information about the source program and stores it in the symbol table
    • Synthesis Part (Back End):
      • Constructs the desired target program from the intermediate representation and the information in the symbol table
  • Compiler Phases:
  • Lexical Analysis:
    • The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes
    • For each lexeme, the lexical analyzer produces as output a token of the form <token-name, attribute-value>
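    • Example (the standard textbook one): position = initial + rate * 60 is mapped to the token stream <id,1> <=> <id,2> <+> <id,3> <*> <60>, where each <id, n> points to that identifier’s symbol-table entry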
  • Syntax Analyzer (Parser):
    • Creates a tree-like intermediate representation that depicts the grammatical structure of the token stream
  • Semantic Analyzer:
    • Checks the source program for semantic consistency with the language definition
      • Example: type checking of an operator’s operands
  • Intermediate Code Generator:
    • Generates an explicit low-level or machine-like intermediate representation, which we can think of as a program for an abstract machine
    • Properties:
      • Easy to produce
      • Easy to translate into the target machine
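    • Example (the standard textbook one): position = initial + rate * 60 might be translated into three-address code such as:

        t1 = inttofloat(60)
        t2 = id3 * t1
        t3 = id2 + t2
        id1 = t3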
  • Code Optimization: attempts to improve the intermediate code so that better target code results: usually faster, but possibly also shorter or lower-power code (figure omitted)
  • Compiler Construction Tools:
    • Parser Generator: requires a grammatical description of the language
    • Scanner Generator: requires a regular-expression description of the tokens of the language
    • Syntax Directed Translation Engine:
      • Produce collections of routines for walking a parse tree and generating intermediate code
    • Code Generator Generators:
      • Produce a code generator from a collection of rules for translating each operation of the intermediate language into the machine language for a target machine
    • Data-Flow Analysis Engines:
      • Facilitates the gathering of information about how values are transmitted from one part of a program to each other part. Data-flow analysis is a key part of code optimization
    • Compiler-Construction Toolkits:
      • Provides an integrated set of routines for constructing various phases of a compiler
  • Finite-state machines and regular expressions are models useful for describing the lexical units of programs (keywords, identifiers, …) and the algorithms used by the compiler to recognize those units
  • Context-free grammars are used to describe the syntactic structure of programming languages, such as the nesting of parentheses or control constructs
  • Trees are used for representing the structure of programs and their translation into object code
  • Compiler optimizations must meet the following design objectives:
    • The optimization must be correct, that is, preserve the meaning of the compiled program
    • The optimization must improve the performance
    • The compilation time must be kept reasonable
    • The engineering effort required must be manageable
  • Data-flow optimization has been developed to analyze the flow of data through the program and remove redundancies across these constructs
  • Optimizations for Computer Architectures:
    • Parallelism
      • Hardware scheduler can change the instruction ordering to increase the parallelism in the program
      • Try to exploit the microprocessor’s parallelism mechanisms (e.g., processors able to work on vectors in parallel)
    • Memory Hierarchies:
      • It has been found that cache-management policies implemented by hardware are not effective in some cases, especially in scientific code that has large data structures
      • It is possible to improve the effectiveness of the memory hierarchy
        • Changing the layout of the data
        • Changing the layout of code
        • Changing the order of instructions accessing the data
  • Program Translations:
    • Binary Translation
    • Hardware Synthesis
    • Database Query Interpreters
    • Compiled Simulation

Basic Pattern Definitions

  • Model: anything that has a specific description in mathematical form.
  • Preprocessing: operation of simplifying subsequent operations without losing relevant information.
  • Segmentation: operation of isolating image objects from background.
  • Feature space: space containing all pattern features.
  • Feature vector: a vector from feature space that contains the actual feature values of the current model.
  • Feature Extraction: operation of reducing the data by measuring certain features or properties.
  • Training Samples: samples used to extract some information about domain of problem.
  • Decision Theory: theory of making a decision rule in order to minimize cost.
  • Decision Boundary: a boundary that distinguishes classes decision regions.
  • Novel Patterns: patterns that were not included in the training samples.
  • Generalization: classifier ability to classify novel patterns in right class.
  • Analysis by Synthesis: technique used to resolve problem of insufficient training data by having a model of how each pattern was produced.
  • Image Processing: techniques to process images for enhancements and other purposes.
  • Associative Memory: technique used to discover representative features of a group of patterns.
  • Regression: area of finding some functional description of data in order to predict new values.
  • Interpolation: area of inferring the function for intermediate ranges of input.
  • Density Estimation: the problem of estimating the density (probability) that a member of a certain category will be found to have particular features.
  • Pattern Recognition System (figure): sensing -> segmentation -> feature extraction -> classification -> post-processing (decision)
  • Mereology: the study of part/whole relationships, i.e., deciding which parts of the input belong together as one categorization that “makes sense” rather than matching subparts separately.
  • Invariant (distinguishing) Feature: features that are invariant to irrelevant transformations of the input
  • Occlusion: the hiding of part of one object by another
  • Feature Selection: operation of selecting most valuable features from a larger set of candidate features
  • Noise: any property of the sensed pattern that is not due to the underlying model but instead to randomness in the world or the sensors
  • Error Rate: percentage of new patterns that are assigned to the wrong category
  • Risk (Conditional Risk): total expected cost
  • Evidence Pooling: the idea of using several classifiers to categorize one sample; the key problem is how to resolve disagreements among the classifiers
  • Multiple Classifier: multiple classifiers working on different aspects of the input
  • Design of Classification System (design cycle figure): collect data -> choose features -> choose model -> train classifier -> evaluate classifier
  • Overfitting: situation where the classifier is tuned too closely to the training samples and is unable to classify novel patterns correctly.
  • Supervised Learning: learning technique where a teacher provides a category label or cost for each pattern in training set.
  • Unsupervised Learning (Clustering): learning technique where the system itself makes “natural groupings” of input patterns.
  • Reinforcement Learning: learning technique where the feedback for each tentative classification is only whether it is right or wrong, not the correct category label itself.

 

Chapter 2: Bayesian Decision Theory

  • State of Nature: class that the current sample belongs to.
  • Prior: probability that the next sample will belong to a specific class.
  • Decision Rule: rule that decides which class the current sample belongs to.
  • Posterior: the probability of the state of nature being a specific class given that feature value x has been measured.
  • Evidence Factor: scaling factor that guarantees that the posterior probabilities sum to one
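    • In symbols (standard Bayes rule, with \omega_j a state of nature and x the measured feature):

        P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}, \qquad p(x) = \sum_j p(x \mid \omega_j)\, P(\omega_j)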
  • Loss Function: function that states how costly each action is, and is used to convert a probability determination into a decision.
  • Bayes Risk: minimum overall risk
  • Zero-One (symmetrical) Loss: loss function that assigns no loss to a correct decision (i = j) and a unit loss to any error (i ≠ j)
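    • In symbols (standard definitions): the conditional risk of taking action \alpha_i given x is

        R(\alpha_i \mid x) = \sum_j \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x)

      and under zero-one loss R(\alpha_i \mid x) = 1 - P(\omega_i \mid x), so minimizing risk means choosing the class with the largest posterior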
  • Decision Region: the region of feature space assigned to each class by the decision rule
  • Dichotomizer: classifier that places a pattern in one of only two categories
  • Polychotomizer: classifier that places a pattern in one of more than two categories
  • Linear Discriminant Function: a linear function of the features that is able to discriminate between the classes
  • Template Matching: assigning x to the nearest mean
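  • A minimal sketch of template matching as a nearest-mean (minimum-distance) classifier; the class means and test point are made-up illustrative values:

      import numpy as np

      # "Templates": the mean feature vector of each class (illustrative)
      means = {"class_a": np.array([0.0, 0.0]),
               "class_b": np.array([3.0, 3.0])}

      def classify(x):
          # Assign x to the class whose mean is nearest in Euclidean distance
          return min(means, key=lambda c: np.linalg.norm(x - means[c]))

      print(classify(np.array([0.5, 1.0])))   # -> class_a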

Introduction to Communication Network and Internet

  • Data communication deals with the transmission of signals in a reliable and efficient manner.
  • Data Communication Study Area:
    • Signal transmission
    • Transmission media
    • Signal encoding
    • Interfacing
    • Data link control
    • Multiplexing

Data Communication and Networking for Today’s Enterprise

  • There are three different forces that have driven the architecture and evolution of data communications:
    • Traffic growth: seeking to maximize transmission capacity while minimizing costs
    • Development of new services
    • Advances in technology: advances that can support the growth in traffic and services
  • Technology Trends in Data Communication and Networking:
    • Faster and cheaper in computing and communication
    • Both voice-oriented telecommunications networks (such as the PSTN) and data networks are more intelligent than before
      • The intelligence appears in two areas:
        • Offering different levels of QoS, with specifications for maximum delay, minimum throughput, etc.
        • Supporting a variety of customizable services in the areas of network management and security
    • The Internet, the Web, and associated applications have emerged as dominant features of both the business and personal worlds
    • Increasing mobility liberates workers, resulting in services such as voice mail and remote data access
  • Requirements for High-Speed LANs:
    • Centralized Server Farms: ability to centralize data for multiuser access
    • Power Workgroups: ability to send large amounts of data quickly
    • High-Speed Backbone: ability to interconnect LANs at high transfer speeds
  • The fundamental purpose of a communication system is the exchange of data between two parties

Simple Model of Communication

  • Key Tasks in Data Communication System:
    • Utilization of Transmission System:
      • Need for efficient utilization of transmission facilities shared among communication devices
      • Used Techniques:
        • Multiplexing:
          • Techniques used to allocate the total capacity of a transmission medium among several users
        • Congestion Control:
          • Techniques used to assure that the system is not overwhelmed by excessive demands for transmission services
    • Interface (point of connection between devices):
      • Mechanical Characteristics: Shape, # of pins …
      • Electrical Characteristics: values of volts …
      • Functional Characteristics: function (task) of the pin
      • Procedural Characteristics: the set and sequence of functions needed to perform a specific task (e.g. printing)
    • Signal Generation:
      • Responsible for generating signals. Here are signal characteristics:
        • Signal should be capable of being propagated through transmission system
        • Signals should be interpretable as data at the receiver
    • Synchronization:
      • Receiver must be able to determine the beginning and duration of each signal element
    • Exchange Management
      • If data are to be exchanged in both directions over a period of time, the two parties must cooperate
    • Error Detection and Correction:
      • Due to the imperfect medium, a signal may be corrupted during transmission, so there must be a mechanism for detecting and correcting such errors (a parity-bit sketch follows this list)
    • Flow Control:
      • A receiver has limited resources (such as buffers), so the source’s transmission rate must be controlled so that the receiver’s buffers are not flooded
    • Addressing and Routing:
      • Using protocols. A protocol is a set of rules governing how a specific task is performed
      • Each device must be assigned a unique identifier in the network to enable data transmission
    • Recovery:
      • The network must be able to recover from any failure due to overload or other causes. No loss of data should occur
    • Message Formatting:
      • There should be a pre-specified format for messages transmitted over networks, defined by the communication protocol
      • A pre-specified pattern of signals must exist between transmitter and receiver
    • Security:
      • The sender may wish to be assured that only the intended receiver receives the data; the receiver may wish to be assured that the received data have not been altered in transit
    • Network Management:
      • How to configure systems.
      • How to react to failure.
      • How to plan for future scalability.
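  • A minimal sketch of the error-detection mechanism referenced above, using a single even-parity bit (detects any single-bit error but cannot correct it; values are illustrative):

      def add_parity(bits):
          # Even parity: append a bit so the total count of 1s is even
          return bits + [sum(bits) % 2]

      def check_parity(frame):
          return sum(frame) % 2 == 0

      frame = add_parity([1, 0, 1, 1])  # -> [1, 0, 1, 1, 1]
      assert check_parity(frame)        # arrives intact
      frame[2] ^= 1                     # simulate a single-bit error
      assert not check_parity(frame)    # error detected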
  • The building block of any communications facility is the transmission line.
  • The most common communication media are fiber optics and wireless transmission.
  • Wireless transmission provides two important concepts:
    • Universal personal telecommunication:
      • Ability of a person to identify himself easily and to use conveniently any communication system in a large area in terms of a single account.
    • Universal Access to Communications:
      • Capability of using the same terminal in multiple environments to connect to information services
  • Transmission services remain the most costly component of a communication budget
  • There are 2 common techniques used to increase the efficiency of transmission services:
    • Multiplexing: ability of a number of devices to share a transmission facility
    • Compression: ability to squeeze data down so that a lower-capacity, cheaper transmission facility can be used
  • Most commonly used transmission media are twisted-pair lines, coaxial cable, fiber cable, and terrestrial and satellite microwave.
  • Error rate depends on signal and medium type to be used.
  • Multiplexing Techniques:
    • Frequency division
    • Synchronous time division
    • Statistical time division
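  • A minimal sketch of synchronous time division: each output frame carries one fixed slot per input stream, in round-robin order (stream contents are made-up):

      streams = ["AAAA", "BBBB", "CCCC"]   # three input streams
      # Each frame takes the next character (slot) from every stream
      frames = ["".join(slots) for slots in zip(*streams)]
      print(frames)                        # ['ABC', 'ABC', 'ABC', 'ABC']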

Networks

  • Integration means that the customer equipment and networks can deal simultaneously with voice, data, image and even video
  • Network Types: WAN, LAN and Wireless
  • WAN Implementation Techniques:
    • Circuit Switching:
      • Here a dedicated communication path is established between two stations through network nodes
      • Example: telephone network
    • Packet Switching:
      • There’s no dedicated network path. Data are sent out in a sequence of small chunks called packets
      • This technique is commonly used for terminal-to-computer and computer-to-computer communications
    • Frame Relay:
      • Reduces the per-packet overhead, taking advantage of advances in communication technology that have led to low error rates
      • Packet switching was designed for end users at around 64 kbps, whereas frame relay is designed for speeds up to 2 Mbps
    • Asynchronous Transfer Mode (ATM) – Cell Relay Mode:
      • Frame relay uses variable frame lengths, whereas ATM uses a fixed frame length; this fixed-length frame is called a cell
      • This technique is designed to work in the range of tens and hundreds of Mbps, and in the Gbps range
  • Differences between WAN and LAN:
    • LAN scope is smaller than WAN scope
    • Usually a LAN is owned by the organization that uses it. This leads to:
      • Care when choosing a LAN, because there may be a substantial investment in purchasing and maintenance.
      • Network management responsibility for a LAN falls solely on the user
    • Internal data rates of a LAN are much greater than those of a WAN
  • LAN Types:
    • Switched LAN:
      • Switched Ethernet: consists of a single switch with a number of attached devices, or a number of interconnected switches
    • Wireless LAN
    • ATM LAN
    • Fiber Channel

The Internet

  • The purpose of the Internet is to interconnect end systems called hosts
  • IP datagrams (IP packets): blocks of data from the source, broken up for transmission over the network
  • Central Office: the place where telephone companies terminate customer lines and locate switching equipment to interconnect those lines with other networks
  • Customer Premises Equipment (CPE): telecommunications equipment that is located on the customer’s physical premises rather than on the provider’s premises or in between
  • Internet Service Provider (ISP): company that provides other companies or individuals with access to the internet
  • Network Access Point (NAP): a facility used to tie all the ISPs together
  • Network Service Provider (NSP): company that provides backbone service to ISPs
  • Point of Presence (POP): site with a collection of telecommunication equipment, usually referring to ISP or telephone company sites. An ISP POP is the edge of the ISP’s network; connections from users are accepted and authenticated here