Basic Pattern Definitions

• Model: anything that has a specific description in mathematical form.
• Preprocessing: operation of simplifying subsequent operations without losing relevant information.
• Segmentation: operation of isolating image objects from background.
• Feature space: space containing all pattern features.
• Feature Vector: a vector from feature space that contains the actual values measured for the current sample.
• Feature Extraction: operation of reducing the data by measuring certain features or properties.
• Training Samples: samples used to extract some information about domain of problem.
• Decision Theory: theory of making a decision rule in order to minimize cost.
• Decision Boundary: a boundary that separates the decision regions of different classes.
• Novel Patterns: patterns that were not included in the training samples.
• Generalization: a classifier's ability to assign novel patterns to the correct class.
• Analysis by Synthesis: technique used to address the problem of insufficient training data by having a model of how each pattern is produced.
• Image Processing: techniques to process images for enhancements and other purposes.
• Associative Memory: technique in which the system takes in a pattern and emits another pattern that is representative of a general group of patterns.
• Regression: area of finding some functional description of data in order to predict new value.
• Interpolation: area of inferring the function for intermediate ranges of input.
• Density Estimation: the problem of estimating the density (probability) that a member of a certain category will be found to have particular features.
• Pattern Recognition System: a pipeline of sensing, segmentation, feature extraction, classification, and post-processing.
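The pattern recognition system is usually drawn as a pipeline: sensing, segmentation, feature extraction, classification, post-processing. A minimal sketch of that chaining, where every stage body below is a toy stand-in invented for illustration, not an algorithm from the text:

```python
def sense(raw):
    """Sensing: a camera or transducer produces a signal (here: pass-through)."""
    return raw

def segment(signal):
    """Segmentation: isolate each object from the background (here: drop empties)."""
    return [obj for obj in signal if obj]

def extract_features(obj):
    """Feature extraction: reduce the data to a feature vector (here: size, mean)."""
    return (len(obj), sum(obj) / len(obj))

def classify(fv):
    """Classification: map a feature vector to a category via a toy threshold."""
    return "A" if fv[1] < 0.5 else "B"

def post_process(labels):
    """Post-processing: could exploit context or costs (here: pass-through)."""
    return labels

signal = sense([[0.1, 0.2], [0.8, 0.9]])
labels = post_process([classify(extract_features(o)) for o in segment(signal)])
print(labels)  # -> ['A', 'B']
```

The point is the data flow, not the stage bodies: each stage narrows the representation until the classifier's job is trivial.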

• Mereology: the study of part/whole relationships; it concerns how a classifier can group together the “proper” number of elements, neither too few nor too many, so that inputs “make sense” as wholes.
• Invariant (distinguishing) Features: features that are invariant to irrelevant transformations of the input.
• Occlusion: the hiding of part of an object by another object.
• Feature Selection: operation of selecting most valuable features from a larger set of candidate features
• Noise: any property of the sensed pattern which is not due to the underlying model but instead to randomness in the world or the sensors.
• Error Rate: percentage of new patterns that are assigned to the wrong category
• Risk (Conditional Risk): total expected cost (the conditional risk is the expected cost of taking a given action for a particular input).
• Evidence Pooling: using several classifiers to categorize one sample; this raises the question of how to resolve disagreements among them.
• Multiple Classifier: multiple classifiers working on different aspects of the input
• Design Cycle of a Classification System: collect data, choose features, choose a model, train the classifier, evaluate the classifier.

• Overfitting: situation where the classifier is tuned to the training samples and is unable to classify novel patterns correctly.
• Supervised Learning: learning technique where a teacher provides a category label or cost for each pattern in training set.
• Unsupervised Learning (Clustering): learning technique where the system itself makes “natural groupings” of input patterns.
• Reinforcement Learning (learning with a critic): learning technique where no desired category label is given; the only feedback on a tentative classification is whether it is right or wrong.
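In unsupervised learning the system itself forms the “natural groupings.” A minimal sketch using k-means (one clustering method among many, chosen here only for illustration; the 1-D data values are invented):

```python
def kmeans_1d(xs, iters=20):
    """Minimal 2-cluster k-means: no teacher supplies category labels;
    the groupings emerge from the data alone."""
    centers = [min(xs), max(xs)]  # crude initialization for two clusters
    groups = [[], []]
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = [[], []]
        for x in xs:
            i = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[i].append(x)
        # update step: each center moves to the mean of its group
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centers, groups = kmeans_1d(data)
print(sorted(groups[0]), sorted(groups[1]))  # -> [0.8, 1.0, 1.2] [4.9, 5.0, 5.3]
```

Contrast with supervised learning: nothing in the input says which points are “class 1”; the two groupings are discovered, then a human may attach labels afterwards.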

Chapter # 2

• State of Nature: class that the current sample belongs to.
• Prior: probability that the next sample will belong to a specific class.
• Decision Rule: rule that decides which class the current sample belongs to.
• Posterior: the probability that the state of nature belongs to a specific class given that feature value x has been measured.
• Evidence Factor: scaling factor that guarantees that the posterior probabilities sum to one
• Loss Function: function that states how costly each action is, and is used to convert a probability determination into a decision.
• Bayes Risk: minimum overall risk
• Zero-One (symmetrical) Loss: loss function that assigns zero loss to a correct decision (i = j) and unit loss to any error (i ≠ j).
• Decision Region: region of feature space within which all samples are assigned to the same class.
• Dichotomizer: classifier that places a pattern in one of only two categories
• Polychotomizer: classifier that places a pattern in one of more than two categories
• Linear Discriminant Function: a discriminant function that is linear in the features, giving a linear decision boundary between classes
• Template Matching: assigning x to the category of the nearest mean (template)
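The Chapter 2 terms above fit together in a few lines: priors and class-conditional densities combine via Bayes' rule into posteriors, and under zero-one loss the minimum-risk decision is simply the largest posterior. A sketch with 1-D Gaussian class-conditional densities (the means, widths, and priors are invented illustrative numbers):

```python
import math

def gaussian(x, mu, sigma):
    """Class-conditional density p(x | class), modeled as a 1-D Gaussian."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posteriors(x, priors, params):
    """Bayes rule: P(class | x) = p(x | class) P(class) / evidence."""
    likelihoods = [gaussian(x, mu, sigma) for mu, sigma in params]
    evidence = sum(l * p for l, p in zip(likelihoods, priors))  # scaling factor
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

def decide(x, priors, params):
    """Zero-one loss: minimum risk means picking the largest posterior."""
    post = posteriors(x, priors, params)
    return max(range(len(post)), key=lambda i: post[i])

# Two classes, e.g. salmon (mean length 2.0) vs. sea bass (mean length 4.0).
priors = [0.5, 0.5]
params = [(2.0, 1.0), (4.0, 1.0)]  # (mean, std) per class
print(decide(1.5, priors, params))  # -> 0
print(decide(4.2, priors, params))  # -> 1
```

With equal priors and equal widths, the decision boundary falls midway between the two means, so this rule coincides with template matching: assign x to the nearest mean.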

Introduction to Pattern Recognition

• Model is anything that has a specific description in mathematical form
• Segmentation is an operation to isolate objects from each other in images
• From a business perspective, errors that are acceptable to customers can be ignored (salmon / sea-bass example)
• Pattern classification is perhaps the most important subfield of decision theory
• Feature vector is a vector from feature space that contains the actual values of current model
• When selecting features for the feature vector, don’t choose redundant features
• Though, our satisfaction would be premature because the central aim of designing a classifier is to suggest actions when presented with novel patterns. This is the issue of generalization
• Sometimes the classifier is over-tuned to the training samples; one solution is to supply the classifier with more training samples to obtain a better estimate
• Complex classifiers can be simplified if the problem doesn’t require that level of complexity
• Assuming that we somehow manage to optimize this trade off, can we then predict how well our system will generalize to new patterns? These are some of the central problems in statistical pattern recognition
• (“Entities are not to be multiplied without necessity”), i.e. Occam’s razor: decisions based on overly complex models often lead to lower classifier accuracy
• There is no general-purpose pattern recognition device (cf. Newell and Simon’s General Problem Solver, GPS)
• Each recognition technique is suitable for a specific domain of problem:
• Patterns with statistical properties are best handled by statistical pattern recognition
• Patterns with noise are more suitable for neural network pattern recognition
• If the model consists of some set of crisp logical rules, then we employ the methods of syntactic pattern recognition
• A central aspect in PR problem is to achieve good representation
• Features of samples from the same category should lie close together
• Keep the number of features small
• Robust features: relatively insensitive to noise or other errors
• In practical applications we may need the classifier to act quickly, or use few electronic components, memory or processing steps
• A central technique, when we have insufficient training data, is to incorporate knowledge of the problem domain. Indeed the less the training data the more important is such knowledge (analysis by synthesis)
• What about classifying patterns according to their function (the chair example)?
• In acts of associative memory, the system takes in a pattern and emits another pattern which is representative of a general group of patterns.
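The associative-memory behavior just described can be illustrated with a Hopfield-style network, one possible realization (not prescribed by the text): it stores a few ±1 patterns via Hebbian weights and, given a corrupted probe, emits the stored pattern representative of the probe's group.

```python
def train(patterns):
    """Hebbian weights from a set of +/-1 patterns: w_ij accumulates p_i * p_j."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j] / len(patterns)
    return W

def recall(W, probe, steps=5):
    """Take in a (possibly corrupted) pattern; repeatedly threshold the
    weighted sums until the state settles on a stored pattern."""
    s = list(probe)
    for _ in range(steps):
        s = [1 if sum(W[i][j] * s[j] for j in range(len(s))) >= 0 else -1
             for i in range(len(s))]
    return s

stored = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
W = train(stored)
noisy = [1, -1, 1, -1, 1, 1]   # last bit flipped from the first stored pattern
print(recall(W, noisy))        # -> [1, -1, 1, -1, 1, -1]
```

The emitted pattern is not the probe itself but the stored representative of its group, which is exactly the associative-memory behavior in the bullet above.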

Sub Problems of Pattern Recognition

• An ideal feature extractor would yield a representation that makes the job of the classifier trivial
• How do we know which features are most promising? Are there ways to automatically learn which features are best for the classifier? How many shall we use?
• We define noise in very general terms: any property of the sensed pattern due not to the true underlying model but instead to randomness in the world or the sensors
• While an overly complex model may allow perfect classification of the training samples, it is unlikely to give good classification of novel patterns — a situation known as overfitting
• One of the most important areas of research in statistical pattern classification is determining how to adjust the complexity of the model — not so simple that it cannot explain the differences between the categories, yet not so complex as to give poor classification on novel patterns
• Model selection is concerned with how we are to know when to reject a class of models and try another one
• How can we make model selection automated?
• Prior knowledge: knowledge about the problem available beforehand that helps in classification
• Occlusion!
• How should we train a classifier or use one when some features are missing?
• This is the problem of subsets and supersets — formally part of mereology, the study of part/whole relationships. It is closely related to that of prior knowledge and segmentation. In short, how do we recognize or group together the “proper” number of elements — neither too few nor too many
• Invariance: here we try to build a classifier that is invariant to irrelevant transformations such as rotation
• How might we ensure that our pattern recognizer is invariant to such complex changes?
• Evidence pooling: several classifiers categorize the same sample; how can we resolve disagreements among them?
• How should a “super” classifier pool the evidence from the component recognizers to achieve the best decision?
• Costs and Risks
• How do we incorporate knowledge about such risks and how will they affect our classification decision?
• Can we estimate the total risk and thus tell whether our classifier is acceptable even before we field it?
• Computational Complexity:
• What is the tradeoff between computational ease and performance?
• How can we optimize within such constraints?
• Throughout this book, we shall see again and again how methods of learning relate to these central problems, and are essential in the building of classifiers.

• Learning refers to some form of algorithm for reducing the error on a set of training data
• Learning Forms:
• Supervised Learning:
• How can we be sure that a particular learning algorithm is powerful enough to learn the solution to a given problem and that it will be stable to parameter variations?
• How can we ensure that the learning algorithm appropriately favors “simple” solutions rather than complicated ones?
• Unsupervised Learning (Clustering):
• System forms clusters or “natural groupings” of the input patterns
• Reinforcement Learning (learning with a critic):
• This is analogous to a critic who merely states that something is right or wrong, but does not say specifically how it is wrong
• Team reading: Summary of chapters
• Questions:
• What is hypothesis testing?