Chapter 2, Statistical Learning

What is Statistical Learning

Input variables to a statistical model are often called predictors, independent variables, features, or simply variables.

The output variable of a statistical model is often called the response or the dependent variable.
In essence, statistical learning refers to a set of approaches for estimating f in the relationship

Y = f(X) + ε

where ε is a random error term, independent of X, with mean zero.
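
As a concrete illustration, here is a minimal Python sketch that generates data from this model. The particular f_true and the Gaussian noise are made up for illustration; in practice f is unknown and we observe only the (X, Y) pairs.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical "true" relationship f (unknown in practice).
    def f_true(x):
        return 2.0 * np.sin(x) + 0.5 * x

    n = 200
    X = rng.uniform(0.0, 10.0, size=n)
    eps = rng.normal(0.0, 1.0, size=n)   # random error term with mean zero
    Y = f_true(X) + eps                  # observed responses: Y = f(X) + eps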

Why Estimate f

We estimate f for two main purposes: prediction and inference.

Prediction accuracy depends on two quantities: the reducible error and the irreducible error.

The reducible error is the part of the prediction error that comes from the inaccuracy of our estimate f̂ as an approximation of f; it can be reduced by using a more appropriate statistical learning technique.

The irreducible error is the part of the prediction error that comes from ε. It is irreducible because no matter how well we estimate f, we cannot reduce the error introduced by ε; ε may contain unmeasured variables that would be useful for predicting Y, as well as unmeasurable random variation.

Statistical learning techniques focus on estimating f with the aim of minimizing the reducible error.
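
This decomposition can be seen in a small simulation. Below is a sketch, assuming a made-up true f, a deliberately crude estimate f_hat, and Gaussian noise, showing that the expected squared prediction error at a point splits into a reducible part, (f(x) − f̂(x))², plus an irreducible part, Var(ε).

    import numpy as np

    rng = np.random.default_rng(1)

    def f_true(x):          # hypothetical true f (unknown in practice)
        return 2.0 * np.sin(x) + 0.5 * x

    def f_hat(x):           # a deliberately imperfect estimate of f
        return 0.6 * x

    sigma = 1.0             # standard deviation of the error term eps
    x0 = 3.0

    # Monte Carlo estimate of the expected squared prediction error at x0.
    y = f_true(x0) + rng.normal(0.0, sigma, size=100_000)
    expected_sq_error = np.mean((y - f_hat(x0)) ** 2)

    reducible   = (f_true(x0) - f_hat(x0)) ** 2   # shrinks with a better f_hat
    irreducible = sigma ** 2                      # Var(eps), cannot be removed

    print(expected_sq_error, reducible + irreducible)  # approximately equal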

The inference task is about understanding the relationship between the response and the predictors. Typical questions to ask are:

  • Which predictors have a strong influence on Y?
  • What is the relationship between the response and each predictor?
  • Can the relationship between Y and the predictors be adequately represented by a linear equation, or is it more complicated?

Linear models allow for relatively simple and interpretable inference, but may not yield predictions as accurate as some other approaches.

Non-linear models can provide quite accurate predictions for Y, but this comes at the expense of a less interpretable model for which inference is more challenging.

How f is Estimated

The process here is to observe training data and use it to train a statistical model that produces an estimate f̂ such that Y ≈ f̂(X). There are two classes of methods for doing so: parametric and nonparametric.

Parametric methods involve a two-step, model-based approach:

  • Make an assumption about the functional form of f, for example a linear model f(X) = WX + b.
  • Select a procedure that uses the training data to fit (train) the chosen model; the method of least squares is one common example (a minimal sketch follows this list).
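
A minimal sketch of the two steps, assuming toy data generated from a linear relationship and fitting it by ordinary least squares with NumPy (the specific numbers are made up):

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy training data (assumed for illustration): Y = 3 + 2*X + noise.
    X = rng.uniform(0.0, 10.0, size=100)
    Y = 3.0 + 2.0 * X + rng.normal(0.0, 1.0, size=100)

    # Step 1: assume a linear form f(X) = b0 + b1*X.
    # Step 2: fit the parameters to the training data by least squares.
    A = np.column_stack([np.ones_like(X), X])        # design matrix [1, X]
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)     # minimizes the sum of squared errors

    print(beta)  # roughly [3, 2]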

Choosing a functional form that is close to the true f is challenging; a form that is too far from the true f results in a poor estimate.

Overfitting is the phenomenon in which the model follows the noise in the training data too closely instead of generalizing to capture the underlying f.

Nonparametric methods do not make explicit assumptions about the functional form of f. Instead, they seek an estimate that gets close to the training data without being too rough or too wiggly. The advantage of such methods is the ability to cover a wide range of possible shapes of f, but they require far more training data than parametric methods to generalize well (a small sketch of one such method follows).
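
As one illustrative example of a nonparametric method, the sketch below implements a simple K-nearest-neighbours style prediction: the estimate at a point is the average response of the k closest training observations, with no assumed functional form for f. The data and the choice of k are made up for illustration.

    import numpy as np

    def knn_predict(x0, X_train, Y_train, k=5):
        """Nonparametric prediction: average the responses of the k nearest neighbours."""
        idx = np.argsort(np.abs(X_train - x0))[:k]
        return Y_train[idx].mean()

    rng = np.random.default_rng(3)
    X_train = rng.uniform(0.0, 10.0, size=200)
    Y_train = np.sin(X_train) + rng.normal(0.0, 0.2, size=200)

    print(knn_predict(2.0, X_train, Y_train))   # close to sin(2.0), no functional form assumed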

For inference tasks, restrictive, simple models such as linear models are usually more suitable than flexible models, because they are much easier to interpret.

Regression problems are about predicting a quantitative response, while classification problems are about predicting a qualitative response.

Model Accuracy Assessment

In regression problems, the mean squared error (MSE) is the most commonly used measure of accuracy:

MSE = (1/n) Σᵢ (yᵢ − f̂(xᵢ))²

where f̂(xᵢ) is the prediction for the i-th observation.
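
A small sketch of computing the MSE with NumPy (the numbers are made up for illustration):

    import numpy as np

    def mse(y_true, y_pred):
        """Mean squared error: average of (y_i - f_hat(x_i))^2 over the observations."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return np.mean((y_true - y_pred) ** 2)

    print(mse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # 0.375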

 

Chapter 1, Introduction

Statistical learning refers to a vast set of tools for understanding data. Types of statistical learning:

  • Supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs.
  • In unsupervised statistical learning, there are inputs but no supervising output; nevertheless, we can learn relationships and structure from such data.
Linear regression is used for predicting quantitative values, such as an individual’s salary.
Statistical Learning History:
  • In the early nineteenth century, Legendre and Gauss published papers on the method of least squares (an early form of linear regression).
  • In 1936, Fisher proposed linear discriminant analysis for predicting a qualitative response (classification).
  • In the 1940s, various authors proposed logistic regression as an alternative approach for predicting qualitative responses.
  • In the early 1970s, Nelder and Wedderburn coined the term generalized linear models for an entire class of statistical learning methods that include both linear and logistic regression as special cases.
  • By the end of the 1970s, many linear methods were available; however, fitting non-linear relationships was still computationally infeasible at that time.
  • By the 1980s, computing technology had improved sufficiently that fitting non-linear models was no longer prohibitive.
  • In the mid-1980s, Breiman, Friedman, Olshen, and Stone introduced classification and regression trees, and were among the first to demonstrate the power of a detailed practical implementation of a method, including cross-validation for model selection.
  • In 1986, Hastie and Tibshirani coined the term generalized additive models for a class of nonlinear extensions to generalized linear models
  • In recent years, progress in statistical learning techniques and tools has been marked by the increasing availability of powerful and relatively user-friendly software.