# What is Statistical Learning

**Input variables** to a statistical model are often called **predictors**, **independent variables**, **features**, or just **variables**.

The **output variable** of a statistical model is often called the **response** or the **dependent variable**.

In essence, **statistical learning** refers to a set of approaches for estimating `f` in the relationship

`Y = f(X) + ε`

where `ε` is a random error term.

# Why Estimate f

Estimating `f` is used for **prediction** and **inference**.

The **prediction accuracy** depends on two quantities, **reducible error** and **irreducible error**.

The **reducible error** is the part of the prediction error that comes from an imperfect estimate of `f`. It can be reduced by using a more appropriate statistical learning technique.

The **irreducible error** is the part of the prediction error that comes from `ε`. It is irreducible because no matter how well we estimate `f`, we cannot reduce the error introduced by `ε`. The error term may contain unmeasured variables that would be useful for predicting `Y`, as well as variation that cannot be measured at all.

The techniques of statistical learning focus on estimating `f` with the aim of minimizing the reducible error.
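The split between the two kinds of error can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "true" function `f`, the imperfect estimate `f_hat`, and the noise level `sigma` are all hypothetical choices made for illustration. It compares the error of predicting with the true `f` against the error of predicting with a deliberately imperfect estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "true" function, assumed only for this illustration.
    return 2.0 + 3.0 * x

def f_hat(x):
    # Deliberately imperfect estimate of f (also an assumption).
    return 2.5 + 2.0 * x

n = 100_000
sigma = 1.5                      # standard deviation of the error term
x = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, sigma, n)  # random error term with mean zero
y = f(x) + eps                   # Y = f(X) + eps

# Error when predicting with the true f: only the irreducible part remains.
mse_perfect = np.mean((y - f(x)) ** 2)

# Error when predicting with the imperfect estimate.
mse_imperfect = np.mean((y - f_hat(x)) ** 2)

# Reducible part: squared gap between the true f and its estimate.
reducible = np.mean((f(x) - f_hat(x)) ** 2)

print(f"irreducible (approx. sigma^2): {mse_perfect:.3f}")
print(f"reducible:                     {reducible:.3f}")
print(f"total with f_hat:              {mse_imperfect:.3f}")
```

Even with the true `f`, the error stays near `sigma ** 2`. A better estimate can only shrink the reducible part.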

The inference task is about understanding the relationship between the response and the predictors. Some questions to ask are:

- Which predictors have a strong influence on `Y`?
- What is the relationship between the response and each predictor?
- Can the relationship between `Y` and the predictors be represented by a linear equation or not?

**Linear models** allow for relatively simple and interpretable inference, but may not yield predictions as accurate as some other approaches.

**Non-linear models** can provide quite accurate predictions for `Y`, but this comes at the expense of a less interpretable model for which inference is more challenging.

# How f is Estimated

The process is to observe **training data** and use it to train a statistical model, that is, to find an estimate of `f` such that `Y ≈ f(X)` for the observations. There are two families of methods for doing so: parametric and nonparametric.

**Parametric methods** involve a two-step, model-based approach:

- Select a functional form for `f`, for example a linear model `f(X) = WX + b`.
- Select a procedure for **fitting** the model to the training data. An example of such a procedure is the **least squares** method.
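The two parametric steps can be sketched with NumPy. The data here are synthetic, and the names `true_w`, `true_b`, and the noise scale are assumptions made only so that the fitted values have something to be compared against. The fit uses `np.linalg.lstsq`, which solves the ordinary least squares problem.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data (hypothetical values, for illustration only).
n = 200
X = rng.normal(size=(n, 2))
true_w = np.array([1.5, -2.0])
true_b = 0.7
y = X @ true_w + true_b + rng.normal(scale=0.1, size=n)

# Step 1: assume a linear form f(X) = Xw + b.
# A column of ones lets the intercept b be estimated with the weights.
X_design = np.column_stack([np.ones(n), X])

# Step 2: fit the model to the training data by least squares.
coef, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
b_hat, w_hat = coef[0], coef[1:]

print("estimated intercept:", b_hat)
print("estimated weights:  ", w_hat)
```

Because the problem has been reduced to estimating three numbers, a modest amount of data is enough to recover them closely in this setting.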

Choosing a model close to the true form of `f` is challenging. If the chosen model is too far from the true `f`, the estimate will be poor.

**Overfitting** is the phenomenon in which the model follows the noise in the training data too closely instead of generalizing to capture the underlying `f`.
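One way to see overfitting is to fit polynomials of increasing degree to a small noisy sample and compare the error on the training data with the error on fresh data. This is a minimal sketch: the underlying function `true_f`, the noise level, and the chosen degrees are all hypothetical, and polynomial fitting is used here only as a convenient example of increasing flexibility.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def true_f(x):
    # Hypothetical underlying function, chosen only for illustration.
    return np.sin(2.0 * x)

# A small noisy training set and a larger test set from the same process.
x_train = rng.uniform(-1.0, 1.0, 15)
y_train = true_f(x_train) + rng.normal(0.0, 0.3, 15)
x_test = rng.uniform(-1.0, 1.0, 1000)
y_test = true_f(x_test) + rng.normal(0.0, 0.3, 1000)

results = {}
for degree in (1, 3, 10):
    # Least squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((y_train - model(x_train)) ** 2))
    test_mse = float(np.mean((y_test - model(x_test)) ** 2))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

The training error cannot increase as the degree grows, because each lower-degree polynomial is a special case of the higher-degree one. The test error is the quantity that reveals whether the extra flexibility is fitting noise.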

**Nonparametric methods** do not make explicit assumptions about the functional form of `f`. Instead, they seek an estimate that gets close to the training data without being too rough or too wiggly. The advantage of such methods is their ability to fit a wide range of shapes of `f`, though they need far more training data than parametric methods to generalize well.
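As an illustration of a nonparametric estimate, the sketch below uses k-nearest-neighbours averaging for a single predictor. This technique is chosen here only as a simple example and is not named in the text above. The function `knn_predict` and its parameter `k` are hypothetical names introduced for this sketch.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=5):
    """Predict by averaging the responses of the k nearest training points.

    No functional form is assumed for f; the estimate is driven
    entirely by the training data near each query point.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))

    # Distance from every query point to every training point (1-D predictor).
    dists = np.abs(x_query[:, None] - x_train[None, :])

    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Average the responses of those neighbours.
    return y_train[nearest].mean(axis=1)

x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 1.0, 4.0, 9.0, 16.0]
print(knn_predict(x_train, y_train, [2.0], k=3))
```

A small `k` gives a wiggly estimate that follows individual points, while a large `k` gives a smoother one, so `k` controls the trade-off between rough and smooth fits.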

Restrictive, simple models such as linear models are usually more suitable than flexible models for inference tasks, because they are easier to interpret.

**Regression problems** are about predicting a quantitative response, while **classification problems** are about predicting a qualitative response.

# Model Accuracy Assessment

In regression problems, the **mean squared error** (MSE) is the most commonly used measure of accuracy.
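The mean squared error averages the squared differences between the observed responses and the predictions. A minimal sketch, with `mean_squared_error` as a hypothetical helper name rather than a reference to any particular library:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Two exact predictions and one that is off by 2: (0 + 0 + 4) / 3.
print(mean_squared_error([1, 2, 3], [1, 2, 5]))
```

A smaller value means the predictions are closer to the observed responses on the data used for the calculation.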