PSTAT 131 Lecture 2

class

tells the variable type

aggregation

combine several observations or attributes into one, e.g. readings from multiple identical sensors, or days rolled up to months using sums or averages
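
A minimal sketch of aggregation, rolling daily values up to monthly sums and averages. The readings and the helper name aggregate_by_month are made up for illustration and are not from the lecture.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily readings as (ISO date, value); the numbers are invented.
daily = [
    ("2024-01-01", 10.0),
    ("2024-01-02", 14.0),
    ("2024-02-01", 20.0),
    ("2024-02-02", 22.0),
    ("2024-02-03", 24.0),
]

def aggregate_by_month(rows):
    """Group daily values by 'YYYY-MM' and return {month: (sum, average)}."""
    groups = defaultdict(list)
    for date, value in rows:
        groups[date[:7]].append(value)  # first 7 characters are 'YYYY-MM'
    return {month: (sum(vals), mean(vals)) for month, vals in groups.items()}

monthly = aggregate_by_month(daily)
```

Each month ends up as one row, so the data set shrinks from five daily records to two monthly ones.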

sampling

selecting a subset of the data
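
A small sketch of simple random sampling without replacement, using Python's standard random module. The population, the subset size, and the fixed seed are illustrative assumptions.

```python
import random

def sample_subset(data, k, seed=0):
    """Draw k items without replacement; the seed is fixed for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(data, k)

population = list(range(100))  # hypothetical data set of 100 items
subset = sample_subset(population, 10)
```

Calling sample_subset again with the same seed returns the same subset, which helps when an analysis has to be repeated.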

feature creation

combine existing attributes into new features for the model to use
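
One common illustration of feature creation is body mass index, built from weight and height. The record layout and the function name add_bmi are hypothetical.

```python
def add_bmi(record):
    """Return a copy of the record with a new 'bmi' feature (kg / m^2)."""
    new = dict(record)  # copy so the original record is left unchanged
    new["bmi"] = record["weight_kg"] / record["height_m"] ** 2
    return new

person = {"weight_kg": 72.0, "height_m": 2.0}  # made-up values
with_bmi = add_bmi(person)
```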

discretization

rounding, converting numeric to categories
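
A minimal sketch of both ideas in the definition above: rounding a number, and mapping a numeric attribute to categories. The age cutoffs (18 and 65) and the labels are arbitrary choices for illustration.

```python
def discretize(age):
    """Map a numeric age to a category; the cutoffs are illustrative only."""
    if age < 18:
        return "minor"
    elif age < 65:
        return "adult"
    return "senior"

ages = [5, 17, 18, 40, 64, 65, 90]
labels = [discretize(a) for a in ages]

# Rounding is the simplest form of discretization.
rounded = round(3.14159, 2)
```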

Transformation

make a distribution symmetric or more Gaussian (e.g. log-transform)
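
A rough sketch of how a log-transform can remove right skew. The data (powers of two) are chosen so that the logged values are evenly spaced; the skewness helper is the standard moment-based formula written out by hand, not a library call.

```python
import math
from statistics import mean, pstdev

def skewness(xs):
    """Moment-based skewness: the average cubed z-score (0 for symmetric data)."""
    m, s = mean(xs), pstdev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])

raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]  # strongly right-skewed
logged = [math.log(x) for x in raw]           # evenly spaced after the log

skew_raw = skewness(raw)
skew_log = skewness(logged)
```

Here skew_raw is clearly positive while skew_log is essentially zero, since the logged values are symmetric about their mean.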

scaling

transform features to mean 0 and variance 1
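
A minimal sketch of standardization (z-scores). It assumes the population standard deviation (dividing by n); dividing by n - 1 instead is also common, and the lecture does not say which is intended.

```python
from statistics import mean, pstdev

def standardize(xs):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
z = standardize(feature)
```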

Survivorship bias

Concentrating on the people or things that "survived" some process and inadvertently overlooking those that didn't because of their lack of visibility (e.g. plane engines, health studies).

Predictive Learning

Classification and Regression

Classification

assign an item to one of a fixed set of existing categories
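
As one possible illustration, a 1-nearest-neighbor rule assigns a new item the category of the closest labeled example. This is only one of many classifiers and is not necessarily the one used in the course; the training data and labels are invented.

```python
def nearest_neighbor_label(train, x):
    """train is a list of (feature, label); return the label of the closest feature."""
    closest = min(train, key=lambda pair: abs(pair[0] - x))
    return closest[1]

# Hypothetical labeled examples with a single numeric feature.
train = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

label_a = nearest_neighbor_label(train, 3.0)
label_b = nearest_neighbor_label(train, 7.0)
```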

Regression

find a function that predicts a continuous response
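
A short sketch of the simplest case, a straight line fitted by least squares. The data are made up so that they lie exactly on y = 1 + 2x; fit_line is a hand-written helper, not a library function.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
prediction = intercept + slope * 4.0  # predict the response at a new x
```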

Descriptive

Clustering, Summarizing, Anomaly Detection

Clustering

identify distinct groupings in the data
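
One way to make this concrete is a bare-bones k-means on one-dimensional data. The algorithm choice, the starting centers, and the data are all assumptions for illustration; the notes do not name a specific clustering method.

```python
def kmeans_1d(xs, centers, iters=10):
    """Alternate between assigning points to the nearest center and updating centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in xs:
            idx = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[idx].append(x)
        # Keep the old center if a cluster happens to be empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

xs = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # two visibly separate groups
centers, clusters = kmeans_1d(xs, centers=[0.0, 5.0])
```

No labels are supplied; the two groups are recovered from the spacing of the data alone.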

Summarizing

averages, associations, test statistics

Anomaly Detection

identify unusual items or data points
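
A simple sketch of one approach, flagging points that lie more than a chosen number of standard deviations from the mean. The threshold of 2 and the readings are arbitrary assumptions, and other detection rules are possible.

```python
from statistics import mean, pstdev

def find_anomalies(xs, threshold=2.0):
    """Return values whose absolute z-score exceeds the threshold."""
    m, s = mean(xs), pstdev(xs)
    return [x for x in xs if abs(x - m) / s > threshold]

# Hypothetical readings with one obviously unusual value.
readings = [10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0, 10.0, 50.0]
anomalies = find_anomalies(readings)
```

Note that the outlier itself inflates the mean and standard deviation, so this rule can miss anomalies in small samples.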

y = f(X) + ε

y - response; X - predictors; ε - error term (measurement error or unexplainable variation from unmeasured X's)

Synonyms for output variable

response; dependent variable; class label (categorical); outcome

Synonyms for input variable

predictors; independent variables; covariates; features; regressors

Supervised Learning 1

model the output as a function of the input; train - use available input/output data to estimate the function; test - predict new outputs, given only input
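
The train/test steps above can be sketched with simulated data. The true function (1 + 2x), the noise level, and the sample sizes are assumptions chosen for illustration, and fit_line is a hand-written least-squares helper.

```python
import random
from statistics import mean

def true_f(x):
    """Hypothetical true relationship; unknown in a real problem."""
    return 1.0 + 2.0 * x

def simulate(n, rng, noise_sd=0.5):
    """Generate inputs and noisy outputs y = f(x) + error."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Least-squares estimate of intercept and slope."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

rng = random.Random(0)
x_train, y_train = simulate(200, rng)  # train: inputs and outputs available
x_test, y_test = simulate(100, rng)    # test: outputs used only for scoring

b0, b1 = fit_line(x_train, y_train)                      # estimate the function
y_pred = [b0 + b1 * x for x in x_test]                   # predict from inputs only
test_mse = mean([(y - p) ** 2 for y, p in zip(y_test, y_pred)])
```

The test error stays close to the noise variance (0.25 here) because the fitted line is close to the true function.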

Unsupervised Learning

labels are unknown; dimension reduction; useful for visualization and exploratory analysis

Supervised Learning 2

Goal is to predict y; hats indicate estimated quantities; ε = irreducible error because it cannot be estimated; predict y by estimating f: yhat = fhat(X)

Bias Variance Trade Off

MSE(yhat) = Var(yhat) + Bias(yhat)^2 (we often evaluate an estimator in terms of MSE(yhat) = E[(y - yhat)^2])
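
A short derivation of the decomposition above, written in LaTeX with \hat{y} for yhat. It assumes y is treated as a fixed target and Bias(\hat{y}) = E[\hat{y}] - y; if y also carries its own independent noise ε, an extra Var(ε) term (the irreducible error) is added to the right-hand side.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{y}) &= E\big[(\hat{y} - y)^2\big] \\
&= E\big[(\hat{y} - E[\hat{y}] + E[\hat{y}] - y)^2\big] \\
&= E\big[(\hat{y} - E[\hat{y}])^2\big]
   + 2\,(E[\hat{y}] - y)\,E\big[\hat{y} - E[\hat{y}]\big]
   + (E[\hat{y}] - y)^2 \\
&= \mathrm{Var}(\hat{y}) + \mathrm{Bias}(\hat{y})^2
\end{aligned}
```

The cross term vanishes because E[\hat{y} - E[\hat{y}]] = 0.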

Bias

Systematic error of the estimator - average difference between prediction and response: E[yhat - y]

Variance

variance of the estimator (from sampling and measurement error) - comes from sampling of a population - how variable the prediction is about its mean: E[(yhat - E[yhat])^2]
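
A simulation sketch of bias and variance for two estimators of a mean: the sample mean, and a deliberately shrunken version (half the sample mean). The true value, sample size, and the shrunken estimator are invented for illustration.

```python
import random
from statistics import mean, pvariance

def simulate_estimator(estimator, true_value=3.0, n=10, reps=2000, seed=0):
    """Repeat the sampling many times and summarize the estimator's behavior."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(true_value, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    bias = mean(estimates) - true_value           # systematic error
    variance = pvariance(estimates)               # spread about its own mean
    mse = mean([(e - true_value) ** 2 for e in estimates])
    return bias, variance, mse

def shrunken_mean(sample):
    """Hypothetical biased estimator: pulls the sample mean halfway to zero."""
    return 0.5 * mean(sample)

bias_a, var_a, mse_a = simulate_estimator(mean)
bias_b, var_b, mse_b = simulate_estimator(shrunken_mean)
```

The sample mean has bias near zero, while the shrunken estimator trades a large bias (about -1.5) for a smaller variance. In both cases the simulated MSE equals variance plus squared bias.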

High Bias and Low Variance

-more structured/parametric models -smaller sample size (data normally distributed)

Low bias and High Variance

-less structured/nonparametric models -larger sample size

Supervised Learning of fhat(x)

best model maximizes/minimizes the objective function or criterion; when minimizing, also called the cost function or loss function; can always find an fhat that fits the available data perfectly, so the concern is generalizability (overfitting)

Overfitting

Not flexible enough = won't fit data well - high bias; too flexible = won't generalize to new data - overfitting - high variance
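
A sketch of the flexibility trade-off using k-nearest-neighbor regression on simulated data. The true curve (sin x), the noise level, and the values of k are assumptions: k = 1 is the most flexible fit, and k equal to the training size predicts the overall mean everywhere (the least flexible).

```python
import math
import random
from statistics import mean

def make_data(n, rng, noise_sd=0.5):
    """Simulate y = sin(x) + noise; the true curve is a made-up example."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def knn_predict(x_train, y_train, x, k):
    """Average the responses of the k training points closest to x."""
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x))[:k]
    return mean([y for _, y in nearest])

def mse(x_train, y_train, xs, ys, k):
    """Mean squared prediction error on (xs, ys)."""
    return mean([(y - knn_predict(x_train, y_train, x, k)) ** 2
                 for x, y in zip(xs, ys)])

rng = random.Random(0)
x_tr, y_tr = make_data(100, rng)
x_te, y_te = make_data(500, rng)

# For each k, store (training MSE, test MSE).
results = {k: (mse(x_tr, y_tr, x_tr, y_tr, k), mse(x_tr, y_tr, x_te, y_te, k))
           for k in (1, 10, 100)}
```

With k = 1 the training error is exactly zero but the test error is not, which is overfitting. With k = 100 both errors are large, which is underfitting. A middle value such as k = 10 generalizes best in this setup.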

High Variance, Low Bias (figure not recovered)

High Bias, Low Variance (figure not recovered)