How can learning be viewed as Optimisation?
Learning can be framed as searching for the model parameters (e.g. regression weights) that minimise an error function over the training data.
How to decompose errors into bias and variance?
Error = bias^2 + variance + noise
What is bias?
How far, on average, the model's predictions are from the true values; high bias means the model's assumptions are too restrictive (underfitting).
What is variance?
How sensitive the model is to fluctuations in the training data, i.e. how much its predictions change when trained on a different sample; high variance corresponds to overfitting.
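The decomposition above can be checked numerically. A minimal sketch (all values here are illustrative choices, not from the notes): fit a trivial model (the sample mean) to many independent training sets drawn from a known constant-plus-noise process, then compare bias² + variance + noise against the measured expected squared error on fresh noisy observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# True function is a constant; observations add Gaussian noise.
true_y = 2.0
noise_sd = 0.5
n_train, n_trials = 10, 20000

# Model: predict the mean of a small training sample.
preds = np.array([
    rng.normal(true_y, noise_sd, n_train).mean()
    for _ in range(n_trials)
])

bias_sq = (preds.mean() - true_y) ** 2   # ~0: the sample mean is unbiased
variance = preds.var()                   # ~noise_sd**2 / n_train
noise = noise_sd ** 2                    # irreducible error

# Expected squared error on a fresh noisy observation:
test_y = rng.normal(true_y, noise_sd, n_trials)
mse = ((preds - test_y) ** 2).mean()

print(bias_sq + variance + noise, mse)   # the two should be close
```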
How to reduce overfitting?
Dampen the model's complexity, e.g. by regularisation, which smooths out the learned function.
What is L1 Regularisation?
L1 weight regularisation penalises weight values by adding the sum of their absolute values to the error term
L1 regularisation encourages solutions where many parameters are zero
e.g. Lasso algorithm
What is L2 Regularisation?
L2 weight regularisation penalises weight values by adding the sum of their squared values to the error term
L2 regularisation encourages solutions where most parameter values are small.
e.g. Ridge regression (linear regression with an L2 penalty)
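A sketch of both penalties on toy data (data, regularisation strength, and iteration counts are all hypothetical choices): ridge has a closed-form solution, while the L1 fit here uses proximal gradient descent (ISTA) with soft-thresholding, a standard way to solve the Lasso. Only the first feature actually matters, so L1 should zero out the rest while L2 merely shrinks them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression: only the first of five features matters.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=n)

lam = 5.0  # regularisation strength (illustrative value)

# L2 (ridge): closed form; shrinks all weights towards zero.
w_l2 = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# L1 (lasso): proximal gradient descent (ISTA) with soft-thresholding.
w_l1 = np.zeros(d)
step = 1.0 / np.linalg.norm(X, 2) ** 2  # step size from the Lipschitz constant
for _ in range(2000):
    grad = X.T @ (X @ w_l1 - y)
    z = w_l1 - step * grad
    w_l1 = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("ridge:", np.round(w_l2, 3))  # all small, none exactly zero
print("lasso:", np.round(w_l1, 3))  # irrelevant weights driven to (near) zero
```

This illustrates the notes' claim directly: the L1 solution is sparse, the L2 solution is merely small.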
Batch vs Stochastic Gradient Descent
Batch: the error gradient is evaluated over the entire data set at each iteration, then one weight update is made
Stochastic: a weight update is performed after each individual training instance
How to find the minimum error for regularisation/optimisation?
Use Gradient Descent to approximate the minimum iteratively, rather than calculating it analytically (which is often infeasible).
Why is it that parameter tuning might lead to overfitting?
Because hyper-parameters are chosen to minimise error on a particular validation set, repeated tuning can fit the quirks of that set rather than the underlying distribution, so validation performance stops reflecting generalisation.
What is the Gradient Descent method, and why is it important?
Gradient Descent is a mechanism for finding the minimum of a (convex) multivariate function where we can find its partial derivatives.
This is important because it allows us to determine the regression weights which minimise an error function over some training data set.
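A minimal sketch of the mechanism on a convex function whose minimum is known (the function, starting point, and learning rate are illustrative choices): repeatedly step against the gradient and converge to the analytic minimiser.

```python
import numpy as np

# Convex function f(w) = (w0 - 1)^2 + 2*(w1 + 3)^2, minimised at (1, -3).
def grad(w):
    """Vector of partial derivatives of f at w."""
    return np.array([2 * (w[0] - 1), 4 * (w[1] + 3)])

w = np.zeros(2)      # arbitrary starting point
lr = 0.1             # learning rate (illustrative choice)
for _ in range(500):
    w -= lr * grad(w)

print(w)  # → approximately [1, -3]
```

In regression the role of f is played by the (regularised) error over the training set, and w by the regression weights.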