List ML algorithm categories.
Supervised learning and unsupervised learning.
Examples of supervised learning
Regression (e.g., housing-price prediction) and classification (e.g., spam detection).
Examples of unsupervised learning
Clustering (e.g., customer segmentation) and dimensionality reduction.
Hypothesis (model) and cost function for linear regression with a single variable
hθ(x) = θ0 + θ1*x
J(θ) = 1/(2m) * Σ{i=1~m} (hθ(x(i)) - y(i))^2
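The cost function above can be sketched in a few lines of NumPy (the function name and toy data are illustrative, not from the notes):

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(θ) = 1/(2m) * Σ (hθ(x(i)) - y(i))^2."""
    m = len(x)
    h = theta0 + theta1 * x             # hypothesis hθ(x) = θ0 + θ1*x
    return np.sum((h - y) ** 2) / (2 * m)

# Data lying exactly on y = 1 + 2x gives zero cost:
x = np.array([1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
print(compute_cost(1.0, 2.0, x, y))     # 0.0
```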
How to find the parameter set for a linear regression problem?
Find a parameter set that minimizes the cost function, i.e.,
minθJ(θ)
One way of solving this optimization problem is the gradient descent algorithm.
Describe the gradient descent algorithm.
repeat until convergence {
    for all j's (simultaneously) {
        θj := θj - α * (∂/∂θj) J(θ)
    }
}
α: learning rate - Note that all θj are updated simultaneously.
Discuss the learning rate of the gradient descent algorithm.
α too small –> converges too slowly
α too big –> might fail to converge or even diverge
Gradient descent algorithm for a linear regression with a single variable.
repeat until convergence {
    θ0 := θ0 - α * (1/m) * Σ{i=1~m} (hθ(x(i)) - y(i))
    θ1 := θ1 - α * (1/m) * Σ{i=1~m} (hθ(x(i)) - y(i)) * x(i)
    (θ0 and θ1 are updated simultaneously)
}
* Note: This is a batch gradient descent.
What is “batch” gradient descent?
Each step of the gradient descent uses all the training samples.
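The batch update above can be sketched as follows (learning rate, iteration count, and toy data are my own choices):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Batch gradient descent: every step uses all m training samples."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        h = theta0 + theta1 * x
        # Compute both gradients before updating, so θ0 and θ1
        # are updated simultaneously.
        grad0 = np.sum(h - y) / m
        grad1 = np.sum((h - y) * x) / m
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x                       # true parameters: θ0 = 1, θ1 = 2
t0, t1 = gradient_descent(x, y)
print(t0, t1)                           # ≈ 1.0, 2.0
```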
Hypothesis and cost function of a linear regression with multi-variables.
hθ(X) = θT•X
J(θ) = 1/(2m) * Σ{i=1~m} (hθ(X(i)) - y(i))^2
Gradient descent of a linear regression with multi-variables.
repeat until convergence {
    for all j in {0, 1, …, n} (simultaneously)
        θj := θj - α * (1/m) * Σ{i=1~m} (hθ(X(i)) - y(i)) * xj(i)
}
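Vectorized, this fits in a few NumPy lines; the design matrix X carries a leading column of ones for θ0 (the toy data is illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Vectorized batch GD; X is m x (n+1) with a leading column of ones."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # all θj computed from the same θ
        theta = theta - alpha * grad       # ...then updated simultaneously
    return theta

# Toy data generated from y = 1 + 2*x1 + 3*x2:
X = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]], float)
y = X @ np.array([1.0, 2.0, 3.0])
theta = gradient_descent(X, y)
print(theta)                               # ≈ [1. 2. 3.]
```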
Feature scaling and GD
For GD to work well, features must have a similar scale. Mean normalization can be used:
X := (X - μ) / S
- μ: mean vector
- S: standard deviation or range (max - min)
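A sketch of mean normalization using the standard deviation as S (function and variable names are illustrative):

```python
import numpy as np

def mean_normalize(X):
    """Per-feature scaling: (x - mean) / std, so each column ends up
    with mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)        # mean vector
    sigma = X.std(axis=0)      # per-feature standard deviation
    return (X - mu) / sigma, mu, sigma

# Two features on very different scales:
X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_norm, mu, sigma = mean_normalize(X)
print(X_norm)                  # columns now share the same scale
```

Returning mu and sigma lets you apply the identical transform to new inputs at prediction time.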
How do you make sure GD is working?
Plot J(θ) against the number of iterations and check that it decreases at every iteration.
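One way to run this check programmatically is to record J(θ) after each iteration and verify it never increases (a sketch; the toy data and names are mine):

```python
import numpy as np

def gd_with_history(X, y, alpha=0.1, iters=200):
    """Batch GD that records J(θ) after every iteration."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        err = X @ theta - y
        theta = theta - alpha * (X.T @ err) / m
        history.append(np.sum((X @ theta - y) ** 2) / (2 * m))
    return theta, history

X = np.array([[1, 0], [1, 1], [1, 2]], float)
y = np.array([1.0, 3.0, 5.0])
theta, history = gd_with_history(X, y)
# With a well-chosen alpha, J decreases at every iteration:
print(all(a >= b for a, b in zip(history, history[1:])))   # True
```

Plotting `history` (e.g., with matplotlib) gives the same check visually.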
How to extend a linear regression to Polynomial regression for non-linear function?
Create new features from the existing ones.
For example,
x1 = x1
x2 = x1^2
x3 = x1^3
Then, solve the new feature sets using the linear regression technique.
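A sketch of building the new features (after this step, feature scaling becomes important, since x, x^2, and x^3 have very different ranges):

```python
import numpy as np

def poly_features(x, degree):
    """Expand a single feature x into columns [x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
print(poly_features(x, 3))    # each row is [x1, x1^2, x1^3] for one sample
```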
Normal equation for linear regression.
θ=(XT•X)-1•XT•y
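In NumPy this is a one-liner; solving the linear system is preferred over forming the inverse explicitly (toy data is illustrative):

```python
import numpy as np

# Design matrix with a leading column of ones for the intercept θ0:
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], float)
y = np.array([1.0, 3.0, 5.0, 7.0])      # y = 1 + 2x

# θ = (X^T X)^-1 X^T y, computed via a linear solve:
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                             # ≈ [1. 2.]
```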
Explain Logistic Regression
In solving a {0, 1} classification problem, we want the hypothesis (model) function value to lie in the [0, 1] range. Logistic regression therefore passes a linear function through the sigmoid (logistic) function:
hθ(X) = g(θT•X) = 1 / (1 + e^(-θT•X))
Decision boundary for logistic regression
Suppose we predict y = 1 when hθ(X) = g(θT•X) ≥ 0.5, which holds exactly when θT•X ≥ 0.
Then the decision boundary is θT•X = 0.
Cost function for Logistic Regression
J(θ)=1/m*Σ{i=1~m} cost(hθ(X(i)), y(i))
where cost(hθ(X(i)), y(i)) is
-log(hθ(X(i)))      if y = 1
-log(1 - hθ(X(i)))  if y = 0
If you combine the above two terms, then
cost(•) = -y*log(hθ(X(i))) -(1-y)*log(1-hθ(X(i)))
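A sketch of the combined cost (the sigmoid is standard; the toy data and variable names are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """J(θ) = -1/m * Σ [ y*log(hθ) + (1-y)*log(1-hθ) ]."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

# Toy 1-D data (leading column of ones), separable at x = 0:
X = np.array([[1, -2.0], [1, -1.0], [1, 1.0], [1, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = np.array([0.0, 5.0])             # decision boundary θT•X = 0 at x = 0
print(logistic_cost(theta, X, y))        # small: confident, correct predictions
```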
After training, you found that your ML algorithm produces high prediction error on test data. What can you do?