Why is optimization important?
NN objectives are highly non-convex (and worse with depth)
Optimization has huge influence on quality of model
- Factors: Convergence speed, convergence quality
Standard training method is…
Stochastic Gradient (SG)
Explain the stochastic gradient training method
Repeatedly pick a random mini-batch of training examples, compute the gradient of the loss with respect to the parameters on that batch, and update the parameters by a small step in the negative gradient direction
Computing the gradient is known as ‘backpropagation’
Backpropagation computes neural network gradient via which rule?
Chain rule
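As an illustrative sketch (not from the slides), the chain rule can be applied by hand to a tiny one-hidden-unit network; all names and values here are made up:

```python
import numpy as np

# Backpropagation is just the chain rule applied layer by layer.
# Tiny net: y = w2 * tanh(w1 * x), squared-error loss L = (y - t)^2.
def forward_backward(w1, w2, x, t):
    h = np.tanh(w1 * x)               # hidden activation
    y = w2 * h                        # output
    L = (y - t) ** 2                  # loss
    dL_dy = 2 * (y - t)               # outermost chain-rule factor
    dL_dw2 = dL_dy * h                # y = w2 * h  ->  dy/dw2 = h
    dL_dh = dL_dy * w2                # y = w2 * h  ->  dy/dh = w2
    dL_dw1 = dL_dh * (1 - h**2) * x   # tanh'(z) = 1 - tanh(z)^2, z = w1*x
    return L, dL_dw1, dL_dw2
```

The analytic gradients can be checked against finite differences, which is a common sanity test for hand-written backprop.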
How can we speed up the training process of deep neural networks?
Faster Optimizers!
Four ways: use a good weight-initialization strategy, use a good activation function, use Batch Normalization, and reuse parts of a pretrained network
Another way: Use a faster optimizer than the regular Gradient Descent optimizer, such as Momentum, Nesterov Accelerated Gradient, AdaGrad, RMSProp, or Adam
What are the general ideas of the momentum optimizer?
Primary idea: keep an exponentially decaying running average of past gradients (a velocity vector) and move the parameters along it, so each gradient acts as an acceleration rather than directly as a speed
In deep neural networks, if you cannot use Batch Normalization, you can apply…
Momentum optimization!
True or False: Momentum Optimization uses gradient for acceleration, not for speed
True
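A minimal sketch of the momentum update (hyperparameter values here are illustrative defaults, not the slides' code): the gradient is added to a velocity vector, and the parameters move by the velocity.

```python
# One step of momentum optimization: the gradient acts as an acceleration.
def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    v = beta * v - lr * grad   # velocity accumulates a decaying gradient sum
    w = w + v                  # parameters move by velocity, not raw gradient
    return w, v
```

For example, minimizing f(w) = w^2 (so grad = 2w) by repeatedly calling `momentum_step` drives w toward 0, typically faster than plain gradient descent at the same learning rate.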
True or False: Momentum can be a hyperparameter of Stochastic Gradient Descent
True
Hyperparameters regulate the design of a model.
Name some hyperparameters in different machine learning models
Machine Learning algorithms have different hyperparameters (some main, some optional), e.g. the learning rate and batch size for neural networks, the number of layers and neurons per layer, the number of trees and maximum depth for random forests, or C and the kernel for SVMs
Why is a good learning rate important?
If learning rate is too high -> Training may diverge
If it is set too low -> Training will eventually converge to the optimum, but at the cost of a very long training time
How can you fit a good learning rate?
You can fit a good learning rate by training the model for a few hundred iterations while exponentially increasing the learning rate from a very small to a very large value, plotting the loss against the learning rate, and picking a rate a bit below the point where the loss starts to climb
-> There’s a good graph in the slides on this
There are different strategies to reduce learning rate during training. These are called Learning schedules.
Name a few of them: power scheduling, exponential scheduling, piecewise constant scheduling, and performance scheduling
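As a sketch, two schedules of this kind can be written as simple functions of the training step (function names and constants are mine, not from the slides):

```python
# step = current training iteration, lr0 = initial learning rate
def power_schedule(lr0, step, s=1000.0, c=1.0):
    # learning rate decays as lr0 / (1 + step/s)^c
    return lr0 / (1 + step / s) ** c

def exponential_schedule(lr0, step, s=1000.0):
    # learning rate is multiplied by 0.1 every s steps
    return lr0 * 0.1 ** (step / s)
```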
What are the challenges of Hyper-parameter optimization?
Summary = Resource intensive, the configuration space is complex (loads of different variables to tweak), and it is hard to optimize directly for generalization performance
What are the model-free blackbox optimization methods?
Grid Search and Random Search
How does Grid Search work?
All combinations of selected values of hyperparameters are tested to determine the optimal choice
Compute-heavy, but very easy to implement
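A sketch of grid search (the grid values and score function below are made up for illustration): every combination of the listed hyperparameter values is evaluated, and the best-scoring one is kept.

```python
from itertools import product

def grid_search(grid, score_fn):
    # evaluate score_fn on every combination of hyperparameter values
    best_params, best_score = None, float("-inf")
    keys = list(grid)
    for combo in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [16, 64]}
# toy score function that pretends lr=0.01, batch_size=64 is best
best, _ = grid_search(
    grid, lambda p: -abs(p["lr"] - 0.01) - (64 - p["batch_size"]) / 1e6
)
```

Note the compute cost: the number of evaluations is the product of the value counts per hyperparameter (here 3 × 2 = 6), which explodes as hyperparameters are added.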
How can we limit the downsides of the heavy requirements in Grid Search computation?
By searching in logarithmic space instead of linear space
Logarithms of the hyperparameters are sampled uniformly rather than the hyperparameters themselves
Example: Instead of sampling the learning rate (alpha) uniformly between 0.001 and 0.1, first sample log10(alpha) uniformly between -3 and -1 and then raise 10 to that power
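The log-space trick from the example can be sketched in a few lines (sample count and seed are arbitrary): sampling log10(alpha) uniformly means each decade of learning rates is covered equally often, instead of the upper decade dominating.

```python
import numpy as np

rng = np.random.default_rng(42)
log_alpha = rng.uniform(-3, -1, size=1000)  # sample log10(alpha) uniformly
alpha = 10.0 ** log_alpha                   # learning rates in [0.001, 0.1]
```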
What are some other Hyper-parameter optimization techniques?
Are Deep Generative Models used for supervised or unsupervised learning?
Unsupervised.
Deep Generative Models are an efficient way to analyse and understand unlabeled data
What are some examples of use cases for Deep Generative Models?
Visual Recognition, Speech recognition and generation, natural language processing
Deep Generative Models can be divided into two main categories - These are…
Cost-function-based models, such as autoencoders and generative adversarial networks
Energy-based models, where the joint probability is defined using an energy function
What are Boltzmann Machines(BMs)?
BMs are a popular fully connected ANN architecture with stochastic (probabilistic, binary) neurons
What are the differences between BMs and Restricted Boltzmann Machines(RBMs)?
As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph: a pair of nodes from each of the two groups of units (commonly referred to as the “visible” and “hidden” units respectively) may have a symmetric connection between them; and there are no connections between nodes within a group. By contrast, “unrestricted” Boltzmann machines may have connections between hidden units. This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.
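The bipartite restriction can be sketched concretely (layer sizes here are made up): a single weight matrix W connects visible to hidden units, with no visible-visible or hidden-hidden weights, so each layer can be sampled in one vectorized pass.

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # the only weights
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    # hidden units depend only on the visible layer...
    p_h = sigmoid(v @ W + b_h)
    h = (rng.random(n_hidden) < p_h).astype(float)
    # ...and visible units only on the hidden layer
    p_v = sigmoid(h @ W.T + b_v)
    v_new = (rng.random(n_visible) < p_v).astype(float)
    return v_new, h
```

This alternating layer-wise sampling is the building block of contrastive divergence training.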
Essentially, there are different variations of the Boltzmann Machine. What are the variations?
Boltzmann Machines (BMs), Restricted Boltzmann Machines (RBMs), Deep Boltzmann Machines (DBMs), and Deep Belief Networks (DBNs)
Summary: Whenever “Deep” is involved, it just means that there are more layers