Hyperparameter Tuning and Batch Normalization - Class review


Once you have constructed your learning system, you have to tune its hyperparameters. The number of hyperparameter settings you try determines how many models you have to train.

Tuning process

  • Optimizer hyperparameters: $\alpha$ (learning rate), $\beta$ (momentum), $\varepsilon$ (Adam)
  • Layers
  • Hidden units
  • Learning rate decay
  • Mini-batch size

Try random values

Don’t use a grid; pick hyperparameter values at random, since you usually don’t know in advance which hyperparameters matter most. Grid search can still be reasonable for small discrete choices such as the number of hidden units and layers. A sampling sketch is given below.
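As a rough sketch of sampling at random (the particular hyperparameters, ranges, and trial count below are illustrative assumptions, not prescriptions from the course):

```python
import random

# Each trial draws every hyperparameter independently at random, so no two
# trials share the same value of the most important hyperparameter, unlike a
# grid where many points repeat the same learning rate.
def sample_trial():
    return {
        "learning_rate": 10 ** random.uniform(-4, 0),    # log scale (see below)
        "num_layers": random.choice([2, 3, 4, 5]),       # small discrete choice
        "hidden_units": random.choice([32, 64, 128, 256]),
        "mini_batch_size": random.choice([64, 128, 256]),
    }

trials = [sample_trial() for _ in range(25)]  # train and compare 25 models
```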

Coarse to fine

Zoom in on the region where the best-performing points were found and sample more densely within it.

Using an appropriate scale to pick hyperparameters

For the exponentially weighted average parameter $\beta$, don’t choose values on a linear scale; sample $1-\beta$ on a log scale instead.

The behavior is much more sensitive to changes in $\beta$ when $\beta$ is close to 1: moving from 0.999 to 0.9995 roughly doubles the averaging window, while moving from 0.9 to 0.9005 barely changes it.

| $\beta$ | $1-\beta$ | Values averaged $\left(\approx \frac{1}{1-\beta}\right)$ |
| --- | --- | --- |
| 0.9 | 0.1 | ~10 |
| 0.999 | 0.001 | ~1000 |
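A minimal sketch of log-scale sampling, assuming the illustrative ranges $\alpha \in [10^{-4}, 1]$ and $\beta \in [0.9, 0.999]$ (the ranges themselves are my choice, not from the course):

```python
import random

# Learning rate alpha: sample the exponent uniformly, so each decade
# (0.0001-0.001, 0.001-0.01, ...) gets the same share of trials.
r = random.uniform(-4, 0)
alpha = 10 ** r                 # alpha in [1e-4, 1]

# Exponentially weighted average parameter beta: sample 1 - beta on a log
# scale, because the averaging window ~ 1 / (1 - beta) is what matters.
r = random.uniform(-3, -1)      # 1 - beta in [0.001, 0.1]
beta = 1 - 10 ** r              # beta in [0.9, 0.999]
```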

Hyperparameters tuning in practice: Pandas vs. Caviar

Intuitions do get stale; re-evaluate your hyperparameter choices occasionally.

  • Babysitting one model, Panda
  • Training many models in parallel, Caviar

Normalizing activations in a network

Just as normalizing the inputs speeds up learning, batch norm normalizes a layer's pre-activations $z^{(i)}$ over each mini-batch:

$\mu = \frac{1}{m}\sum_i z^{(i)}$, $\quad \sigma^2 = \frac{1}{m}\sum_i (z^{(i)} - \mu)^2$

$z_{\text{norm}}^{(i)} = \frac{z^{(i)} - \mu}{\sqrt{\sigma^2 + \varepsilon}}$, $\quad \tilde{z}^{(i)} = \gamma\, z_{\text{norm}}^{(i)} + \beta$

where $\gamma$ and $\beta$ are learnable parameters. If $\gamma = \sqrt{\sigma^2 + \varepsilon}$ and $\beta = \mu$,

then $\tilde{z}^{(i)} = z^{(i)}$, so the network can recover the identity mapping if that is what works best.
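A minimal NumPy sketch of these equations (the function name, argument shapes, and $\varepsilon$ value are my own choices for illustration):

```python
import numpy as np

def batch_norm_forward(z, gamma, beta, eps=1e-8):
    """Normalize z over the mini-batch, then scale and shift with gamma and beta.

    z has shape (n_units, m): one layer's pre-activations for a mini-batch of m.
    """
    mu = z.mean(axis=1, keepdims=True)        # per-unit mean over the batch
    var = z.var(axis=1, keepdims=True)        # per-unit variance over the batch
    z_norm = (z - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    z_tilde = gamma * z_norm + beta           # learnable scale and shift
    return z_tilde, mu, var
```

Setting `gamma = np.sqrt(var + eps)` and `beta = mu` would make `z_tilde` equal `z` again, which is the identity case noted above.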

Why does batch norm work?

  • Later layers otherwise learn on a shifting input distribution, because updates to the earlier layers keep changing the values they receive.

  • Batch norm limits the amount by which parameter updates in earlier layers can shift the distribution of values seen by later layers.

  • However the earlier layers change, the mean and variance of each layer's $\tilde{z}$ stay fixed by its own $\beta$ and $\gamma$.

  • This reduces the problem of the input values changing: the values become more stable, so later layers can learn more independently and training speeds up.

Batch norm at test time
$\mu, \sigma^2$: estimated with an exponentially weighted average across the mini-batches seen during training (i.e. using data from the training set), then used as fixed values when normalizing at test time.
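A sketch of how those estimates could be maintained and used (the `momentum` value and function names are assumptions, not the course's notation):

```python
import numpy as np

momentum = 0.9  # decay rate of the exponentially weighted averages

# During training, fold each mini-batch's mu and sigma^2 into running averages.
def update_running_stats(running_mu, running_var, mu_batch, var_batch):
    running_mu = momentum * running_mu + (1 - momentum) * mu_batch
    running_var = momentum * running_var + (1 - momentum) * var_batch
    return running_mu, running_var

# At test time, normalize with the stored estimates instead of batch statistics.
def batch_norm_test(z, gamma, beta, running_mu, running_var, eps=1e-8):
    z_norm = (z - running_mu) / np.sqrt(running_var + eps)
    return gamma * z_norm + beta
```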

Fitting batch norm into a neural network

Working with mini-batches: batch norm is applied to $z^{[l]}$ of each mini-batch separately.

Because the mean subtraction removes any constant added to $z^{[l]}$, the bias $b^{[l]}$ has no effect and can be dropped; see the toy forward step below.

Parameters: $W^{[l]}, \gamma^{[l]}, \beta^{[l]}$ for each layer $l$; $\gamma^{[l]}$ and $\beta^{[l]}$ are learned with gradient descent (or Adam, RMSprop, momentum) just like $W^{[l]}$.
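A toy forward step for one layer with batch norm, showing where $b^{[l]}$ disappears (all shapes and values below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
a_prev = rng.standard_normal((4, 32))        # activations from layer l-1, batch of 32
W = 0.01 * rng.standard_normal((3, 4))       # weights for layer l -- note: no b[l]
gamma, beta = np.ones((3, 1)), np.zeros((3, 1))

z = W @ a_prev                               # b[l] omitted: batch norm's mean
                                             # subtraction would cancel it anyway
mu = z.mean(axis=1, keepdims=True)
var = z.var(axis=1, keepdims=True)
z_tilde = gamma * (z - mu) / np.sqrt(var + 1e-8) + beta
a = np.maximum(0, z_tilde)                   # ReLU activation of layer l
```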

Softmax

Multi-class classification with $C$ classes: the softmax output layer has $C$ units, and $a^{[L]}_j$ estimates $P(\text{class } j \mid x)$.

Activation function: $t = e^{z^{[L]}}$, $\quad a^{[L]}_j = \frac{t_j}{\sum_{k=1}^{C} t_k}$

Training a softmax classifier
Hardmax maps $z^{[L]}$ to a one-hot vector such as $[1\ 0\ 0\ 0]^T$ (1 at the largest entry, 0 elsewhere).
Softmax is a gentler mapping from $z^{[L]}$ to probabilities.
Softmax regression generalizes logistic regression to $C$ classes; with $C = 2$ it reduces to logistic regression.
Loss function: $\mathcal{L}(\hat{y}, y) = -\sum_{j=1}^{C} y_j \log \hat{y}_j$

Cost function: $J = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(\hat{y}^{(i)}, y^{(i)})$, as sketched below.
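A minimal NumPy sketch of the softmax activation, the loss, and the cost (the function names and the max-subtraction stability trick are my own additions):

```python
import numpy as np

def softmax(z):
    """Map logits z of shape (C, m) to per-column probabilities that sum to 1."""
    z = z - z.max(axis=0, keepdims=True)     # subtract max for numerical stability
    t = np.exp(z)
    return t / t.sum(axis=0, keepdims=True)

def cost(y_hat, y):
    """Average cross-entropy loss over m examples; y is one-hot, shape (C, m)."""
    losses = -np.sum(y * np.log(y_hat + 1e-12), axis=0)   # loss per example
    return losses.mean()

# Tiny example with C = 4 classes and m = 1 example.
z = np.array([[5.0], [2.0], [-1.0], [3.0]])
y = np.array([[1.0], [0.0], [0.0], [0.0]])   # true class is the first one
print(softmax(z).round(3))                   # "soft" version of hardmax [1 0 0 0]^T
print(cost(softmax(z), y))                   # ~0.17
```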

Reference

[1] Deeplearning.ai, Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization