Training Neural Networks (Regularization and Optimizers)

The previous lesson gave you a network, a loss, backpropagation for the gradients, and a plain gradient descent step. That is enough to train a tiny network, but real networks are deep, their loss surfaces are non-convex, and they are prone to overfitting, so plain SGD with a fixed learning rate is rarely good enough. This lesson covers the two practical problems of training: getting the optimizer to converge quickly and reliably, and getting the trained network to generalize rather than memorize. On the optimization side we build up momentum, RMSProp, and Adam, and work one Adam step by hand. On the generalization side we cover weight decay, dropout, and early stopping, the deep learning counterparts of the regularization you already met in Module 1 and Module 2. Interviewers probe both sides, so know why each technique exists and what it costs.

The Interview Guide for Quants and Traders
