
Training Neural Networks (Regularization and Optimizers)

The previous lesson gave you a network, a loss, backpropagation for the gradients, and a plain gradient descent step. That is enough to train a tiny network, but real networks are deep, have non-convex loss surfaces, and are prone to overfitting, so plain SGD with a fixed learning rate is rarely good enough. This lesson covers the two practical problems of training: getting the optimizer to converge quickly and reliably, and getting the trained network to generalize rather than memorize. On the optimization side we build up momentum, RMSProp, and Adam, and work one Adam step by hand. On the generalization side we cover weight decay, dropout, and early stopping, the deep learning counterparts of the regularization you already met in Module 1 and Module 2. Interviewers probe both, so know why each trick exists and what it costs.
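To make the optimizer progression concrete, here is a minimal sketch in plain Python (no framework assumed) of the three update rules in the order they build on each other: momentum keeps a running sum of gradients, RMSProp keeps a running average of squared gradients to rescale each parameter's step, and Adam combines both with bias correction. The class names, the toy quadratic `loss_and_grad`, and the learning rates are illustrative assumptions, not this lesson's code. The momentum form `v = beta*v + g` is one of several common conventions, and the Adam defaults shown (lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8) are the widely used ones.

```python
import math

# Illustrative optimizer classes (hypothetical names, not a library API).
# Parameters and gradients are plain lists of floats to keep the sketch
# dependency-free; each step() returns the updated parameter list.


class SGD:
    def __init__(self, lr):
        self.lr = lr

    def step(self, params, grads):
        # Plain gradient descent with a fixed learning rate.
        return [p - self.lr * g for p, g in zip(params, grads)]


class Momentum:
    def __init__(self, lr, beta=0.9):
        self.lr, self.beta = lr, beta
        self.v = None  # velocity: exponentially decaying sum of past gradients

    def step(self, params, grads):
        if self.v is None:
            self.v = [0.0] * len(params)
        self.v = [self.beta * v + g for v, g in zip(self.v, grads)]
        return [p - self.lr * v for p, v in zip(params, self.v)]


class RMSProp:
    def __init__(self, lr, rho=0.9, eps=1e-8):
        self.lr, self.rho, self.eps = lr, rho, eps
        self.s = None  # running average of squared gradients, per parameter

    def step(self, params, grads):
        if self.s is None:
            self.s = [0.0] * len(params)
        self.s = [self.rho * s + (1 - self.rho) * g * g
                  for s, g in zip(self.s, grads)]
        # Dividing by the RMS gradient evens out step sizes across
        # steep and flat directions.
        return [p - self.lr * g / (math.sqrt(s) + self.eps)
                for p, g, s in zip(params, grads, self.s)]


class Adam:
    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = None  # first moment: momentum-style average of gradients
        self.v = None  # second moment: RMSProp-style average of squared gradients
        self.t = 0     # step counter used for bias correction

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.t += 1
        self.m = [self.b1 * m + (1 - self.b1) * g for m, g in zip(self.m, grads)]
        self.v = [self.b2 * v + (1 - self.b2) * g * g
                  for v, g in zip(self.v, grads)]
        # Both averages start at zero, so divide out the startup bias.
        c1 = 1 - self.b1 ** self.t
        c2 = 1 - self.b2 ** self.t
        return [p - self.lr * (m / c1) / (math.sqrt(v / c2) + self.eps)
                for p, m, v in zip(params, self.m, self.v)]


def loss_and_grad(x):
    # Toy ill-conditioned quadratic: curvature 1 along x[0], 25 along x[1].
    loss = 0.5 * (x[0] ** 2 + 25.0 * x[1] ** 2)
    return loss, [x[0], 25.0 * x[1]]


def train(opt, steps=200, x0=(1.0, 1.0)):
    x = list(x0)
    for _ in range(steps):
        _, g = loss_and_grad(x)
        x = opt.step(x, g)
    return loss_and_grad(x)[0]  # final loss


# One Adam step by hand: with bias correction, m_hat = g and v_hat = g**2 on
# step 1, so each parameter moves by about lr * sign(g) regardless of scale.
first = Adam(lr=1e-3).step([1.0, -2.0], [0.5, -4.0])
print(first)  # approximately [0.999, -1.999]

results = {
    "sgd": train(SGD(lr=0.03)),
    "momentum": train(Momentum(lr=0.03)),
    "rmsprop": train(RMSProp(lr=0.01)),
    "adam": train(Adam(lr=0.01)),
}
for name, final_loss in results.items():
    print(f"{name:9s} final loss {final_loss:.2e}")
```

The hand-worked Adam step shows the property worth remembering: on step 1 the update size is roughly the learning rate for every parameter, whether its gradient is 0.5 or 4. The toy run only illustrates that each optimizer lowers the loss on this quadratic under hand-picked learning rates; it is not a benchmark of one optimizer against another.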