Machine Learning

Maximum Likelihood and Loss Functions

Every model you train minimizes a loss function, but where does that loss come from? In a quant interview, the sharp follow-up is almost always the same: “why squared error for regression?” or “why cross-entropy for classification?” The honest, impressive answer is that these losses are not arbitrary choices. Most of them fall straight out of one principle: maximum likelihood estimation (MLE). If you can derive squared error from Gaussian noise and cross-entropy from Bernoulli labels on the whiteboard, you show that you understand what your model is actually assuming about the data. This lesson builds that bridge. We keep the distribution theory itself brief, relying on the Probability course for the details, and focus on the MLE-to-loss connection that interviewers probe.
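As a quick numerical preview of the two claims above, here is a minimal sketch, not a derivation. It assumes i.i.d. Gaussian noise with a fixed, known standard deviation `sigma` for the regression case and independent Bernoulli labels for the classification case. The function names (`gaussian_nll`, `bernoulli_nll`, `binary_cross_entropy`) and the random test data are made up for illustration and do not come from the lesson.

```python
import numpy as np


def gaussian_nll(y, y_hat, sigma=1.0):
    """Negative log-likelihood of y under y_i ~ Normal(y_hat_i, sigma^2), i.i.d."""
    n = y.size
    sse = np.sum((y - y_hat) ** 2)
    # The first term does not depend on y_hat; only the SSE term does.
    return 0.5 * n * np.log(2.0 * np.pi * sigma**2) + sse / (2.0 * sigma**2)


def bernoulli_nll(y, p):
    """Negative log-likelihood of labels y in {0, 1} under y_i ~ Bernoulli(p_i)."""
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def binary_cross_entropy(y, p):
    """Summed binary cross-entropy, written case by case on the label."""
    return np.sum(np.where(y == 1, -np.log(p), -np.log(1.0 - p)))


rng = np.random.default_rng(0)

# Regression: compare two arbitrary prediction vectors against the same targets.
y = rng.normal(size=50)
pred_a = rng.normal(size=50)
pred_b = rng.normal(size=50)
sigma = 1.5  # assumed known and fixed

nll_gap = gaussian_nll(y, pred_a, sigma) - gaussian_nll(y, pred_b, sigma)
sse_gap = (np.sum((y - pred_a) ** 2) - np.sum((y - pred_b) ** 2)) / (2.0 * sigma**2)

# Classification: random labels and predicted probabilities kept away from 0 and 1.
labels = rng.integers(0, 2, size=50).astype(float)
probs = rng.uniform(0.05, 0.95, size=50)

print(np.isclose(nll_gap, sse_gap))
print(np.isclose(bernoulli_nll(labels, probs), binary_cross_entropy(labels, probs)))
```

The first check shows that, with `sigma` held fixed, the Gaussian negative log-likelihood differs between any two prediction vectors only through the sum of squared errors, so minimizing one minimizes the other. The second shows that the Bernoulli negative log-likelihood and binary cross-entropy are the same quantity.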