Cross-Validation for Financial Data (Purging and Embargoing)

This is the first lesson of the Financial Machine Learning module, and it fixes the single most common reason a quant’s backtest looks brilliant in research and dies in production: leakage through cross-validation. Standard k-fold cross-validation assumes the observations are independent. Financial labels are built over overlapping windows of time, so they are not: a training observation whose label window overlaps the test fold shares part of its outcome with the test labels, and an ordinary k-fold quietly lets the model see the answer. This lesson explains exactly why the leak happens, then builds the two fixes that the field now treats as standard: purging the training set of observations whose label windows overlap the test set, and adding an embargo, which also drops the training observations immediately after the test fold, to kill the leakage that serial correlation sneaks through. We work through which observations get dropped around a test fold by hand, count the backtest paths that combinatorial purged cross-validation produces, and contrast the whole approach with walk-forward testing. It is the foundation the rest of the module stands on: triple-barrier labels, feature importance, and the deflated Sharpe ratio all assume you validated without leaking.
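To make purging, the embargo, and the path count concrete before the lesson proper, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the lesson's own implementation: the function names `purged_train_indices` and `cpcv_paths`, the integer bar indexing, and the toy 3-bar label horizon are all hypothetical choices made for this example. Each observation `i` is assumed to have a label that starts at bar `t0[i]` and is resolved at bar `t1[i]`, and the test fold is assumed to be one contiguous block.

```python
import numpy as np
from math import comb


def purged_train_indices(t0, t1, test_start, test_end, embargo=0):
    """Training indices left after purging and embargoing one test fold.

    t0[i], t1[i]: bars where the label of observation i starts and ends
    (inclusive). The test fold is observations test_start..test_end-1.
    All names and conventions here are illustrative assumptions.
    """
    t0 = np.asarray(t0)
    t1 = np.asarray(t1)
    idx = np.arange(len(t0))
    is_test = (idx >= test_start) & (idx < test_end)

    # Time span covered by the test labels.
    test_t0 = t0[is_test].min()
    test_t1 = t1[is_test].max()

    # Purge: a label window that overlaps the test span shares information.
    overlaps = (t0 <= test_t1) & (t1 >= test_t0)

    # Embargo: also drop observations starting within `embargo` bars
    # after the test span ends.
    embargoed = (t0 > test_t1) & (t0 <= test_t1 + embargo)

    return idx[~is_test & ~overlaps & ~embargoed]


def cpcv_paths(n_groups, k_test):
    """Backtest paths from combinatorial purged CV: C(N, k) * k / N."""
    return comb(n_groups, k_test) * k_test // n_groups


# Toy data: 20 observations, each label spans the next 3 bars.
t0 = np.arange(20)
t1 = t0 + 3

# Test fold = observations 8..11, whose labels cover bars 8..14.
print(purged_train_indices(t0, t1, 8, 12))             # purge only
print(purged_train_indices(t0, t1, 8, 12, embargo=2))  # purge + 2-bar embargo
print(cpcv_paths(6, 2))  # 6 groups, 2 test groups per split
```

In this toy setup the purge removes observations 5 to 7 (their labels run into the test span) and 12 to 14 (they start inside it), and a 2-bar embargo additionally removes 15 and 16. With 6 groups and 2 test groups per split there are 15 splits, which recombine into 5 backtest paths.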