
Python for ML: NumPy and pandas


Every machine learning interview that involves a keyboard runs through NumPy and pandas. In a live coding round you will be asked to manipulate an array or a DataFrame under time pressure; in a take-home you will load a CSV, clean it, engineer features, and fit a model, and NumPy and pandas are the first two libraries you will reach for. This is the start of Module 6, the module where code becomes the primary teaching mode. Earlier modules taught the theory; here we build the muscle to implement it.

This lesson covers the two libraries that hold up almost all of Python data work: NumPy for fast numerical arrays and pandas for labeled, tabular, time-indexed data. The single idea that ties the lesson together is vectorization: expressing a computation as an operation on whole arrays rather than as a Python loop over individual elements. Get that idea right and your code is shorter, clearer, and on large arrays often one to two orders of magnitude faster, because the loop runs in compiled code instead of the Python interpreter. We finish by computing rolling features on returns, the exact kind of feature engineering a quant take-home expects.
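As a first taste of both ideas, here is a minimal sketch rather than a definitive implementation. It computes simple returns twice, once with a Python loop and once as a single array expression, then derives two rolling features with pandas. The price values, the window length of 3, and the function names `returns_loop` and `returns_vectorized` are assumptions chosen for illustration; they do not come from the lesson itself.

```python
import numpy as np
import pandas as pd

# Hypothetical price series, made up for illustration only.
prices = np.array([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])


def returns_loop(p):
    """Simple returns computed one element at a time in a Python loop."""
    out = []
    for i in range(1, len(p)):
        out.append(p[i] / p[i - 1] - 1.0)
    return np.array(out)


def returns_vectorized(p):
    """The same returns as one expression over whole array slices."""
    # p[1:] is every price but the first, p[:-1] is every price but the last,
    # so the division pairs each price with the one before it.
    return p[1:] / p[:-1] - 1.0


r_loop = returns_loop(prices)
r_vec = returns_vectorized(prices)

# pandas equivalent: pct_change keeps the original index, so the first
# entry is NaN because there is no previous price to compare against.
r = pd.Series(prices).pct_change()

# Rolling features over an assumed window of 3 returns. With the default
# min_periods, a window containing any NaN yields NaN.
rolling_mean = r.rolling(window=3).mean()
rolling_vol = r.rolling(window=3).std()  # sample standard deviation (ddof=1)
```

Both return functions produce the same numbers; the vectorized one simply hands the loop to NumPy. Note that the rolling columns start with missing values, since the first full window of three returns only becomes available at the fourth price.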