I just saw the trailer for a new movie, and I loved that it featured the LASSO very prominently. I thought, this is perfect for statisticians, so of course I made a GIF from it.
The LASSO (short for least absolute shrinkage and selection operator) is a method developed by Stanford’s Robert Tibshirani in 1996. With the same excitement as in the GIF above, the LASSO has taken statistics and machine learning by storm. It’s machine learning’s golden child, especially now that there are so many high-dimensional datasets (as in, lots of covariates), which some people call “big data”.
The LASSO helps select the most important variables for a model (very useful if you have thousands of variables), and it also guards against overfitting by penalizing large coefficients, shrinking the unimportant ones all the way to zero. Cool! But how do you do inference with a model after you've removed a set of variables? Is data-splitting enough?
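To see the variable selection in action, here's a minimal sketch using scikit-learn's `Lasso` (one common implementation; the data, coefficients, and penalty level are all made up for illustration):

```python
# Sketch: LASSO zeroes out irrelevant coefficients, doing variable
# selection and shrinkage at the same time. All numbers here are
# illustrative, not from any real dataset.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 20                        # 100 observations, 20 candidate covariates
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]           # only the first 3 covariates truly matter
y = X @ beta + rng.standard_normal(n)

model = Lasso(alpha=0.5).fit(X, y)    # alpha controls the strength of the penalty
selected = np.flatnonzero(model.coef_)
print("selected covariates:", selected)
```

With a penalty this strong, most of the 17 irrelevant coefficients come out exactly zero, which is the "selection operator" part of the name.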
My fellow PhD students, Kevin Lin and Sangwon Hyun, have worked with Ryan Tibshirani (Rob’s son) on how to do post-selection inference. Look out for Kevin Lin’s story on this topic in the October issue of Significance magazine.