Survey questions in market research are often highly correlated with one another. Under this multicollinearity, classical Ordinary Least Squares (OLS) regression produces unstable coefficient estimates with inflated variance, a problem usually diagnosed with the variance inflation factor (VIF), and loses statistical power as a result.
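As a brief reminder of the standard textbook diagnostic (it is not specific to this survey), the variance inflation factor of survey item \(j\) is

\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\]

where \(R_j^2\) is the coefficient of determination obtained by regressing item \(j\) on all the other items. As a purely illustrative figure, an item with \(R_j^2 = 0.9\) has \(\mathrm{VIF}_j = 10\): the variance of its OLS coefficient is ten times what it would be if the item were uncorrelated with the rest.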
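The LASSO algorithm adds an L1 penalty term to the regression objective. The penalty shrinks every coefficient and, when it is strong enough, sets the coefficients of weakly predictive or redundant survey items to exactly zero (\(\beta = 0\)). What remains is a compact set of the strongest "Driver" variables that explain consumer behavior. Among several highly correlated items, LASSO typically keeps one and drops the others, so the surviving item is best read as a representative of its group.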
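The mechanism can be illustrated with a minimal sketch, assuming one common form of the objective, \(\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\), solved by cyclic coordinate descent with soft-thresholding. Everything named here (`item_a` to `item_d`, `purchase`, the penalty `alpha=0.5`, and the two helper functions) is hypothetical and chosen only for illustration; it is not taken from any real survey or from a specific library.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + alpha*||b||_1 by cyclic coordinate descent.

    columns: list of predictor columns (each a list of n floats), assumed centred.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)  # equals y - X b while b is all zeros
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            # Covariance of column j with the residual, with column j's own
            # current contribution added back in.
            rho = sum(v * r for v, r in zip(col, residual)) / n + z * beta[j]
            new_beta = soft_threshold(rho, alpha) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - delta * v for r, v in zip(residual, col)]
                beta[j] = new_beta
    return beta


# Hypothetical, centred survey items for four respondents (illustrative values only).
item_a = [1.0, 1.0, -1.0, -1.0]   # the true driver
item_b = [1.0, -1.0, 1.0, -1.0]   # unrelated to the outcome
item_c = [1.0, -1.0, -1.0, 1.0]   # weak effect on the outcome
# Redundant item: a mix of item_a and item_b (correlation 0.8 with item_a).
item_d = [0.8 * a + 0.6 * b for a, b in zip(item_a, item_b)]

# Assumed outcome (e.g. purchase intent): driven mostly by item_a, slightly by item_c.
purchase = [3.0 * a + 0.2 * c for a, c in zip(item_a, item_c)]

names = ["item_a", "item_b", "item_c", "item_d"]
coefficients = lasso_coordinate_descent(
    [item_a, item_b, item_c, item_d], purchase, alpha=0.5
)
drivers = [name for name, b in zip(names, coefficients) if b != 0.0]

print(dict(zip(names, coefficients)))
print("Retained drivers:", drivers)
```

In this toy data only `item_a` keeps a non-zero coefficient (about 2.5, shrunk from its true value of 3), while the unrelated, weak, and redundant items are set to exactly zero. In practice one would more likely standardize the items and choose the penalty strength by cross-validation with an established library rather than with hand-written code like this.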
- Once the redundant items are filtered out of the 60 satisfaction questions asked in the survey, which "Purchase Triggers" remain?
- What is the most parsimonious, yet still highly predictive, variable set we need to focus on to gain market share?
- R&D and Marketing Focus: Eliminates variables that only appear important because of statistical noise, so you can focus your investment budget on the "rare and real" factors that demonstrably predict consumer behavior.