Predictive analytics is not enough

The idea of predictive analytics can seem like magic: how, really, can a computer predict the future? Yet we’ve seen a lot of success based on this advanced technology in recent years, from Netflix to Amazon, Google, and more. These companies mine massive amounts of data every day for patterns, and those patterns drive enormous revenues.
However, for a widespread class of situations, predictive analytics alone aren’t enough. Consider the decision model below, which I introduced in my last post. The blue graphs on the right-hand side are based on predictive analytics, but they are only building blocks in the full model. They are not enough on their own.
As you can see, the blue graphs are predictions about how one thing causes another. But they’re only useful when we combine them with levers, and then link them together with outcomes. The prediction is only part of the puzzle.
To understand this, let’s start with what’s going on when computers predict the future. At first blush, the idea that a computer can do this sounds fishy. Yet computers are genuinely good at finding patterns, especially when there’s a massive amount of data and the pattern holds into the future. Under those conditions, pattern recognition can, in a sense, be thought of as “future prediction”. This is the assumption underlying production predictive analytics systems today, whether they’re detecting a pattern of credit card usage that indicates fraud, a pattern of Amazon browsing and buying behavior that predicts whether you’re going to buy a particular new book, or a pattern in the search terms you type into Google that predicts how likely you are to click on an ad and buy a product.
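At its simplest, this kind of pattern recognition is just estimating the probability of a future event from its historical frequency. Here's a minimal sketch of that idea; the search terms, click data, and function name are all invented for illustration:

```python
# Minimal sketch of "pattern recognition as prediction": estimate the
# probability of a future event from its historical frequency.
# The data and names here are hypothetical.
from collections import Counter

past_searches = ["shoes", "shoes", "laptop", "shoes", "laptop", "phone"]
clicks =        [True,    True,    False,    True,    False,    False]

seen = Counter(past_searches)
clicked = Counter(term for term, c in zip(past_searches, clicks) if c)

def click_probability(term):
    """Historical click rate for this search term -- a valid prediction
    only for as long as the pattern that generated the history holds."""
    if seen[term] == 0:
        return 0.0
    return clicked[term] / seen[term]
```

Real systems use far richer models than a frequency table, but the underlying bet is the same: the future will look like the past.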
The problem with these kinds of predictions in isolation occurs when the situation changes and the old patterns no longer hold. Stock market prediction, for example, is notoriously difficult in a fickle world with macro effects that aren’t captured in the data. Nassim Taleb understands this situation best. As a derivatives trader and financial executive in the years leading up to the 2008 financial crisis, he was a front-row witness to the limitations of these approaches, and he built a successful investment strategy on the recognition that the most important events in markets are precisely those for which patterns in historical data cease to hold.
Black Swan theory, as formulated by Taleb, captures this idea with a parable: a turkey, shortly before Thanksgiving, is exposed to a considerable amount of data indicating that his world is safe and that he’ll be well fed and housed. But the turkey lacks knowledge of the surrounding system. This creates problems.
Before Thanksgiving, the turkey is lacking a systems model. The problem: because he relies only on historical data, absent a cause-and-effect structure of the world at large, the turkey is in for quite a shock at the end of the month.
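The turkey's mistake can be sketched in a few lines. All the numbers below are made up for illustration: a model fit only to historical observations extrapolates confidently right up to the regime change, because nothing in the data encodes the cause-and-effect structure (the farmer's plans) that actually governs the outcome:

```python
# Toy illustration of the turkey problem: pure pattern matching on
# history is blind to a regime change the data never hints at.
history = [1.0] * 1000          # 1,000 days of "I was fed today" observations

def naive_forecast(observations):
    """Predict tomorrow as the historical average -- pure pattern matching."""
    return sum(observations) / len(observations)

prediction_for_day_1001 = naive_forecast(history)   # 1.0: "surely fed again"
actual_day_1001 = 0.0                               # Thanksgiving
```

No amount of additional feeding data fixes this; only a model of the surrounding system (what farmers do with turkeys in November) could.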
The benefit of a decision model is context that transcends turkey thinking. Another limitation of predictive analytics in isolation: they make a simplifying assumption about the decisions surrounding them. The use cases in which predictive analytics alone succeed are those in which it is feasible to avoid complex modeling of the impact of different decisions based on the analytic result, or where the decision is so simple that such a model isn’t required. These are the situations where there is no Thanksgiving (are there turkeys in Australia, or just black swans?) and historical data really is good enough at predicting the future. The “black swan” mistake is to assume this is true more often than it really is.
Take fraud, for example. If we detect a pattern of usage that might indicate criminal activity—whether using a credit card, a telephone system, or on the internet—then what’s next? Fraud experts tell me that, in general, there are many choices, such as handing off to a fraud team for further investigation, automatically freezing the card, and more. Each of these choices has different costs and potential benefits. Yet many systems today ignore these distinctions and apply a one-size-fits-all approach, and so our ability to fight fraud is falling behind.
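One simple way to pair a fraud prediction with those choices is to pick the action with the lowest expected cost, given the predicted probability of fraud. The sketch below illustrates the idea; the action names, probabilities, and dollar figures are all hypothetical:

```python
# Sketch: a fraud *prediction* (p_fraud) joined to a *decision model*.
# Actions, costs, and numbers are hypothetical, for illustration only.

# Cost of each action if the transaction turns out to be fraud vs. legitimate.
COSTS = {
    "ignore":      {"fraud": 500.0, "legit": 0.0},
    "investigate": {"fraud": 50.0,  "legit": 20.0},   # analyst time
    "freeze_card": {"fraud": 5.0,   "legit": 80.0},   # annoyed customer
}

def expected_cost(p_fraud, action):
    c = COSTS[action]
    return p_fraud * c["fraud"] + (1 - p_fraud) * c["legit"]

def best_action(p_fraud):
    """Choose the cheapest action in expectation, not a one-size-fits-all rule."""
    return min(COSTS, key=lambda a: expected_cost(p_fraud, a))
```

With these (invented) numbers, a low-probability alert is ignored, a mid-range one goes to the fraud team, and a near-certain one triggers an automatic freeze—the prediction is the same building block throughout, but the decision changes with the costs.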
We must be as sophisticated in our decision models as we are with our predictions. Customer experience management (CEM) is another example. Many companies use “churn analytics” or “net promoter score” analytics to predict the likelihood that a customer will leave for a competitor. Yet what then? I’m sure that a crisp $1000 bill to each potential churner would reduce their likelihood to leave, but of course that’s not cost-effective.
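The $1000-bill joke has a serious point: a churn prediction only becomes useful once it's joined to a cost/benefit model of the possible retention offers. Here's a minimal sketch under invented assumptions (the offers, customer value, and churn-reduction effects are all hypothetical):

```python
# Sketch: choosing a retention offer by net expected value.
# All names and numbers here are hypothetical, for illustration only.

CUSTOMER_VALUE = 600.0   # assumed lifetime value if the customer stays

# offer: (cost of offer, assumed reduction in churn probability)
OFFERS = {
    "do_nothing":  (0.0,    0.00),
    "discount_10": (30.0,   0.15),
    "cash_1000":   (1000.0, 0.60),   # works, but can never pay for itself here
}

def net_value(p_churn, offer):
    cost, reduction = OFFERS[offer]
    saved = min(reduction, p_churn)   # churn probability can't drop below zero
    return saved * CUSTOMER_VALUE - cost

def best_offer(p_churn):
    """Pick the offer with the highest net expected value -- including doing nothing."""
    return max(OFFERS, key=lambda o: net_value(p_churn, o))
```

With these numbers, the $1000 option is never chosen, a modest discount wins for likely churners, and low-risk customers get nothing—the cost/benefit model, not the prediction alone, drives the action.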
Cost and benefit modeling is essential to predictive analytics, yet it is often overlooked. So this is an exciting time. We’ve become awesome at constructing the machine learning and statistical building blocks of a decision model. These are like great chocolate, and now it’s time to bake the cake: we must integrate predictive analytics within a framework of complex systems modeling, building decision intelligence support systems that deliver the full benefit of this groundbreaking technology.