Big Decisions, Small Data: Eight Key Principles
Conquering the most important problems faced by complex organizations often requires great models, not “big data”: in many situations, better decisions can be made with imperfect or incomplete data.
As storage and processing costs plummet, high-bandwidth networks become cheaper, and data analysis methods become mainstream, organizations have mined a rich store of information (“big data”), primarily about the behavior of their customers and the details of their operations. Outside of these arenas, however, big data often looks like an answer looking for a question.
Paradoxically, big data is most effective for the smallest decisions an organization makes: those it makes very often, and which must therefore be made at almost zero cost. Examples: “What discount plan should I offer this customer?” or “What other product should I recommend to this customer who’s just purchased this book?”
By contrast, big decisions have different characteristics that are often not as well-supported by big data. Here, agility and human insight are often more important than automation. Smart organizations know that smaller, well-crafted data sets, along with powerful models, are the basis for the best decision outcomes.
Avoiding the dangers of big data over-analysis
Organizations must recognize the difference between situations that require massive amounts of highly detailed information (requiring time-consuming migration and cleansing efforts), and those situations where agility is at a premium, and where imperfect or incomplete data sets are a better choice. Without such a distinction, we can fall prey to “gratuitous data cleansing”: an expensive, risky, and time-consuming exercise.
Omitted from the brochures of “big data” management products is a dirty secret: the enormous cost, often driven by highly manual effort, of gathering, cleansing, and unifying data before any of the benefits of big data can be realized.
“The decision is only as good as the data that supports it” is an often-misunderstood claim. For example, a pollster can reliably predict an election result based on interviews with a tiny percentage of the electorate. And gathering data on eye color is unlikely to improve the accuracy of a credit score. There is a law of diminishing returns on the quantity of information, and not every piece of information is critical to every decision. Without a good model, we can succumb to a costly fixation on data: focusing on excessive or unimportant information at the cost of agile and effective decision making. The good news: understanding how to make great decisions with small data requires just a few simple principles, as I’ll explain below.
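The pollster example can be made concrete with the standard margin-of-error formula. The sketch below is illustrative only: it assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case 50/50 split, and the function name is hypothetical.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    Uses the normal approximation; z=1.96 corresponds to 95% confidence,
    and proportion=0.5 is the worst case (widest margin).
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# The margin depends on the sample size, not on the size of the electorate.
for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7} respondents: +/- {margin_of_error(n):.1%}")
```

Under these assumptions, 1,000 respondents give a margin of roughly three percentage points, and multiplying the sample by 100 only shrinks it to about a tenth of that: diminishing returns in action.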
How did we get here?
To understand the road forward beyond big data, it helps to understand its history. From about 1984 to 2010, the cost of storage dropped from close to $200 per megabyte to about a penny for 122 megabytes: a decrease by a factor of more than 2.4 million. Since data is so cheap to store and transmit, governments, the internet, and even machines are generating huge data sets. The Boeing 787, for instance, generates half a terabyte of data per flight, and the US Federal Reserve alone produces 73,000 economic statistics. This explosion of information has, in turn, driven new analytics technology to make sense of it.
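The storage figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the two price points from the text; the variable names are made up for illustration.

```python
# Price points taken from the text above.
cost_per_mb_1984 = 200.00       # dollars per megabyte, circa 1984
cost_per_mb_2010 = 0.01 / 122   # a penny buys 122 megabytes, circa 2010

# How many times cheaper a megabyte became.
fold_decrease = cost_per_mb_1984 / cost_per_mb_2010

# The same change expressed as a percentage reduction in price.
percent_decrease = (1 - cost_per_mb_2010 / cost_per_mb_1984) * 100

print(f"Price fell by a factor of {fold_decrease:,.0f}")
print(f"That is a {percent_decrease:.5f}% reduction")
```

The ratio works out to about 2.44 million. A price cannot fall by more than 100 percent, so the change is best described as a fold decrease rather than a percentage.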
The business value of big data is like the song of the mythical sirens, whose sweet voices lured sailors to their doom. It has created a focus on the parts of a business in which data is plentiful. However, the places in an organization where data is easy to come by do not necessarily correspond to its most important business problems. This is why strategic decision makers often follow a pattern where they “pore over the data, then set it aside to argue”, instead of reaching a decision through a more structured or systematic process. On the flip side, expensive data management projects often produce little business value.
Small data principles
The answer, simply put, is that smart organizations have a secret: they know how to use small data to drive their most important decisions. As a result, they produce great outcomes quickly and cheaply. Here’s how.
- Recognize the difference between operational and analytical data, and treat them differently. For our purposes, every byte of operational data matters, but analytical data can be imperfect or incomplete. The size of a bolt on an airplane, or the costs on your phone bill: those are operational. The likelihood of civil war in a country given its poverty level: that’s analytical, and it calls for a different (and often much less expensive and time-consuming) approach to data management than one in which every byte must be scrutinized.
- Understand that great decisions can be made from uncertain data. If you’re sure that your customers will prefer your product as long as your price is at least $5 less than your average competitor’s, then you don’t need to know whether that competitor charges $6 more or $600 more: huge uncertainty in the data nonetheless leads to a highly confident decision. Many situations are like this.
- Understand the sensitivity of your decision to key assumptions. Often, decisions rest on a number of “key assumptions”: data elements that are uncertain and to which the decision is also highly sensitive. If you’re launching a new product into Europe, your competitor’s launch there would substantially change your prospects, and you’re not sure whether they will launch: that’s a key assumption. Focus all of your data gathering and cleansing effort on these key assumptions (often a “small” set of data compared to big datasets) and deemphasize the rest.
- Don’t overlook the value of human expertise. Somehow, big data hype has led us to ignore our most important asset: the one between our ears. We swing between two extremes: projects either ignore human expertise or rely on it exclusively, ignoring the data altogether. But a structured decision modeling approach can produce the best of both worlds, where human expertise substitutes for data when it is missing, and vice versa.
- Promote your modelers. A good modeler is an employee who understands your business, understands your data (along with its limitations), and has a good head for seeing the pattern in the noise. They’re worth their weight in gold, and can help you to overcome the shortage of data scientists. Nate Silver, for instance, probably the world’s first “celebrity modeler”, uses small data sets, not big ones, to make his predictions.
- Use visual analysis. Many modelers come from a quantitative background, and so are most comfortable with math and spreadsheets. Yet non-modelers often have to work very hard to understand them. Encourage a cross-disciplinary team to create visual models that make the invisible visible, and that can be shared in an aligned, collaborative way.
- Understand systems. Core to the modeler’s toolkit is an understanding of system dynamics: feedback loops, winner-take-all dynamics, and the like. The distinction between your organization being in a downward spiral and enjoying the “invisible engine” of a positive network effect matters much more than specific data about your situation.
- Be agile. Near-term data is often the most accurate, and least costly to obtain. When long-term information is suspect, or in short supply, you can substitute for it by moving from “Titanic” to “white water rapids” mode: steer your business frequently as new information becomes available.
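Two of the principles above, deciding confidently from uncertain data and finding the key assumptions, lend themselves to a short sketch. Everything here is hypothetical: the function names, the toy profit model, and all of the numbers are invented for illustration, and one-at-a-time sensitivity testing is just one simple way to do this.

```python
def customers_prefer_us(price_gap, required_gap=5.0):
    """Pricing rule from the example: we win if we undercut by at least $5."""
    return price_gap >= required_gap

# The competitor's premium is hugely uncertain: anywhere from $6 to $600.
gap_scenarios = [6, 20, 100, 600]
decisions = {customers_prefer_us(gap) for gap in gap_scenarios}
decision_is_robust = len(decisions) == 1  # same answer in every scenario


def launch_profit(market_size, our_share, margin_per_unit, fixed_cost):
    """Toy profit model for a product launch (made-up structure)."""
    return market_size * our_share * margin_per_unit - fixed_cost

# (low, base, high) for each input; all values are invented.
input_ranges = {
    "market_size":     (90_000, 100_000, 110_000),
    "our_share":       (0.05, 0.20, 0.25),  # low case: competitor also launches
    "margin_per_unit": (9.0, 10.0, 11.0),
    "fixed_cost":      (140_000, 150_000, 160_000),
}

def one_at_a_time_sensitivity(model, ranges):
    """Vary one input across its range while holding the others at base."""
    base = {name: values[1] for name, values in ranges.items()}
    results = {}
    for name, (low, _, high) in ranges.items():
        outcomes = [model(**{**base, name: value}) for value in (low, high)]
        results[name] = (min(outcomes), max(outcomes))
    return results

sensitivity = one_at_a_time_sensitivity(launch_profit, input_ranges)

# A key assumption is an uncertain input whose range flips the decision
# (here: profit changes sign somewhere between its low and high values).
key_assumptions = [name for name, (worst, best) in sensitivity.items()
                   if worst < 0 < best]

print("Pricing decision robust to uncertainty:", decision_is_robust)
for name, (worst, best) in sensitivity.items():
    print(f"{name:>16}: profit from {worst:>10,.0f} to {best:>10,.0f}")
print("Key assumptions:", key_assumptions)
```

In this toy setup the pricing decision comes out the same across the whole $6 to $600 range, while only the market-share input (the stand-in for the competitor’s launch) can turn the launch from profitable to unprofitable. That is where the data gathering effort would go.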
Underneath the big data hype is an important assumption: that the future is like the past. Modelers know better: they understand the limitations of historical data, the mistakes we make when we don’t understand those limitations, and how to combine historical data with human expertise to build models that drive business value in a rapidly changing future.
By building an accurate model of a decision, and “baking in” expert knowledge, we can build a framework that guides us towards the data that has the highest value for decision making, and avoid wasting time and effort on data gathering, storage, migration, and cleansing projects that don’t matter. By doing this, we can transcend the often high cost of “big data” and move towards an organization-wide discipline of better decision making.
This post is an encore presentation of an article that previously appeared in the World Modeler Blog.