A Framework for How Data Informs Decisions
The Decision Intelligence / Data Science Integration Framework
As data storage and management become less expensive, many organizations are tasked with being “data-driven”. What does this mean in practice? For many, it means using data to inform important organizational decisions.
This is the role of Decision Intelligence (DI): a practice that bridges decisions to data. However, it’s not always clear how data connects to decisions.
Decision Intelligence treats decision making as a thought and/or simulation process that is meant to match, as closely as possible, the chain of cause-and-effect links from actions to outcomes. For instance, the rationale (or Why) for spending time learning about decision intelligence (a Choice) may include a number of desired Outcomes, such as greater success for your organization in achieving its goals, or a higher salary for you.
Each such outcome is achieved through a chain of reasoning (How). For instance, working backwards from outcomes to choices, your greater income may derive from a raise, which is achieved in part by demonstrating innovative knowledge, which comes from giving a talk about decision intelligence within your organization, which is enabled by time spent learning about DI. Chains of events like this can be drawn using boxes called Choices, Intermediates, and Outcomes; along with arrows, called Dependencies. The result is a Causal Decision Diagram (CDD), as I describe in detail in my book, Link.
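To make the structure concrete, here is a minimal sketch of the learning-about-DI chain above as a small graph. The class name CDD, its methods, and the node labels are illustrative assumptions, not notation from the book, and the backward walk assumes each node has exactly one cause.

```python
from dataclasses import dataclass, field


@dataclass
class CDD:
    """A minimal Causal Decision Diagram: typed boxes plus dependency arrows."""
    kinds: dict = field(default_factory=dict)  # box name -> "choice" | "intermediate" | "outcome"
    deps: list = field(default_factory=list)   # (cause, effect) pairs, one per arrow

    def add(self, name, kind):
        self.kinds[name] = kind

    def link(self, cause, effect):
        self.deps.append((cause, effect))

    def chain_back(self, node):
        """Walk backwards from a box to a choice (assumes one cause per box)."""
        chain = [node]
        while self.kinds[chain[-1]] != "choice":
            causes = [c for c, e in self.deps if e == chain[-1]]
            chain.append(causes[0])
        return chain


cdd = CDD()
cdd.add("time spent learning DI", "choice")
cdd.add("talk given about DI", "intermediate")
cdd.add("innovative knowledge demonstrated", "intermediate")
cdd.add("raise", "intermediate")
cdd.add("greater income", "outcome")

cdd.link("time spent learning DI", "talk given about DI")
cdd.link("talk given about DI", "innovative knowledge demonstrated")
cdd.link("innovative knowledge demonstrated", "raise")
cdd.link("raise", "greater income")

# Working backwards from the outcome to the choice, as in the text.
print(" <- ".join(cdd.chain_back("greater income")))
```

A real diagram usually has several causes per box, so a fuller version would return a tree or a set of paths rather than a single chain.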
The picture below shows another, slightly more complex, example. Here, you’re making a choice as to whether to spend more money on coffee at the grocery store because it’s fair trade and bird-friendly. On the left-hand side of the picture, you’re thinking about how this choice might impact the environment and workers on coffee plantations. On the right-hand side, your decision becomes an action you take (reaching for the more expensive coffee on the shelf), followed by the knock-on effects of that action, which ultimately lead to outcomes (reduced inequality between workers and coffee companies, a reduced negative ecological impact). This, too, is a Causal Decision Diagram.
Data can inform this decision-making process in many ways. The image below shows 15 roles that data plays in making better decisions: eight in advance of taking an action as you simulate and/or think about the decision, and seven places where you can use data as the decision plays out in reality. It shows two CDD templates, one for each of these time frames, for which I use the notation “T()” and “R()”, respectively. The goal of the “T” process is to capture enough of the elements of how a decision will impact reality such that the choices made lead to good actions that in turn lead to good outcomes.
“T” items in the DI/DS framework are as follows:
T(i): A scalar Choice, which in reality (R) will be an Action, and which can be represented by data. For instance, the number of minutes you choose to spend reading this post can be represented by a number. Or a binary value (true/false) might represent whether you choose to read this post at all. Importantly, this is data over which the decision maker has Authority (in contrast to External data, described below). Note that a choice is one option within a broader decision Lever, which constitutes a list of mutually exclusive choices.
T(ii): A more complex choice, perhaps representing a plan. For instance, you might plan to spend 10 minutes studying decision intelligence today, 20 tomorrow, and a half hour on Wednesday. Some decisions are more naturally represented in this way than as scalar choices, as in T(i) above.
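As a rough sketch of how T(i) and T(ii) might look as data: a lever holds the mutually exclusive options, a scalar choice is one of them, and a plan is a small structure of such values. The Lever class and the five-minute granularity are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lever:
    """A decision lever: a named list of mutually exclusive options."""
    name: str
    options: tuple

    def choose(self, option):
        # A choice is valid only if it is one of the lever's options.
        if option not in self.options:
            raise ValueError(f"{option!r} is not an option of lever {self.name!r}")
        return option


# T(i): scalar choices, one binary and one numeric.
read_post = Lever("read this post", (True, False))
minutes = Lever("minutes spent reading", tuple(range(0, 61, 5)))  # assumed 5-minute steps

binary_choice = read_post.choose(True)
scalar_choice = minutes.choose(10)

# T(ii): a plan, here the study schedule from the text, in minutes per day.
plan = {"today": 10, "tomorrow": 20, "Wednesday": 30}
total_minutes = sum(plan.values())
print(binary_choice, scalar_choice, total_minutes)
```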
To the extent that an External is uncertain, it is often called an Assumption.
External data may be more or less up-to-date. One important role for data science is that it provides real-time sensors and data management techniques that allow a decision maker to use the most up-to-date externals available.
T(iv): A more complex External. For example, this might be a prediction of market salaries for your job position over the next three years. This prediction might be created by an AI or ML model, a statistical model, or a human forecaster. More generally, there are additional technologies and disciplines which can inform predictive externals, such as meteorology to predict weather, computational fluid dynamics (CFD) to predict movement of Covid-19 particles, or economic forecasts to predict GDP.
The use of time-based predictions like this is where forecasting informs decision making. Note that combining multiple time-based predictions into a decision model provides a different sort of “data from the future”, one that until now has largely been conflated with forecasting. A forecast of how actions within your sphere of authority impact outcomes for which you are responsible is different from forecasts about factors over which you have no control, although, as you can see here, it benefits from them.
T(v): A Dependency link that represents how one factor influences another. For instance, you may have a suspicion as to the degree to which your demonstrated expertise will improve your chances for promotion. So your demonstrated expertise level is on the left-hand-side of this link and your chance for promotion is on the right-hand side. The dependency link is simply a function that converts one to the other.
Dependency links are one of the most important places for data to inform decisions, because data can be used to build the statistical models, machine learning models, simulations, and other functions that populate them. This is also where AI best informs strategic decision making: a machine learning model built from historical data can be reused as one piecewise component of a decision model for a novel environment. This makes it possible, for the first time, to apply this important technology to “Black Swan” events (brand-new situations for which we don’t have any data).
Importantly, most decision models include a variety of different types of dependency links: some might be simple math equations (net revenue = gross revenue – cost), some might be best represented as logical inference (A and B imply C).
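The following sketch shows the three kinds of dependency link mentioned so far, each written as a plain function. The two simple links come from the text; the promotion_chance curve and its coefficients are invented for illustration and stand in for whatever statistical or machine learning model would really be fitted.

```python
def net_revenue(gross_revenue, cost):
    """A simple math link: net revenue = gross revenue - cost."""
    return gross_revenue - cost


def infer_c(a, b):
    """A logical link: C can be concluded only when both A and B hold."""
    return a and b


def promotion_chance(expertise):
    """A modeled link from demonstrated expertise (0..1) to chance of promotion.

    Hypothetical coefficients; in practice this function might wrap a
    fitted statistical or machine learning model.
    """
    return min(1.0, max(0.0, 0.1 + 0.8 * expertise))


print(net_revenue(1000, 400))
print(infer_c(True, False))
print(promotion_chance(0.5))
```

Because every link has the same shape (inputs on the left-hand side, an output on the right), links of different types can be chained within one decision model.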
Note that one of the dependency links in this diagram loops back from an outcome to an intermediate. Most situations include these kinds of feedback effects, and their dynamics are strong drivers of how decisions lead to outcomes. In many systems, the nonlinear behavior that feedback produces can swamp the effect of small differences in the data, so capturing the loop is generally more important than obtaining highly precise data.
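A toy simulation can illustrate why a feedback loop may matter more than data precision. All numbers here are hypothetical: a level of 100 units, a 1% measurement error, and a 10% feedback gain per step.

```python
def simulate(initial, feedback_gain, steps=10):
    """Repeatedly feed a fraction of the current level back into itself."""
    level = initial
    for _ in range(steps):
        level += feedback_gain * level  # the outcome loops back to an earlier element
    return level


no_loop_precise = simulate(100.0, feedback_gain=0.0)
no_loop_imprecise = simulate(101.0, feedback_gain=0.0)  # 1% error in the input data
with_loop = simulate(100.0, feedback_gain=0.1)          # same data, loop included

effect_of_data_error = no_loop_imprecise - no_loop_precise
effect_of_feedback = with_loop - no_loop_precise
print(effect_of_data_error, round(effect_of_feedback, 2))
```

Under these assumptions, leaving out the loop changes the result far more than the measurement error does.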
T(vi): An Intermediate element is something that can be measured, such as, in the above example, the level of perception of your expertise within your organization. In many organizations, the data representing intermediates is captured as a Key Performance Indicator (KPI), which is interpreted as a leading indicator or proxy (substitute) for the actual desired outcome.
T(vii): An Outcome: something that can be measured at the end of the dependency chain. Note that a Goal is not shown in this diagram: formally it is a condition on an outcome that produces a true/false result. For instance, “My average salary 2022-2024” is an outcome, and “My average salary 2022-2024 > US$75,000” is a goal.
T(viii): When a decision model is computerized, the computer can experiment with many different sets of choices to determine which one, in simulation, leads to the best outcomes.
T(ix): A computerized decision model creates a platform that allows multiple Scenarios (vectors of environment, aka context, variables) to be considered. Numerical simulation across both multiple choices and multiple scenarios provides enhanced “data from the future”: with T(ix) scenario enumeration, we can analyze multiple futures along with multiple choices.
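The sketch below shows one way a computerized model could enumerate choices (T(viii)) against scenarios (T(ix)) and test a goal on each simulated outcome. The model function, its coefficients, the base salary, and the candidate values are all hypothetical; only the US$75,000 goal comes from the example above.

```python
from itertools import product


def average_salary(study_hours, market_growth, base=70_000):
    """Hypothetical decision model: one choice plus one external -> one outcome."""
    raise_fraction = min(0.10, 0.002 * study_hours)  # assumed dependency link
    return base * (1 + raise_fraction) * (1 + market_growth)


choices = [0, 20, 50]      # candidate hours of study (the lever)
scenarios = [0.00, 0.03]   # assumed market salary growth in each scenario

# Simulate every choice in every scenario.
results = {(c, s): average_salary(c, s) for c, s in product(choices, scenarios)}

# A goal is a true/false condition on an outcome.
goal_met = {key: salary > 75_000 for key, salary in results.items()}

# One possible selection rule: pick the choice with the best worst-case outcome.
best_choice = max(choices, key=lambda c: min(results[(c, s)] for s in scenarios))
print(best_choice, goal_met)
```

Choosing by worst case is only one option; averaging over scenarios, or weighting them by likelihood, would be equally reasonable design choices.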
“R” items in the image above represent how a decision impacts a chain of events in reality, as opposed to simulation or ideation. But, of course, the map is not the territory. The role of data shifts a bit here, as follows:
R(i): Corresponding to each T(i) Choice is an R(i) Action. For example, you are reading this post right now; that behavior is different than the idea of that action when you were considering doing so. An action, as it plays out in time, can be measured and captured as data.
R(ii): As with T(ii), you might make a choice to take a series of actions, which in reality can take place over a period of time. That plan can be represented as data as well.
R(iii): Represents an external measurement that the decision maker may or may not choose to measure after the decision is made. For example, after you’ve chosen to read this post, and after you’ve started reading, you might choose to set up a mechanism to measure average salaries for people in your position. You might further choose to track any discrepancy between assumptions about this external factor that you made during decision making and its actual value, especially if you think it’s changeable, uncertain, and/or particularly impactful on your decision outcome. You might discover, for example, that the job title to which you aspire does not typically come with a substantial raise, after all.
Also, as with T(iii), data science provides more effective and efficient mechanisms to keep data up to date, shortening the time frame by which it is possible to detect changing circumstances (as represented by Externals) that justify the need for a new decision.
R(iv): Represents a more complex External, such as a time-based prediction of the weather. Forecasts may or may not be correct, and so, again, it can sometimes be a good idea to track assumptions about the future made during decision making against how they play out in reality.
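Tracking an assumption against reality, as described for R(iii) and R(iv), can be as simple as the following sketch. The function name, the 5% tolerance, and the salary figures are illustrative assumptions.

```python
def assumption_drift(assumed, observed, tolerance=0.05):
    """Return the relative discrepancy and whether it exceeds the tolerance."""
    relative = (observed - assumed) / assumed
    return relative, abs(relative) > tolerance


# Hypothetical external: the average salary assumed during decision making
# compared with the value measured after the decision was made.
drift, needs_review = assumption_drift(assumed=80_000, observed=72_000)
print(f"{drift:+.1%}", needs_review)
```

A flagged discrepancy does not say what to do next; it only signals that the decision may need to be revisited.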
R(v): Represents a cause-and-effect dependency influence as it plays out in reality. Usually these are monitored through measuring their impacts, as in R(vi), below.
R(vi): Represents an Intermediate, which, as above, is also understood as a KPI. In contrast to T(vi), here we’re not planning the measurement of KPIs but actually performing that measurement as our actions play out. This can provide an early warning that a decision is drifting away from its intended outcome(s).
Systematic measurement of intermediates and outcomes for decisions that are made multiple times can produce data that can be used to improve data science models that populate dependency links.
R(vii): Measuring Outcomes of actions taken complements measurement of leading indicators. To the extent that a decision is repeated, these outcomes can form training data for machine learning models.
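As a minimal illustration of how repeated decisions can feed a dependency link, the sketch below fits a straight line from measured intermediates to measured outcomes using ordinary least squares. The four data points are invented, and a straight line is an assumption; any statistical or machine learning model could take its place.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: one (intermediate, outcome) pair per repeated decision.
measured_intermediates = [1.0, 2.0, 3.0, 4.0]
measured_outcomes = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(measured_intermediates, measured_outcomes)


def dependency_link(intermediate):
    """The refreshed link, usable in the next round of decision modeling."""
    return slope * intermediate + intercept


print(slope, intercept, dependency_link(5.0))
```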
This Data Science / Decision Intelligence integration framework can be used to plan and architect how data fits into decision making. The CDD on which it is based reflects the “natural” mental model for human decision makers, which has been well studied for over a century. For this reason, CDDs uniquely offload human cognitive processing, freeing us up to think more carefully about complex decisions. This framework fills an important gap that has existed until now between the most widespread human mental models and data. From this point of view, it makes a strong claim about the mechanism for data/human collaboration, aka Intelligence Amplification, and I believe it has the potential to vastly improve our ability to work hand-in-hand with data to solve problems in a complex, multi-link, and rapidly changing world.