Mark Zangari speaks on Agency and Machine Learning: from data to actions
Once upon a time there were programmers, but not software engineers. As businesses and other organizations learned the value of this new technology, software engineering emerged as a discipline to derive maximum business value from it.
A similar need is emerging in data science today, where machine learning remains underutilized compared to its potential for solving business problems. So the question is: how do we bridge from the business to data and machine learning to drive maximum value?
Quantellia co-founder Mark Zangari presented a talk on this topic, called “Agency”, in Seattle on May 1 at MLConf 2015.
You can watch the video below, and here are the slides. (Note that the video freezes a few minutes before the end, so to understand what Mark’s talking about, you’ll want to look at the slide deck.) I’ve included some highlights below.
- At Quantellia we’ve been building decision support systems with machine learning and other techniques in a variety of industries, and working to give our business users the best value we can.
- Our focus: given that organizations have made this investment in data, how can we help them to get the most value out of it?
- Without this approach, we’re seeing a lot of activities around data that don’t drive value, as well as opportunities to use the data to drive business value that aren’t being met.
- When we look at data through the lens of agency, we realize that data alone doesn’t really do anything. If I know the number of days of sunshine in a year, how can that help me? This information is valuable only if there are levers that I can use: actions that I can take that, when combined with this information, lead to something useful.
- Instead of “given input data A, what output data B does my model predict”, Agency answers the question “If I do A to complex dynamical system S, with the intention of achieving objective B, what is the probability that S’s outputs end up closer to B?”
- Traditional machine learning rests on four pillars:
  - Enough training data must be available
  - The system must be stable over the time frame of the data gathering and the prediction
  - Input must be available at the time that you need it
  - There must be a sufficient signal in the data
- But what happens if these conditions don’t hold? Do we pack up and go home? No.
- Let’s illustrate this using a specific case. It’s the difference between commercial and consumer lending in most big financial institutions.
- Consumer credit is a classic “four pillars of machine learning” situation: it meets all of the above criteria. But even this wasn’t obvious at first. Fair and Isaac spent months in 1958 proposing that data would be useful for credit rating. This was a radical idea back then, and only one bank took an interest.
- In contrast, commercial credit is a whole different kettle of fish, particularly for small and medium businesses. It’s very difficult for finance companies to determine the creditworthiness of these organizations. To address this using agency, we agree on a business objective, then decompose the problem into smaller causal links, so that machine learning can solve each one where possible. An example for the commercial credit rating situation is shown below. At Quantellia, we are formalizing systems decomposition as a standard approach for problems that are too complex for machine learning alone.
- This allows us to handle situations where we have sparse data, complex dynamical systems with feedback loops and nonlinear effects, intangibles, and relationships that aren’t easily extracted from training data. Then you can build a model and augment the training data.
- Formally, the agency of a lever is analogous to the information in a message in information theory. There are a lot of things that people spend a lot of time on that don’t actually affect the objectives. Understanding how an action that you can take is coupled to the objectives that it achieves is very important.
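To make the agency question above concrete, here is a minimal Python sketch that estimates, by simulation, the probability that pulling a lever moves a system's output closer to an objective than leaving it alone. Everything in it is a hypothetical illustration and not Quantellia's method: the toy system in `simulate_outcome`, its numeric parameters, and the name `estimate_agency` are all assumptions made for this example.

```python
import random


def simulate_outcome(action_strength, rng):
    """Hypothetical toy system: a noisy baseline plus a noisy response to a lever."""
    baseline = rng.gauss(100.0, 10.0)                 # outcome with no action taken
    effect = action_strength * rng.gauss(5.0, 2.0)    # uncertain effect of the lever
    return baseline + effect


def estimate_agency(action_strength, objective, trials=10_000, seed=0):
    """Estimate P(outcome with the action is closer to the objective than without it)."""
    rng = random.Random(seed)
    closer = 0
    for _ in range(trials):
        without_action = simulate_outcome(0.0, rng)
        with_action = simulate_outcome(action_strength, rng)
        if abs(with_action - objective) < abs(without_action - objective):
            closer += 1
    return closer / trials


if __name__ == "__main__":
    objective = 120.0
    for strength in (0.0, 1.0, 4.0):
        p = estimate_agency(strength, objective)
        print(f"lever strength {strength}: estimated P(closer to objective) = {p:.2f}")
```

In this toy setup, a lever with strength 0 is not coupled to the outcome at all, so the estimate hovers around one half: pulling it is no better than chance. A lever that actually shifts the outcome toward the objective scores well above that. This is only one rough way to express how strongly an action is coupled to an objective; the talk's formal definition may differ.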
The bottom line: Agency provides a formal mechanism to focus activities so that, given the data available, you can achieve the maximum impact on business objectives.