Machine Learning Software Engineering: Top Five Best Practices

Back in the day when coding was new, as soon as a software team was hired for a project, management would command: "code". No design, no planning, no requirements, no oversight, no QA.
After more failed projects than anyone could count, the software industry learned a hard lesson from other engineering disciplines, and software engineering was born.
Not so (yet) in standard machine learning (ML) practice. By and large, the management team says "do AI" and leaves the technical team on its own. Because PhDs! Because complexity! Because management believes, incorrectly, that it can't drive the car without understanding the engine. Call it the Machine Learning Academic Fallacy, or the Inside-of-the-Box Illusion. However you put it, the good news is that machine learning projects can and must be managed, and that doing so is a discipline in itself, no less important to ML than software engineering is to coding. It doesn't require a PhD to manage machine learning experts. Don't be bedazzled.
Machine learning project success depends on ML-specific software engineering best practices
So here are the top five machine learning software engineering mistakes, and their corresponding best practices, that we've observed over decades of ML deployments:
- Checking out. The ML team talks sophisticated math in a jargon-filled language that might as well be Martian, and you just leave this complicated project in their hands without holding them accountable. The best practice here is to focus on the outside of the box: just as you can drive a car without understanding carburetor chemistry, you can manage a machine learning project without following the math of gradient descent. But you need to know how to ask the right "outside of the box" questions, which I cover below.
- Neglecting to ask, at project start, for the objective function that will be measured. How do you know whether your project is a success? Are you maximizing true positives and willing to accept some false positives along the way (a good trade-off for a security or medical application, where missing a warning signal can be hugely costly)? Are you looking to minimize the error between the predicted stock price and the actual one? Will you be a success if you identify the top 10% of devices most likely to fail, even if they are ordered incorrectly within that top 10%? (A small sketch of how these framings change what you measure appears after this list.)
- Ignoring the business decision context of the machine learning project. A prediction that doesn’t lead to an action is, essentially, useless. And most actions are based on a rationale: the decision. Understanding this larger context in a structured way is the entire field of Decision Intelligence (DI), but you can get a long way without any formality: just ensure you understand how your project fits into larger business goals. And re-align that understanding periodically.
- Over-engineering: Chances are, your team has been trained in a machine learning "academic mindset", where the criterion for success is the originality of the algorithm, or in the "Kaggle mindset", where success means beating the highest-accuracy systems in production, perhaps in a Kaggle competition. Both of these postures can be dangerous in an applied environment, because they can lead to using untested algorithms, over-cleansing data, or using more rows and/or columns in the data set than are necessary to drive benefit (and taking valuable months or years to do so). The best practice is to enforce the simplest baseline possible at project start, perhaps a linear model built on the existing dirty data (see the baseline sketch after this list). More often than not (and surprisingly to the academic- or Kaggle-minded expert), such a baseline can drive immediate business value. Remember: the bar for a successful project is much lower here.
- Data over-cleansing: Often a dirty dataset will produce a machine learning model with tremendous business value. Why? We are looking for patterns, so the signal only needs to be bigger than the noise: a much lower bar than, say, for a banking system that must record every transaction exactly (see the noise sketch after this list). See this post for how to overcome this issue.
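To make the objective-function question concrete, here is a minimal sketch, assuming scikit-learn and entirely synthetic data, of how the three framings above translate into different things you would actually measure:

```python
# A minimal sketch of how the "objective function" question changes what you measure.
# All data, scores, and thresholds below are synthetic and purely illustrative.
import numpy as np
from sklearn.metrics import recall_score, precision_score, mean_absolute_error

rng = np.random.default_rng(0)

# Classification framing: catch as many failures as possible, accept false alarms.
y_true = rng.integers(0, 2, size=1000)                          # 1 = device actually failed
y_score = np.clip(0.6 * y_true + 0.5 * rng.random(1000), 0, 1)  # fake model scores
y_pred = (y_score >= 0.3).astype(int)                           # low threshold -> more alerts
print("recall:   ", recall_score(y_true, y_pred))               # failures we caught
print("precision:", precision_score(y_true, y_pred))            # alerts that were real

# Regression framing: minimize error between predicted and actual price.
price_true = rng.normal(100.0, 10.0, size=1000)
price_pred = price_true + rng.normal(0.0, 3.0, size=1000)       # fake predictions
print("MAE:      ", mean_absolute_error(price_true, price_pred))

# Ranking framing: did the predicted top 10% overlap the true top 10%,
# regardless of ordering within that group?
risk_true = rng.random(1000)                                    # hidden "true" failure risk
risk_pred = risk_true + rng.normal(0.0, 0.2, size=1000)         # noisy predicted risk
k = len(risk_true) // 10
top_true = set(np.argsort(-risk_true)[:k])
top_pred = set(np.argsort(-risk_pred)[:k])
print("top-10% overlap:", len(top_true & top_pred) / k)
```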
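And here is a minimal sketch of the "simplest baseline possible" practice. The synthetic table, column names, and thresholds are hypothetical stand-ins for whatever data you already have; the point is the shape of the exercise, not this particular model:

```python
# A minimal "baseline first" sketch, assuming scikit-learn.
# The synthetic, deliberately dirty table below stands in for your existing data.
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age_months": rng.integers(1, 120, n),
    "error_count": rng.poisson(3, n).astype(float),
    "avg_temp": rng.normal(45, 8, n),
})
df.loc[rng.choice(n, n // 10, replace=False), "error_count"] = np.nan   # deliberately dirty
df["failed_within_30_days"] = (
    (df["error_count"].fillna(0) > 5) | (df["age_months"] > 100)
).astype(int)

X = df[["age_months", "error_count", "avg_temp"]].fillna(0)   # crude fill, on purpose
y = df["failed_within_30_days"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline 0: always predict the majority class; any real model must beat this.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# Baseline 1: a plain linear model, default settings, no feature engineering.
linear = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, model in [("dummy", dummy), ("logistic", linear)]:
    print(name, "recall:", recall_score(y_test, model.predict(X_test), zero_division=0))
```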
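Finally, a small illustration of the signal-versus-noise point. In this deliberately synthetic example, corrupting a slice of one feature and flipping a few labels barely moves a simple model's cross-validated accuracy, because the underlying pattern is still much larger than the injected noise:

```python
# A minimal sketch of why modest data dirtiness often doesn't kill a pattern-finding
# model. Everything here is made up for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# "Dirty" copy: zero out 10% of one feature and flip 5% of the labels.
X_dirty = X.copy()
X_dirty[rng.choice(n, n // 10, replace=False), 0] = 0.0   # e.g., a sensor that sometimes reports 0
y_dirty = y.copy()
flipped = rng.choice(n, n // 20, replace=False)
y_dirty[flipped] = 1 - y_dirty[flipped]                   # e.g., mislabeled records

model = LogisticRegression(max_iter=1000)
print("clean accuracy:", cross_val_score(model, X, y, cv=5).mean())
print("dirty accuracy:", cross_val_score(model, X_dirty, y_dirty, cv=5).mean())
```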