Product Management with a Data Science Mindset
Sprint 4 — Monitoring
I have been asked about monitoring a lot lately, so I am going to share my thoughts about monitoring from a product manager's perspective, i.e., the desired functionality. The implementation is left up to the Dev/MLOps teams, who are better skilled in that area. This is part of a series which starts here.
So you are going to add the first model to your product. And everyone…and I mean everyone, is telling you that you need to monitor the performance. Oh, and it needs dashboards…and alarms…plus email alerts. What now?
Approach monitoring like you approach every other feature: understand the problem you wish to solve.
I personally think of data products from a systems perspective. I want monitoring to tell me some key facts. I want to know if a product is generating outputs within spec, according to the performance metrics. I also want to understand if it is unstable or showing signs of declining health. I think of monitoring as the tool that allows me to peer inside the system at both a micro and macro level.
Where did this point of view come from? Well, when I was designing a model-based solution and I had to track down an issue with a model’s outputs, I would have to assess, analyze, validate, and dissect everything that I thought would reveal why it was not working as intended. As a data scientist working in an interactive environment, I had the advantage of being able to simultaneously look at the code, the input (both data and parameters), and its intermediate outputs. In much the same way as a developer dissects their code tracking down bugs during development, I could assess the data and pipeline to try to understand why the results did not meet my expectations.
That scenario was during design and you, as the PM, need to keep track of what’s happening in production. That situation is a little different.
A model will degrade once it is put into production. Why? When a model is trained on a data set, it captures patterns of that data set within the limitations of its learning algorithm. The snapshot of data that it trained on is never identical to the continuous flow of data in production. Also, aspects of the problem may change over time due to new outside factors and even the presence of the model itself. The data scientists (and PM) cannot dissect the model in production the way they can in their IDE.
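One common way to put a number on "the production data no longer looks like the training snapshot" is a distribution-shift statistic. The source does not prescribe a method, so as an illustrative sketch, here is the Population Stability Index (PSI), computed per feature against the training sample; the bin count and the conventional ~0.2 alarm level are assumptions to tune, not fixed rules.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare a production feature sample (`actual`) against the
    training-time distribution (`expected`). A PSI above roughly 0.2
    is commonly read as significant drift (illustrative threshold)."""
    # Bin edges come from the training sample's percentiles.
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) on empty bins.
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A job that runs this per feature per day, and logs the result, gives the data science team an early drift signal long before ground truth arrives.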
The objective of monitoring is to capture as much information about the data product pipeline as is reasonable to determine its present health and to give information on root cause when there is an issue. Part of what you are used to doing as a product manager is assessing risks and benefits. The design of the monitoring infrastructure is no different. The reason why monitoring the ML component(s), as opposed to the entire product pipeline, receives so much attention is that it is perceived to have the highest risk. And while it may, models do not act in isolation.
Bear with me while I try to illustrate the ideas using three simple block diagrams of the same pipeline. In each diagram, the far left is the data input and the far right is the product. Each diagram shows the information gathering objectives differently.
Consider the top panel of Figure 2. Without even knowing what is happening inside of the pipeline, I am certain that I want to monitor the properties of the data inputs and the KPIs of the outputs. Think about it. How will I know if the quality of the input data has degraded in terms of fill rate, cleanliness, quantity, etc. unless I gather data on the data? And if it is contractually supplied, are we getting what we paid for? Next consider the output. While it is easy to imagine how to assess the shape, size, and weight of a fabricated object, like a trophy, what makes the product in question good? How do you measure it? For example, if the product is a suggestion engine (movie, purchase, etc.), then is the indicator of success a high compliance rate? Something else? Just like the production of an object has tolerances, there are tolerances on the output.
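"Data on the data" can be very lightweight. A minimal sketch, with illustrative field names rather than any standard schema, might profile each input batch for row count and per-column fill rate:

```python
from dataclasses import dataclass

@dataclass
class InputBatchStats:
    """Lightweight quality stats for one batch of input data.
    Field names are illustrative, not a standard schema."""
    row_count: int
    fill_rate: dict  # column name -> fraction of non-null values

def profile_batch(rows, columns):
    """Compute row count and per-column fill rate for a batch of
    dict-shaped rows. Log the result alongside the batch itself."""
    n = len(rows)
    non_null = {c: sum(1 for r in rows if r.get(c) is not None)
                for c in columns}
    return InputBatchStats(
        row_count=n,
        fill_rate={c: (non_null[c] / n if n else 0.0) for c in columns},
    )
```

Tracked over time, these few numbers answer both questions above: has the input degraded, and is the contracted supplier delivering what we paid for?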
Now consider the middle panel of Figure 2. That single deliverable from production needs to be traceable to the components that created it. Consider that the product provides a prediction of something and then the KPIs indicate that it is highly likely to be wrong. (I say "likely" because the truth may not be certain, yet.) In order to investigate, we will want to know the properties of the components that created it. For example, for a given output I may want to track the job number, the model version, model inputs, inference, and any conditional post-processing.
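In practice, traceability often means writing one structured record per delivered output. As a sketch, with hypothetical field names covering the items listed above (job number, model version, inputs, inference, post-processing):

```python
import datetime
import json

def trace_record(job_id, model_version, model_inputs,
                 inference, post_processed):
    """One traceability record per delivered output. Field names are
    illustrative; store whatever lets you reconstruct the output."""
    return {
        "job_id": job_id,
        "model_version": model_version,
        "model_inputs": model_inputs,
        "raw_inference": inference,
        "final_output": post_processed,
        "logged_at": datetime.datetime.now(
            datetime.timezone.utc).isoformat(),
    }

# One JSON line per output keeps the records easy to load later.
line = json.dumps(trace_record("job-4711", "churn-v2.3",
                               {"tenure_months": 4}, 0.82, "high_risk"))
```

When a KPI flags a suspect output, this record is the starting point of the investigation rather than a scramble through logs.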
Finally, the bottom panel emphasizes each separate process in the pipeline. I will want to know how each process is working over time (or by volume). It is easy to understand the task of monitoring the performance of models by comparing the inference to ground truth. This is good, and not that hard if the ground truth is available. Sometimes it is not, but indirect indicators are useful. But I will argue that more information than just comparison to ground truth is required, and often that information can be pretty lightweight. For example, if a process uses a model inference or an imputed value (when the model can't predict), then you probably want to know the rate of imputation and the change in that rate over time. I have never spec'd an algorithm, model, trait, score, etc. without considering how I will measure its performance.
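The imputation-rate example really is lightweight. A sketch, assuming each output is logged as a (date, was-it-imputed) pair:

```python
from collections import defaultdict

def imputation_rate_by_day(events):
    """events: iterable of (date_str, was_imputed) pairs, one per output.
    Returns date -> fraction of outputs that fell back to the imputed
    value, so a change in the rate over time is easy to spot."""
    totals = defaultdict(int)
    imputed = defaultdict(int)
    for day, was_imputed in events:
        totals[day] += 1
        if was_imputed:
            imputed[day] += 1
    return {day: imputed[day] / totals[day] for day in totals}
```

A rising rate here tells you the model is declining to predict more often, an indirect health signal that needs no ground truth at all.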
This toy (or trophy) example just scratches the surface of the data you will want to gather. Working with the data science team will help you nail down everything you need to store. And, sadly, so will experience. We all miss something.
Now you are likely wondering, is this just an exercise in data collection? Well, no. All of this product pipeline information needs to be accessible to the people and code that need to use it. And it needs to be easy to extract insights from. Depending on the stakeholders, that may also mean dashboards or reports. Your PMs and data scientists will definitely want it in a data lake, or better yet in a relational database. And what about alerts? Once you understand the data and its relationship to adverse outcomes, then you can finally set thresholds on the performance indicators.
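Once those thresholds exist, the alerting logic itself is small. A sketch, assuming each metric gets an acceptable (min, max) band; the sample thresholds below are invented for illustration, and a real system would typically debounce before paging anyone:

```python
def check_thresholds(metrics, thresholds):
    """Return the names of metrics outside their acceptable band.
    `thresholds` maps metric name -> (min_ok, max_ok); None means
    unbounded on that side. Illustrative values only."""
    breaches = []
    for name, value in metrics.items():
        lo, hi = thresholds.get(name, (None, None))
        if (lo is not None and value < lo) or \
           (hi is not None and value > hi):
            breaches.append(name)
    return breaches

# Example bands: input fill rate must stay high, imputation rate low.
thresholds = {"fill_rate": (0.95, None), "imputation_rate": (None, 0.10)}
```

The hard part, as the paragraph above says, is not this function; it is knowing which bands actually correlate with adverse outcomes.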
The reality is, no amount of fancy dashboards or emails will help you if you don’t have the data you need for those tools to be useful. I suggest you prioritize who needs to get access to the performance data and how they will access it. You know, create some use cases, identify the features and prioritize. You are a product manager after all.