Sprint 3 — Is Data Science an Oxymoron?
This post is the third in a series about being an ML PM, that started here.
When I first learned about science, way back in elementary school, it was presented as a series of facts. As part of the learning process, we students would perform some experiments to gather observations that would allow us to further understand the facts.
The scientific concepts of theory and hypothesis were not introduced until much, much later. (Sadly, I think, too late for many people to understand that the scientific body of knowledge changes, leading to a distrust of scientists.) Those of us who went on to study more advanced math learned the statistical concept of hypothesis testing and the mathematical methods that go with that concept.
But this article is about data science and data scientists, not necessarily scientific education. Data scientist is a term created to identify people with a particular skill set. My interpretation of that title is someone with the abilities related to using data to create useful output through statistics and model development. The term emerged in the 2000s and there are lots of interesting articles that can provide you with its history. (Here are some examples: modern history of data science and evolution of data science)
I have been involved in many hiring interviews for data scientists. The requirements listed in the job postings vary. Some hiring managers ask for everything (mathematician + software engineer + data engineer +++) or are vague and open to interpretation. Also, to my knowledge, there is no standard definition of this professional title. For these reasons, a data scientist job posting can attract a variety of individuals. Applicants range from those traditionally considered analysts, to those with general college or university credentials plus a couple of online courses, to specialized software developers, to those with PhDs/postdocs. The data science team is often composed of a group of people with similar titles but vastly different experience levels. They may not all be scientists, at least not the way I view scientific methods.
Why am I babbling on about this? In my experience, depending on the data scientist’s experience and knowledge AND the processes applied within the data science team, one cannot always be assured that scientific principles have been applied to the solution presented to the PM. To make matters worse, product teams often ask for a solution to a problem with the assumption that one can be found from the data available. Product forgets that this is research.
So when you are provided with the data science work product, carefully consider what you asked for and how the solution was arrived at. Because in the end, as PM, you are responsible for the feature that you incorporate into the product even if it was designed by someone else.
Where does this leave you when working with the DS team? Here are some thoughts that I want to share with a PM new to working with a DS team:
- All models are wrong, some are useful. (Attributed to George Box, British statistician.) Models fit to the data they are trained on, constrained by the learning algorithm and tuning parameters that guide the learning. Simply put, if you provide a learning algorithm with what it needs to run to completion, a model will be generated. That does not mean it will be useful. So ask questions and gain an understanding of the assumptions, where the solution fails, and how to monitor for degradation.
- We often formulate our opinion and then look for data to support it. A good data science team should have rigorous validation and testing methods to support their assertion that the solution is what you need. Understand the assessment that the solution went through and ask yourself if this will support your use cases. Everyone should be aware that time pressure, and faith in iterative upgrades, can lead to the implementation of a bad solution. Question everything before it goes into the product. (An interesting side read on how we cling to our opinions: Facts don’t change our minds)
- There are ways to measure bias. If you are in the business of putting models in products, you should already be aware of the problems of bias. Racist and sexist models have been in the headlines and they have real consequences. Make sure the DS team has tested for bias and that you have a plan to continue testing.
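To make the last point a little more concrete, here is a minimal sketch of one simple bias check: comparing how often a model gives a positive prediction to each group (sometimes called demographic parity). This is only one of several possible measures and not necessarily the one your DS team uses. The function names and the toy data are made up for illustration.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the fraction of positive predictions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates (0 = equal)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs (1 = approved) and a made-up group label per row.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)
gap = demographic_parity_difference(predictions, groups)
print(rates)  # group "a" is approved 75% of the time, group "b" 25%
print(gap)    # 0.5
```

A large gap does not prove the model is unfair, and a small one does not prove it is fair, but it is the kind of number you can ask the DS team to report before launch and to keep tracking afterwards.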
Is data science an oxymoron? You have a say in the answer. Choose wisely.