A Product Manager’s Guide to Bias in AI

Tyna Hope
7 min read · Apr 12, 2023

--

Bias is everywhere. Despite my best efforts, my own ability to recognize bias is limited by my personal POV. This opinion post is intended to help product managers understand bias better so they can create better products.

As an ML PM, you had better understand something about bias in AI. Why? Because when you integrate any model into your product, its bias comes with it. I recently attended an ML summit and listened to a speaker list all the uses of models in their industry. If I am being honest, what I heard was a list of opportunities for unchecked AI bias.

What is bias? There are many definitions; I really like the one from vocabulary.com: “Use the noun bias to mean a preference for one thing over another, especially an unfair one.”

Photo by Markus Spiske on Unsplash

When we speak of AI bias, we are typically referring to a model favoring a particular outcome in a way that disadvantages a population. More specifically, model bias can be viewed as a systematic error in the outputs. The term bias also covers the ways we end up with that biased output: the limitations of the learning algorithm, the input data, the input parameters, and so forth. One example is sampling bias, which produces samples with over- or under-represented attributes that therefore do not represent the population. Another is measurement bias, where something relevant either isn’t measured at all or is measured with errors. I will distinguish between the various types of bias by using descriptors such as AI or sample.
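To make sampling bias concrete, here is a minimal sketch (Python, with an invented population and invented collection odds, not from any real system) of how a skewed collection process makes a sample misrepresent the population it came from:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 50% of members carry some attribute.
population = rng.random(100_000) < 0.5

# Biased collection: members with the attribute are only half as
# likely to make it into the sample.
weights = np.where(population, 0.5, 1.0)
weights = weights / weights.sum()
sample = rng.choice(population, size=5_000, p=weights)

print(f"population rate: {population.mean():.2f}")  # ~0.50
print(f"sample rate:     {sample.mean():.2f}")      # ~0.33
```

A model trained on that sample will learn the 33% rate as if it were the truth about the population.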

As a middle-aged white woman, the most obvious way I observe AI bias in my personal life is in the social feed I consume. I see a ton of ads for anti-wrinkle creams and weight loss, and nearly nothing else. I am not interested in either topic. But that social-media feed bias is one that I see from a point of privilege. If that is the worst way AI bias impacts my life, then I am truly one of the lucky ones.

The reality is that models are incorporated into every aspect of our lives. Apply for anything (a loan, insurance, a job) and there are models that assess you, and the risk the provider takes on if they say yes to you. Buy anything online and you are shown items you are likely to buy, sometimes regardless of what you are actually looking for. These recommendations are based on “your segment” or on past activity recorded in your browser.

It doesn’t stop at the more obvious purchases. See your health care provider, and they will have been given educational material for treatments based on models. The transportation systems you use have been designed around models of expected travel patterns. And crime forecasting. (I’ll admit that I didn’t even know that was a thing until recently.) Most, if not all, organizations that affect you use modelled outputs. If you are a product manager tasked with incorporating a modelled output, then your product forms part of this list.

Is this dystopia? Is this helpful and efficient? It all depends on your POV. It’s pretty obvious that our world is flooded with data. It’s gathered with every purchase we make, every webpage we interact with, and every machine we use. Data-gathering sensors and interfaces permeate our environment. As consumers of goods and services, we have come to expect fast and (reasonably) accurate responses to our demands. The only way to give personalized responses to billions of people’s requests for goods and services is through machine-assisted data analysis and, in many cases, models that lump you and me into “look-alike” populations. And there we are: yet another source of AI bias.

If there is one thing that I would like you to take away from this post it is this:

The model’s learning method will identify the training sample’s patterns within its limitations, and it will generate a model with biases. With awareness and appropriate usage, we can reduce their impact.

Photo by Sean Benesh on Unsplash

Here are some sources of bias:

Sample data are an integral part of training, validating, and generating performance metrics.

  • Data come from systems with inherent bias. Data are targeted for collection by someone who has decided what information is worth gathering, given the cost of collecting it and its perceived value afterwards. Regulation attempts to protect privacy, so in some cases aggregated values (such as averages) are used instead, which introduces more bias. If you have ever taken a science course (and I am sure you have), consider this the equivalent of an uncontrolled experiment.
  • Data are never clean. The desired information may be missing or contain errors, and it must be imputed to be useful. Imputation is yet another model, with its own bias (see the sketch after this list).
  • Bias can be due to un-gathered data. Consider the cases where a relevant attribute, or vital contextual information, is not collected at all. There are many examples in the book Invisible Women.
  • Selected training data will be biased. Training data are selected by people with a certain life experience and POV on the world. Sometimes training samples are chosen without considering which attributes the model could use for pattern recognition, attributes that carry no real information once the model is in production. The infamous example is the wolves-versus-dogs classifier, where the snowy background turned out to be the key attribute. Remember that the model codifies whatever patterns it discovers in the training data, even noisy and irrelevant ones.
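To illustrate that imputation point, here is a hedged sketch (Python, fabricated values; think of income, where high values go missing more often, i.e. not missing at random) showing how mean imputation quietly biases even a simple statistic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fabricated attribute: higher values are more likely to be missing.
values = rng.lognormal(mean=10.0, sigma=0.5, size=10_000)
p_missing = np.clip(values / values.max(), 0.0, 0.9)
observed = values.copy()
observed[rng.random(10_000) < p_missing] = np.nan

# Mean imputation is itself a tiny model, and it carries bias.
imputed = np.where(np.isnan(observed), np.nanmean(observed), observed)

print(f"true mean:    {values.mean():,.0f}")
print(f"imputed mean: {imputed.mean():,.0f}")  # systematically too low
```

Any downstream model trained on the imputed column inherits that systematic error.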

Moving on from data to another source of bias…

Design in ML is a bit of an oxymoron. Compared to engineering practices around, say, designing structures or electrical systems, the data science world is significantly less encumbered by rules.

  • The designer selects the learning algorithm. Learning algorithms have inherent limitations, and learning parameters are often manually selected. Sometimes the learning algorithm is chosen to work around limitations of the data, because improving the data itself may not be practical. Or the choice may simply reflect the designer’s preferences.
  • There is a lack of design standards. The design of ML models occurs within a profession that has no legally or ethically enforced design and validation standards. To my knowledge, there are some best practices, but they are not applied consistently.
  • The requirement to generalize. A guiding principle for a data scientist is to create models that do not overfit the data but rather generalize, so that they apply to more situations. This, by definition, means that bias is built in, because we are removing the model’s sensitivity to small variations. Do we know whether those variations are insignificant? (See one of the many articles on the bias-variance trade-off, or the sketch below.)
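Here is a minimal sketch of that trade-off (Python, toy data with an invented cubic signal; the degrees and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: a cubic signal plus noise.
x = np.sort(rng.uniform(-1, 1, 40))
y = x**3 - 0.5 * x + rng.normal(0, 0.05, 40)

# Hold out alternate points to measure generalization, not just fit.
x_train, y_train = x[::2], y[::2]
x_test, y_test = x[1::2], y[1::2]

for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, degree)
    rmse = np.sqrt(np.mean((y_test - np.polyval(coeffs, x_test)) ** 2))
    print(f"degree {degree:2d}: test RMSE = {rmse:.4f}")

# Degree 1 underfits (too much bias); degree 12 tends to chase the
# noise (too much variance); degree 3 accepts some bias on purpose
# and generalizes. That accepted bias is the point of this bullet.
```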

and finally, where we should catch and expose bias…

Performance evaluation should occur during design, validation, and testing, and during post-production monitoring.

  • Gather data on the model. The metrics you collect should reveal how well the model is doing. Remember that a measurement tells you about one dimension of performance, just as one attribute tells you about one facet of your population of interest. If the wrong metric is chosen for your use case, you will have a false sense of how well the model is performing.
  • Metrics are only as good as their samples. Metrics are calculated on samples: perhaps a certain number of modelled outputs, or all outputs during a specific time period, and perhaps not all outputs feed the performance metrics at all. Whatever sample is used, all of the previous comments about data samples apply. If you test the model on samples that do not represent the population the model sees when it is live, the metrics you are relying on may be garbage (see the sketch after this list).
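One concrete way to catch this kind of blind spot is to disaggregate the metric. A minimal sketch (Python; the groups, labels, and error rates are all fabricated for illustration) of how a healthy overall accuracy can hide a much worse experience for a smaller group:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Fabricated evaluation set: group B is a 20% minority.
df = pd.DataFrame({
    "group": ["A"] * 800 + ["B"] * 200,
    "label": rng.integers(0, 2, 1_000),
})

# Pretend model: wrong on 5% of group A but 25% of group B.
error_rate = np.where(df["group"] == "A", 0.05, 0.25)
wrong = rng.random(1_000) < error_rate
df["pred"] = np.where(wrong, 1 - df["label"], df["label"])

print(f"overall accuracy: {(df.pred == df.label).mean():.2f}")  # ~0.91
for name, g in df.groupby("group"):
    print(f"group {name} accuracy: {(g.pred == g.label).mean():.2f}")
```

The aggregate number looks fine; the per-group numbers are where the disadvantage shows up.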

Appropriate usage means that the model is used to answer the questions it was intended for, with full transparency about its assumptions and biases.

  • All models make assumptions. It may be that the input data are multivariate normal, or that any third-party reference data are kept up to date. Be aware of the model’s assumptions and share them with the users.
  • What are we generalizing to? Limit usage to the same population as the training population. Implement monitoring, and limit when the output can be served to the end user when appropriate (a monitoring sketch follows below).
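As one possible shape for that monitoring, here is a hedged sketch (Python with SciPy; the feature, the shift, the threshold, and the fallback action are all invented for illustration) that compares a live input feature against its training distribution and flags drift:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)

# Invented feature: what the model saw in training vs. what
# production traffic looks like now.
train_feature = rng.normal(0.0, 1.0, 5_000)
live_feature = rng.normal(0.8, 1.0, 5_000)   # the population shifted

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:                           # threshold is illustrative
    print(f"drift detected (KS={stat:.2f}); consider withholding outputs")
```

When the live population no longer resembles the training population, the generalization promise is broken, and withholding or flagging the output is a legitimate product decision.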

Based on this list you might think that I am anti-AI. Not at all. Models are incredibly useful. They give us insight into mountains of data that humans cannot parse through effectively. Simply put, they are valuable tools that are not going away anytime soon. Models are like any other technology that has been used to improve industrial throughput and better focus effort: they can act as a lever to provide an advantage.

However, if you are working as an ML PM, you should be well aware of the ways AI has gone wrong, from racist banking and justice models to sexist HR screening models. There is no shortage of examples.

When you integrate models and algorithms into your product, consider their limitations and ways to address any quality issues. Product managers are masters of prioritizing the minimum value to market and iterative improvement. I urge you to carefully consider what it means to create an MVP model that works “well enough”.

--

Tyna Hope

Electrical engineer who worked as a data scientist, then as a product manager. Find me on LinkedIn. Opinions expressed are my own. See Defy Magazine for more: defymag.ca