Bias: the socio-cultural interpretation of an AI’s behavior over time. Fundamentally intrinsic to decision-making systems, bias can never be entirely removed, only constrained. Constraining it requires that the system allow for external normative evaluation.


There is no such thing as an unbiased AI, just as there is no such thing as an unbiased person[1]. However, mitigating bias should be a core consideration in designing AI (and not merely for the sake of avoiding lawsuits). Bias is present in both mundane and extreme cases, and certain biases may actually improve the user experience of your AI if designed well. Other biases can be extremely toxic, perpetuating harmful stereotypes and prejudices. Designers must play a part in unpacking the biases present in AI and take responsibility for the greater social repercussions of their products.

A bias is a tendency of a system to display an affinity toward certain kinds of decisions. Take, as a non-AI example, a restaurant review catalogue. The catalogue may end up seeming biased toward expensive restaurants, even if its reviews were only concerned with the uniqueness of entrées. This is because restaurants serving low-cost ‘standard fare’ would receive worse reviews, despite potentially having satisfied customers.

The above example suggests a few crucial things about bias. First, a ‘bias’ does not always signify a ‘wrong’: the fact that good reviews correlate with expensive restaurants does not by itself make the catalogue unethical. Second, biases can emerge from seemingly unrelated design decisions. While certain biases may not hurt, or may even benefit your users, other kinds of biases relate to systemic harms and discrimination. Responsible, human-centered AI designers remain aware of systemic biases in both a given product and society at large as they make critical design decisions. Within AI, biases of all types can emerge in data collection, in user experience design, or in the interaction between the system and its environment over time.


Bias in Data Collection

Biases observed in an AI are usually a result of the data the AI was trained on. Data by definition mirrors the world around us, including the social and cultural realities we live in. If your data is collected from a highly wealthy community, that data will mirror the inequalities of that wealth: the lifestyles, preferences, opinions, and habits that come with it. Most data collection strategies are unfortunately skewed toward certain population segments. To take a seemingly innocuous example, consider an online survey that you send to a group of users. Responses to that survey are likely to reflect certain biases: toward more digitally proficient users, users with more free time, users with stronger positive or negative opinions, users with better language skills, and so on. Collecting data is tricky, as the phenomenon you are interested in may not even be easily measurable in the first place, requiring various compromises or inaccurate proxy metrics. Therefore, treat your data collection strategy as a design decision that filters down to the end user experience (see Problem Selection & Definition).
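The skew that an online survey introduces can be sketched in a few lines. The following simulation uses entirely invented numbers (segment sizes, satisfaction scores, and response rates are assumptions for illustration): a segment that is easier to reach online answers far more often, so the surveyed average drifts away from the true population average.

```python
import random

random.seed(0)

# Hypothetical population: 80% "offline-leaning" users (satisfaction ~3.0),
# 20% "digitally proficient" users (satisfaction ~4.0).
population = [("offline", random.gauss(3.0, 0.5)) for _ in range(8000)] + \
             [("online", random.gauss(4.0, 0.5)) for _ in range(2000)]

true_mean = sum(score for _, score in population) / len(population)

# An online survey mostly reaches the digitally proficient segment:
# offline users respond 5% of the time, online users 60% of the time.
responses = [score for segment, score in population
             if random.random() < (0.60 if segment == "online" else 0.05)]
survey_mean = sum(responses) / len(responses)

print(f"true mean satisfaction: {true_mean:.2f}")
print(f"surveyed mean (biased): {survey_mean:.2f}")
```

Nothing about the survey questions is biased here; the bias comes entirely from who responds.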

Even with a highly equitable and mindful data collection strategy, your data may still lead to completely unacceptable outcomes. This is because data-driven AI systems implicitly contain an assumption: that future decisions (predictions) should closely reflect past decisions (data). In systems of law, there are theories of ‘precedent’ and norms for overturning past precedent. AI systems largely lack this nuance, instead uniformly applying their data-driven precedents with mechanistic consistency. Always consider the fairness of past decisions as you apply data to make future ones.

Replaying Data Collection

Often, a team will bring in an existing dataset but fail to clearly understand how that data was acquired. This can lead to inexplicable outcomes down the road. One way around this is to replay the data-collection process: either sit down and complete the task yourself, or observe the data labelers as they annotate.

Bias as User Experience

In certain ways, an AI’s biases are synonymous with that AI’s user experience. For example, a broadly multi-ethnic user-base may prefer a restaurant recommender that tends to suggest a variety of ethnic cuisines. In designing an AI, it often helps to list a set of positive biases that may be preferred by your users, in addition to a list of negative biases that may create harm over time. Many biases are extremely nuanced, and users may not even be able to describe them. But across large user-bases, these aggregate preferences will certainly affect your product. Do not relegate bias to an HR workshop; make it a core topic of conversation in your teamwork. The most successful products in the world have biases, sometimes extreme ones. However, a user experience may also exacerbate harmful biases, e.g. by disguising decisions to users as ‘data-driven’, or by showing each individual user only a few results so that no one can raise suspicion.

Complex Feedback Cycles

Unfortunately for designers, many biases may emerge from the relationship between a product and the social forces that surround it. Taking the prior example of a restaurant review catalogue, perhaps good reviews from it lead restaurants to increase their prices. Then, even if reviewers were to produce their catalogue in a highly ‘bias-aware’ way, with equal concern given to low- and high-cost options, users could still find that all of the highly rated restaurants are expensive! Recognize that not all biases are part of your individual design, but that bias is an ongoing question, since all products have an effect on the world around them.
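This feedback cycle can be made concrete with a small simulation. Everything below is a hedged sketch with invented parameters (restaurant counts, prices, a 10% price bump per good review): the reviewer is deliberately price-blind, yet after a few seasons the top-rated restaurants end up noticeably more expensive than average.

```python
import random

random.seed(1)

# Hypothetical restaurants: a price and an intrinsic "uniqueness" score.
restaurants = [{"price": random.uniform(10, 40),
                "uniqueness": random.uniform(0, 1)} for _ in range(200)]

def review(r):
    # The reviewer is price-blind: the rating depends only on
    # uniqueness, plus a little noise.
    return r["uniqueness"] + random.gauss(0, 0.05)

# Feedback cycle: each season, well-reviewed restaurants raise prices.
for season in range(10):
    for r in restaurants:
        if review(r) > 0.7:          # a good review...
            r["price"] *= 1.10       # ...lets the restaurant charge more

top = sorted(restaurants, key=review, reverse=True)[:20]
avg_top = sum(r["price"] for r in top) / len(top)
avg_all = sum(r["price"] for r in restaurants) / len(restaurants)
print(f"avg price, top-rated: {avg_top:.0f} vs everyone: {avg_all:.0f}")
```

The bias toward expensive restaurants appears nowhere in the reviewing logic; it is produced by the environment reacting to the reviews.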

Automated Bias Correction

While some researchers have touted the capabilities of algorithms that automatically remove bias, these algorithms cannot address the core cause of discrimination in AI. Many biases are impossible to remove without first observing the behavior of the AI in the real world, especially when the bias cannot be detected in the dataset alone.

Bias as Social Responsibility

Products and services do not simply passively exist in society—they actively create the social world we live in. In designing your AI system, you cannot simply ‘decide’ whether that system should affect society. If you ignore the social effects of your AI, you are implicitly affirming the status quo. While Lingua Franca does not espouse a specific perspective on social justice, it does recognize that many problems exist with the status quo of society. AI systems are likely to reaffirm the status quo, perhaps even more strongly than the human systems they replace, unless carefully designed end-to-end.

Design Questions

Identify and document ‘patterns’ that users think exist in your system.
Does your system tend to make certain kinds of decisions?
Does this tendency only occur for certain subsets of users?
How might users benefit from identifying patterns in your system?
Discuss how your system's biases and assumptions may vary over time.
If your system uses personalization, can users adjust the personalization when they disagree with it?
Does your system use any form of implicit clustering or collaborative filtering where users are grouped without their knowledge?
How might users’ tastes and preferences change over time?
Discuss your team's broad stance regarding harmful prejudices and biases that you do not want your system to have.
How would you respond to someone who accused your AI system of magnifying some kind of prejudice?
How might your system address such concerns?
Who in your organization should take responsibility if discrimination emerges?


Preference Bias

Systems that base decisions on data gathered from other users’ decisions will mirror those users’ biases.

Certain classes of algorithms (e.g. collaborative filtering) predict user preferences based on the preferences of ‘similar’ users. However, this data will naturally carry biases that may range from innocuous to offensive. For example, users aged 30-35 may have the highest likelihood of buying diapers. A user aged 30-35 who is not a parent may see this recommendation and feel confused or even judged. Call out the potential for this bias in any high-risk situation, and make sure that your team has considered the various potential consequences.
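A toy user-based collaborative filter shows how the majority’s behavior leaks into everyone’s recommendations. All names and purchase data below are invented for illustration; real systems use far richer similarity measures, but the mechanism is the same.

```python
# Toy user-user collaborative filter over hypothetical purchase sets.
purchases = {
    "ana":  {"coffee", "diapers", "stroller"},
    "ben":  {"coffee", "diapers", "novel"},
    "cara": {"coffee", "diapers", "stroller"},
    "dev":  {"coffee", "novel"},   # same demographic, not a parent
}

def jaccard(a, b):
    # Set-overlap similarity between two users' purchase histories.
    return len(a & b) / len(a | b)

def recommend(user):
    # Score each unpurchased item by similarity-weighted popularity.
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        sim = jaccard(purchases[user], items)
        for item in items - purchases[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return max(scores, key=scores.get)

# "dev" only buys coffee and novels, yet most similar users are parents,
# so the filter recommends diapers to him anyway.
print(recommend("dev"))
```

The filter is working exactly as designed; the awkward recommendation is the bias of the surrounding user population, faithfully mirrored.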

Expressed and Revealed Preferences

Users’ expressed preferences often differ significantly from their revealed preferences; recognize the bias introduced when labeling data from one to inform the other.

Users express their preferences verbally or in writing, but reveal their preferences through empirical (e.g. purchase) data. It is tempting to use survey data to model users’ purchases or vice-versa, but these datasets often tell very different stories. Users often make decisions based on small pricing variations, even though they will rarely admit it. Be careful not to lower your system’s effectiveness by mixing data from these different sources.
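The gap between the two data sources can be surfaced with a trivial comparison. The users and categories below are hypothetical; the point is that the same population yields two very different labels depending on which source you trust.

```python
# Hypothetical users, two data sources: what they say vs what they buy.
survey    = {"u1": "organic", "u2": "organic", "u3": "organic"}
purchases = {"u1": "cheapest", "u2": "organic", "u3": "cheapest"}

expressed = sum(v == "organic" for v in survey.values()) / len(survey)
revealed = sum(v == "organic" for v in purchases.values()) / len(purchases)

print(f"say they buy organic: {expressed:.0%}, actually do: {revealed:.0%}")
```

Training on survey labels to predict purchases (or vice-versa) would bake this discrepancy into the model as bias.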

Real vs Arbitrary Clusters

Sometimes an AI may cluster users into non-meaningful groups.

Sometimes the clusters generated by an AI anchor on a spurious correlation, such as gender or zip code. Clusters should instead be meaningful to your product, so that users see content relevant to their interests. This requires determining whether the clusters your AI generates are meaningful or simply represent one biased interpretation of the data.
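One common way this happens is purely mechanical: an incidental feature with a large numeric range (like a zip code) dominates distance-based grouping. The sketch below uses three invented users and plain Euclidean distance to show the effect, and how rescaling changes who gets grouped with whom.

```python
import math

# Hypothetical users: (zip_code, taste_a, taste_b); tastes lie in [0, 1].
# Alice and Cara share tastes; Alice and Bob merely share a zip code.
users = {
    "alice": (94110, 0.9, 0.9),
    "bob":   (94112, 0.1, 0.1),
    "cara":  (10001, 0.9, 0.9),
}

def nearest(name, feats):
    # Return the most similar other user under plain Euclidean distance.
    return min((u for u in feats if u != name),
               key=lambda u: math.dist(feats[name], feats[u]))

print(nearest("alice", users))    # zip code's huge scale dominates: "bob"

# Rescale the zip feature to [0, 1] so no feature dominates by accident.
zips = [f[0] for f in users.values()]
lo, hi = min(zips), max(zips)
scaled = {u: ((z - lo) / (hi - lo), a, b) for u, (z, a, b) in users.items()}
print(nearest("alice", scaled))   # actual shared tastes win out: "cara"
```

Rescaling fixes the mechanical problem, but deciding whether the resulting clusters are *meaningful* remains a design judgment, not a preprocessing step.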

Comparative Bias

Presenting users with several differently biased models or AIs can give them natural context and make relationships more decipherable.

Users are good at spotting biases and at using their own intuitive judgment to gauge the ‘trustworthiness’ or ‘likely accuracy’ of a model over time. This is a uniquely human skill that can be combined with AI to create a more effective overall collaboration. One approach is to present the user with the results of several different models, usually trained on different kinds of data, and let the user interpret the aggregate results to make an overall decision.
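A minimal sketch of this pattern: rather than a single opaque score, show predictions from two models trained on differently biased data sources side by side. The “models” here are deliberately trivial per-source averages, and all names and ratings are invented.

```python
# Two hypothetical review datasets with different biases.
reviews_critics = {"cafe": [4.5, 4.0], "diner": [2.0, 2.5]}
reviews_locals  = {"cafe": [3.0, 3.5], "diner": [4.5, 4.0]}

def model(training_data):
    # A trivial "model": predict each place's mean rating in its source.
    means = {k: sum(v) / len(v) for k, v in training_data.items()}
    return lambda place: means[place]

critic_model = model(reviews_critics)
local_model = model(reviews_locals)

for place in ("cafe", "diner"):
    # Surface both predictions; the user judges which bias to trust.
    print(f"{place}: critics say {critic_model(place):.1f}, "
          f"locals say {local_model(place):.1f}")
```

The disagreement between the two models is itself information: it tells the user where the underlying data sources diverge.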


Asking vs Inferring

Certain data is much better asked for than inferred through indirect signals.

Users don’t always shy away from providing data. In fact, it often helps to simply ask a user for a certain piece of data rather than try to infer it indirectly from weak signals. For example, a music app may be able to give you great recommendations simply by asking whether you play a certain instrument, since users with more experience in an instrument tend to appreciate different music made with that instrument. It would be much more difficult, and potentially more intrusive, to try to infer that information from a user’s listening habits.
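The contrast can be sketched directly. Everything here is hypothetical (the onboarding question, the listening log, the 0.5 threshold): a direct answer yields one clean bit of information, while the inferred proxy needs an arbitrary cutoff and misclassifies a casual listener.

```python
# Sketch of "ask, don't infer" for a hypothetical music app.
def recommend(plays_guitar: bool) -> str:
    return "guitar-forward playlists" if plays_guitar else "general playlists"

# Asking directly: one question, one reliable bit of information.
answer = True  # user ticked "I play guitar" during onboarding
print(recommend(answer))

# Inferring instead: a weak proxy (share of guitar-heavy listens) needs
# an arbitrary threshold, and here it guesses wrong about the same user.
listens = ["guitar", "pop", "guitar", "pop", "pop"]
guessed = listens.count("guitar") / len(listens) > 0.5
print(recommend(guessed))
```

The inferred path also carries a privacy cost: it quietly mines behavior to guess something the user would likely have told you willingly.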

Further Resources


  1. For the purposes of Lingua Franca, we are not concerned with the purely statistical notion of bias (of an estimator). By bias, we are referring to its common interpretation as the subjective experience of prejudice. ↩︎