The ultimate goal of artificial intelligence is to create technology that allows computers and machines to mimic the human brain and exhibit human-like behavior. Although current AI is still a long way from matching human intelligence, machine learning has nevertheless already picked up one distinctly human characteristic from us: being biased.
What exactly is bias in AI?
Being biased means displaying a tendency to lean in a certain direction, either in favor of or against a particular thing, rather than maintaining a neutral point of view.
In AI, and especially in machine learning, bias means that the system makes erroneous assumptions because of limitations in its dataset. More specifically, the data we feed into an ML model can carry human interpretations and cognitive judgments that influence the results. For instance, an ML model used in human resources to screen resumes might inappropriately filter candidates based on attributes such as race, color, or marital status.
Some of the common features that might result in bias include:
- National origin
- Marital status
- Sexual orientation
- Education background
- Source of income
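As a practical first step, such features can be flagged before a dataset ever reaches a model. The sketch below is a minimal illustration; the column names and the `SENSITIVE_FEATURES` set are hypothetical and should be adapted to your own schema.

```python
# Flag dataset columns that commonly introduce bias, before training.
# The feature names below are hypothetical examples, not a standard list.

SENSITIVE_FEATURES = {
    "national_origin",
    "marital_status",
    "sexual_orientation",
    "education_background",
    "income_source",
}

def flag_sensitive_columns(columns):
    """Return the dataset columns that match known sensitive features."""
    return sorted(set(c.lower() for c in columns) & SENSITIVE_FEATURES)

# Hypothetical resume-screening schema:
resume_columns = ["years_experience", "marital_status", "skills", "national_origin"]
print(flag_sensitive_columns(resume_columns))  # ['marital_status', 'national_origin']
```

Note that simply dropping these columns is rarely enough: other features (such as a postal code) can act as proxies for the sensitive attribute, so flagging them is a starting point for review, not a complete fix.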
How can an algorithm become biased?
Machine learning is the subset of artificial intelligence that allows computers to improve over time through experience and practice. A machine learning algorithm is the set of instructions that tells a computer how to learn to solve a problem from data. The algorithm and the data it is trained on are the two most important components of an ML model.
According to IBM Research, there are two main ways artificial intelligence inherits biases. First, artificial intelligence is a technology, and like any technology it can have errors and dependencies. If an algorithm or a tool under AI's umbrella has defects, the entire model and its results will display the same flaws.
Second, most machine learning models depend on historical data to learn to perform certain tasks. AI completes these tasks correctly only after processing millions of data points that have been pre-labelled with the correct answer. Sadly, these data points come from real use cases, and real use cases carry societal biases into the datasets AI systems are trained on. Any preference present in the dataset eventually becomes part of the AI's preferences. Algorithms written by programmers are exactly as partial as the data they are fed. And, as history shows, the data is rarely impartial.
Examples of AI bias
In 2015, software developer Jacky Alcine pointed out on Twitter that Google's facial recognition algorithm had labeled him and his friends as 'gorillas'. Google apologized for the mistake and promised to address the issue. In this case, the machine learning algorithm in the Google Photos software had not been trained properly: the data it was fed was insufficiently diverse. Beyond a photo being mislabeled, the consequence of this kind of machine bias could be a more deeply ingrained racial discrepancy in all areas.
A similar case occurred in 2016, when a computer program used by a US court for risk assessment wrongly flagged darker-skinned defendants as likely reoffenders at almost twice the rate of white defendants, while at the same time underestimating the likelihood of white defendants being repeat offenders.
A less harmful case of bias surfaced in 2012, when users pointed out that Apple's AI assistant could not understand Scottish accents. The cause was a lack of accent diversity in the early datasets used to train the NLP algorithm.
These are just a few of the AI bias cases registered in recent years. And as artificial intelligence sees considerable adoption in the real world, bias in AI systems can be expected to grow with it.
How can bias in AI be addressed?
Even though bias in AI is widespread and there appears to be no silver bullet that can completely eliminate it, there are things we can do to tackle the issue. Researchers are looking into ways to reduce bias and strengthen ethics in rule-based artificial systems, and a few best practices have already emerged.
It’s all about the data – make sure you choose a representative dataset
The first and most important step when building an algorithm is to make sure the dataset is representative of the problem you need to solve. That means choosing data that is diverse and includes the different groups involved, so your model does not struggle with unlabeled examples that fall outside the norm. Properly group and manage the data so you aren't forced to face a situation like Google's with its facial recognition system. The source of your data also matters: if the sourcing itself is biased, downstream issues are much harder to prevent.
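A simple first check of representativeness is to look at how each group is actually distributed in the data. The sketch below uses a tiny synthetic dataset and a hypothetical `group` attribute purely for illustration:

```python
from collections import Counter

def group_proportions(records, key):
    """Share of each group in the dataset for a given attribute."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

# Hypothetical toy dataset: 4 samples from group A, only 1 from group B.
data = [{"group": "A"}] * 4 + [{"group": "B"}]
print(group_proportions(data, "group"))  # {'A': 0.8, 'B': 0.2}
```

A heavily skewed result like this (80% vs. 20%) is exactly the kind of imbalance worth fixing before training, either by collecting more data for the underrepresented group or by rebalancing the sample.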
Using the right model is key
Depending on its goal, each AI system requires a specific model. With so many options available, it can be tempting to reuse the same model in different situations. But this can introduce errors, because different data sources call for different solutions.
As an example, consider supervised and unsupervised learning. The latter, while easier to set up because it needs no labels, is seen as more prone to errors when the dataset has inconsistencies. The former can prove more accurate, but it is also more susceptible to absorbing bias from its labeled dataset.
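To make the supervised case concrete, here is a deliberately trivial "model" trained on synthetic, skewed labels. The data is entirely made up; the point is only to show that a supervised learner faithfully reproduces whatever bias its labels contain:

```python
from collections import Counter, defaultdict

def train_majority_by_group(examples):
    """'Train' a trivial supervised model: memorise the majority label per group."""
    label_counts = defaultdict(Counter)
    for group, label in examples:
        label_counts[group][label] += 1
    return {g: c.most_common(1)[0][0] for g, c in label_counts.items()}

# Synthetic, deliberately skewed historical labels (hypothetical hiring data):
# group "X" was usually approved, group "Y" was usually rejected.
train = ([("X", "approve")] * 9 + [("X", "reject")]
         + [("Y", "reject")] * 7 + [("Y", "approve")] * 3)

model = train_majority_by_group(train)
print(model)  # {'X': 'approve', 'Y': 'reject'}
```

Even this toy learner ends up systematically rejecting group "Y", not because of anything intrinsic to the group, but because the historical labels it was trained on were already biased.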
Take feedback and learn
Monitoring and reviewing any AI-based application should be mandatory. Identifying issues late in an implementation makes them harder to handle, but it is still essential to acknowledge bias when it appears and address it.
Try, as much as possible, to measure how much bias exists in your algorithm, and when you discover unexpected biases, make sure they are explained and fixed.
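One simple way to measure bias in a deployed model is to compare positive-outcome rates across groups. The sketch below computes this gap (a basic demographic-parity check) for a hypothetical two-group case; the predictions are made-up illustration data:

```python
def demographic_parity_gap(predictions):
    """Absolute difference in positive-outcome rates between two groups.

    `predictions` is a list of (group, predicted_positive) pairs, where
    predicted_positive is 1 or 0. Assumes exactly two groups. A gap near 0
    suggests similar treatment; a large gap is a signal to investigate.
    """
    rates = {}
    for group in {g for g, _ in predictions}:
        outcomes = [p for g, p in predictions if g == group]
        rates[group] = sum(outcomes) / len(outcomes)
    low, high = sorted(rates.values())
    return high - low

# Hypothetical model outputs: group A approved 75% of the time, group B 25%.
preds = [("A", 1)] * 3 + [("A", 0)] + [("B", 1)] + [("B", 0)] * 3
print(demographic_parity_gap(preds))  # 0.5
```

Running a check like this on every retrained model version turns "monitor for bias" from an aspiration into a concrete, automatable test.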
Removing bias from AI will not happen easily, especially since part of it is carried over from people themselves. However, individuals as well as governments (take New York City's attempt at regulating algorithms) are taking measures to address the problem. Machine learning models need to be permanently validated and tested for bias.
In the end, we leave you with a brief yet interesting TEDx presentation on keeping human biases out of AI.