Artificial intelligence and cognitive algorithms are among the most disruptive and impactful technologies of the modern era. AI increasingly supports important business decisions with insights from data-driven machine learning models, so it is essential to ensure that these insights work in an inclusive and unbiased way.
Unlike technical defects in systems, AI-related bias is not readily visible, which makes it harder to identify. Without a meticulously governed approach, an AI algorithm can embed unconscious bias in decision-making and affect us in far-reaching ways. Strong vigilance and governance start with awareness: knowing the types and sources of AI bias is essential to combating them with the right measures.
What is AI bias?
AI and machine learning algorithms observe trends in data, form associations through pattern recognition, and then use the established patterns to solve complex problems. Common application areas include loan approvals, targeted marketing, and talent evaluation. In traditional programming, the developer hard-codes the rules to arrive at a definitive solution; machine learning algorithms instead learn their rules from data. Because an algorithm is trained on data curated by data scientists, the decisions it recommends are often affected by bias inherent in the data itself and in the individuals who train it.
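To make the contrast concrete, here is a minimal sketch in Python: a hand-written approval rule next to a model that learns its decision boundary from historical outcomes. The loan features, the threshold, and the toy data are all hypothetical; the point is that whatever bias sits in the historical labels is inherited by the trained model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Traditional programming: the decision rule is written by hand.
def approve_loan_rule(income: float, debt: float) -> bool:
    return income - debt > 20_000  # hypothetical hard-coded threshold

# Machine learning: the rule is inferred from historical decisions, so any
# bias baked into those past approvals is inherited by the model.
X_train = np.array([[55_000, 10_000], [30_000, 25_000],
                    [80_000, 5_000], [25_000, 20_000]])  # [income, debt]
y_train = np.array([1, 0, 1, 0])  # past approve/deny labels, bias and all

model = LogisticRegression().fit(X_train, y_train)
print(approve_loan_rule(40_000, 15_000), model.predict([[40_000, 15_000]]))
```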
Related article - Are AI solutions being influenced by our own biases?
Bias in AI: Types and solutions
Sample / Selection Bias
This occurs when the collected data does not represent the whole population. A common example is a voice recognition system failing to understand certain accents because the training sample was not representative enough. Another is a machine-learning model for screening résumés or university applications mistakenly rejecting applicants with subject combinations that never appeared in the historical data.
Solution – As these biases primarily stem from the training data, the most effective remedy is to make the training data as inclusive as possible. A strong governance process during the data collection and training stages, ensuring that as many scenarios as possible are covered, helps reduce this bias. Training data should also be refreshed frequently to keep it time-relevant and to capture new possibilities.
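As one illustration, a simple representativeness check can run before training. The sketch below assumes a pandas DataFrame with a hypothetical "accent" column and an assumed population benchmark; any group whose share falls well below its population share is flagged for additional data collection.

```python
import pandas as pd

# Assumed population shares for each group (hypothetical benchmark).
population_share = {"accent_a": 0.40, "accent_b": 0.35, "accent_c": 0.25}

def flag_underrepresented(df: pd.DataFrame, column: str,
                          benchmark: dict, tolerance: float = 0.10):
    """Return groups whose share in the training data falls more than
    `tolerance` (absolute) below their share in the target population."""
    sample_share = df[column].value_counts(normalize=True)
    return [g for g, target in benchmark.items()
            if sample_share.get(g, 0.0) < target - tolerance]

train = pd.DataFrame({"accent": ["accent_a"] * 70
                                + ["accent_b"] * 25
                                + ["accent_c"] * 5})
print(flag_underrepresented(train, "accent", population_share))  # ['accent_c']
```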
Related article - The need for AI to sense, think, respond and learn without bias
Interaction Bias
When a system is trained on streaming data from live interactions with a group, it absorbs the biases that exist in that group. In a Google experiment that asked people to draw shoes for a computer, most participants drew the styles they were most familiar with, so the resulting model failed to recognize some types of shoes. Interaction bias can have serious consequences, as when Microsoft's chatbot picked up offensive language from streaming tweets.
Solution – Interaction bias is hard to detect because it emerges from streaming interaction. It is important to exercise caution in such training processes and to apply constant checks on the algorithms and their output.
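One such check is a moderation gate in front of the online-learning loop, so that objectionable interactions are quarantined for human review instead of being trained on. The sketch below is illustrative only; the blocklist is a placeholder for a real content-moderation step.

```python
# Placeholder terms standing in for a real content-moderation service.
BLOCKLIST = {"offensive_term_1", "offensive_term_2"}

def is_acceptable(text: str) -> bool:
    return not any(term in text.lower() for term in BLOCKLIST)

def filter_stream(stream):
    """Split incoming (text, label) pairs into examples safe to train on
    and examples quarantined for human review."""
    accepted, quarantined = [], []
    for text, label in stream:
        (accepted if is_acceptable(text) else quarantined).append((text, label))
    return accepted, quarantined

stream = [("nice product", 1), ("offensive_term_1 everywhere", 0)]
print(filter_stream(stream))
```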
Implicit / Latent Bias
This occurs when assumptions based on a developer's own mental models and personal experiences, which do not necessarily apply more generally, influence algorithm development. For example, prejudice against candidates from specific localities or nationalities might result in lower credit scores for them because of the latent bias of the individuals building the model. Similarly, specific racial and demographic segments can see their opportunities downgraded in an automated résumé-selection process due to the same prejudices. In many cases, persistent latent bias leads to sample/selection bias because it historically deters the use of inclusive data points in the model.
Solution – A balanced and diverse team bringing different thoughts, approaches, and backgrounds helps push for holistic, unbiased data samples and a more robust solution design. In addition, a strong ethical governing committee can maintain close vigilance and continually rule out exclusions introduced by individual assumptions.
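Governance committees often back this vigilance with quantitative audits of model outputs. The sketch below applies the widely used "four-fifths" rule of thumb to selection rates across groups; the column names and toy data are hypothetical.

```python
import pandas as pd

def selection_rates(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Share of positive outcomes (e.g., résumés selected) per group."""
    return df.groupby(group_col)[outcome_col].mean()

def disparate_impact_ratio(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    rates = selection_rates(df, group_col, outcome_col)
    return rates.min() / rates.max()

results = pd.DataFrame({
    "locality": ["a", "a", "a", "b", "b", "b"],
    "selected": [1, 1, 0, 1, 0, 0],
})
ratio = disparate_impact_ratio(results, "locality", "selected")
print(f"ratio={ratio:.2f}, flagged={ratio < 0.8}")  # below 0.8 warrants review
```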
Measurement Bias
Measurement bias is the outcome of unwanted noise producing faulty measurements, and it usually results in systematic distortion of the data. The distortion can come from a faulty device or from consistent noise entering the measurement process. A camera with a chromatic filter will generate images with a consistent chromatic bias; a faulty temperature measurement system on a gas turbine will systematically distort the temperature-to-fault relationship. The method of measurement can also introduce bias: in survey analysis, for example, errors are caused by unintended mistakes by respondents or interviewers during data collection.
Solution – To avoid measurement bias, data inputs from any device should be validated by comparing outputs from multiple measuring devices at different times to confirm the measurements are accurate. It is also important to maintain a consistent method and environment during the measurement process to rule out random or systematic noise introduced by individuals or the surroundings. Training the team that measures and labels the data is equally important to avoid human-led measurement bias.
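A cross-device consistency check of this kind can be automated. The sketch below compares simultaneous readings of the same quantity from redundant sensors against their per-timestamp median and flags any sensor with a systematic offset; the sensor names, values, and tolerance are assumptions.

```python
import numpy as np

readings = {                       # simultaneous readings per sensor
    "sensor_a": np.array([500.1, 501.0, 499.8]),
    "sensor_b": np.array([500.3, 500.9, 500.0]),
    "sensor_c": np.array([505.2, 506.1, 504.9]),  # systematic +5 offset
}

def flag_biased_sensors(readings: dict, tolerance: float = 1.0):
    """Flag sensors whose mean deviation from the per-timestamp median
    consensus exceeds `tolerance`, suggesting a calibration problem."""
    stacked = np.vstack(list(readings.values()))
    consensus = np.median(stacked, axis=0)
    return [name for name, vals in readings.items()
            if abs(np.mean(vals - consensus)) > tolerance]

print(flag_biased_sensors(readings))  # ['sensor_c']
```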
Related article - Testing of AI/ML based systems
Tackling bias in AI systems
Any conscious or unconscious bias can influence a self-learning AI system and affect decisions in unpredictable ways, leading to unfavorable outcomes. Avoiding this requires prevention techniques and human judgment working in unison: robust statistics, large and inclusive data samples, and algorithm performance indicators to safeguard the solution technically. Critical checkpoints should be applied and exercised close to the data sources, the devices, and the people involved in developing the algorithms. The onus is on strategic governance to ensure that insights from data-driven algorithms are robust and free from unwanted bias and exclusion.
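One way to operationalize such checkpoints is to wire named bias checks into the release pipeline as gates, so a model cannot be promoted until every audit passes. The sketch below shows the pattern only; the trivial placeholder checks stand in for real audits like those sketched above.

```python
from typing import Callable

def run_governance_gates(gates: dict[str, Callable[[], bool]]) -> None:
    """Run every named bias check; any failure blocks the release."""
    failures = [name for name, check in gates.items() if not check()]
    if failures:
        raise RuntimeError(f"Governance gates failed: {failures}")
    print("All gates passed; model may be promoted.")

run_governance_gates({
    "sample_representative": lambda: True,  # e.g., no underrepresented groups
    "disparate_impact_ok": lambda: True,    # e.g., selection-rate ratio >= 0.8
    "sensors_calibrated": lambda: True,     # e.g., no systematic device offset
})
```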
Related article - How to address shortcomings in AI to build trust
Satavisha Mukherjee
Partner, Analytics & AI Consulting, Wipro
Satavisha is a Data Consumption Strategy Consultant based out of Dubai, UAE, with 15 years of experience in the decision science and predictive analytics domain. She specializes in use-case consulting, data consumption strategy, and the delivery of data science use cases across domains. She has worked extensively in customer analytics and supply chain analytics across industries including retail, banking, pharma, manufacturing, aviation, and utilities.