AI Bias: A Problem Long Before the Algorithm

On the cusp of an AI-driven era, a wave of digital innovation could improve every aspect of healthcare, from sharpening diagnostic accuracy to carrying out virtual consultations. But fulfilling this promise hinges on the availability of high-quality and, crucially, diverse data.

When training datasets fail to represent certain populations—whether by race, gender, or geography—biases become embedded within the resulting algorithms. As these systems move from research into real-world settings, performance gaps appear: a model that works well on average may perform markedly worse for certain groups. For patients from underrepresented backgrounds, this can mean misdiagnosis or inappropriate medical care.

And even if these failures are statistically rare, those affected are more than just data points—they're people whose health is at risk because of an inherently biased system. Even one patient suffering worse outcomes for this reason raises deep-rooted concerns about AI's impact on global health equity.

When AI Misses the Mark: A Mammogram Study Pinpoints the Risks

A recent study in the European Journal of Cancer outlines this very real problem: researchers found that AI systems detect breast cancer less accurately in minority groups. And the difference isn't minor—it's stark. While the AI system achieved an 87% sensitivity rate for white women, this dropped to 75% for Black women and 72% for Hispanic women. The disparity was linked directly to underrepresentation in the training data, which predominantly consisted of mammograms from white patients.
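
To put those figures into perspective, a quick back-of-the-envelope calculation in Python, using only the sensitivity rates reported in the study, shows how many cancers would go undetected per 1,000 actual cancer cases in each group:

```python
# Illustrative only: derived from the sensitivity rates quoted above.
# Sensitivity is the share of true cancers the system catches, so the
# miss rate is simply 1 - sensitivity.
sensitivity_by_group = {"White": 0.87, "Black": 0.75, "Hispanic": 0.72}
cases = 1_000  # hypothetical number of actual cancer cases per group

for group, sensitivity in sensitivity_by_group.items():
    missed = round(cases * (1 - sensitivity))
    print(f"{group}: {missed} of {cases} cancers missed ({1 - sensitivity:.0%} miss rate)")
```

On those numbers, roughly 130 in every 1,000 cancers would be missed for white women, compared with about 250 for Black women and 280 for Hispanic women: more than double the miss rate.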

This bias could have serious clinical consequences. Lower sensitivity means a higher likelihood of missed cancers and delayed diagnoses for already underserved populations. And because mammography remains a frontline tool in breast cancer screening, these performance gaps sound the alarm and pose a critical question: Can these technologies be trusted for real-world deployment?

The Broader Pattern: Systemic Bias Across Clinical AI

Bias is a persistent and well-documented issue in AI applications across a range of clinical domains. In the context of cardiovascular disease and diabetes risk prediction, a recent study found that algorithms consistently underperformed for women and older adults.

The root of the problem—unsurprisingly—lay in data imbalances, as these groups were significantly underrepresented in the training datasets. In fact, the researchers estimated that closing the age gap in model performance would require up to 192% more data from older patients, and that matching performance between the sexes would require up to 57% more data from female patients.
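
Gaps like these only become visible when performance is reported per subgroup rather than as a single headline number. As a minimal sketch (synthetic data, with a hypothetical `sex` column standing in for whichever demographic attributes matter), per-group sensitivity can be computed like this:

```python
import pandas as pd
from sklearn.metrics import recall_score  # recall of the positive class == sensitivity

# Hypothetical held-out test results: true labels, model predictions, demographics.
results = pd.DataFrame({
    "y_true": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    "y_pred": [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "sex":    ["F", "F", "M", "F", "M", "M", "F", "M", "M", "F"],
})

# A single overall figure can look acceptable...
print("Overall sensitivity:", recall_score(results["y_true"], results["y_pred"]))

# ...while hiding a large gap between groups.
for sex, grp in results.groupby("sex"):
    print(f"Sensitivity ({sex}): {recall_score(grp['y_true'], grp['y_pred']):.2f} (n={len(grp)})")
```

On this toy data the model catches every case among the men but only a quarter of those among the women, even though the overall figure sits in between. The same per-group breakdown applies to any metric and any attribute, which is why subgroup reporting is such a natural first check.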

Because diseases can present differently across demographics, a model that isn’t trained on diverse patient data won’t learn the subtle patterns that distinguish one group from another. It's like trying to learn a new language with only your native tongue as a guide. When trying to "speak the new language" (i.e., diagnose disease in women and older adults), the model "stammers" when deviating from the "native tongue" (the training data, mostly consisting of white males in clinical research).

The result? Models perform significantly worse for the very patients who may need them most.

Bias in AI models becomes especially visible when cultural and linguistic differences are ignored. For example, mental health tools trained on English-language and Western cultural norms fall short when applied to multicultural or multilingual settings.

And it's not just marginal differences—one AI model was found to be three times less accurate at diagnosing depression in Black patients compared to White patients. As research suggests that 76% of algorithms are trained on data from US patient cohorts, it's unsurprising that models tend to generalize poorly in unfamiliar contexts—particularly in Sub-Saharan Africa or Southeast Asia.

Years of Health Inequity Are Bleeding into New Technologies

But the issue runs deeper than representation. A systematic review of large language models used for clinical decision-making underscores bias as "a pervasive problem" that could significantly compromise patient care.

Data imbalance is, in a sense, a surface-level problem that better representation could fix. But the paper highlights a far more intrinsic issue: by its very nature, the researchers write, medical data reflects the history of medical practice—a history marked by inequalities and disparities.

Recent research has brought these inherent problems into the limelight. One 2016 study, for example, found that 50% of medical students and residents endorsed the false belief that Black patients have thicker skin than White patients, feeding into the myth that dark-skinned people have fewer nerve endings and are therefore less sensitive to pain. What's more, those who held this belief made less accurate treatment recommendations, their judgment swayed by a fundamentally flawed assumption.

These long-held biases are the "weeds" of clinical data—so entangled, it's hard to distinguish the "good" roots from the "bad." And, if researchers train an AI to prescribe pain treatment, it's this tainted data that teaches the system, "If the patient is Black, they don't need as strong a painkiller because they don't feel as much pain."

This leads to what's known as label leakage: the outcome labels in the training data (such as "this patient should receive treatment X") reflect decisions doctors have already made rather than the patient's underlying condition. The model may then learn that minimal care is appropriate in serious cases, reinforcing harmful clinical norms rather than correcting them.
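
A minimal sketch of that failure mode, using synthetic data and hypothetical column names: the label records what clinicians historically prescribed, not what patients needed, so a model trained on it reproduces the historical disparity.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Synthetic cohort: both groups have identical true pain levels, but the
# *recorded* label reflects historical prescribing, which under-treated group B.
group = rng.integers(0, 2, n)                        # 0 = group A, 1 = group B
pain = rng.normal(5.0, 1.0, n)                       # true pain severity, same distribution
historical_rx = (pain > 5.0).astype(int)             # clinicians treated high pain...
undertreated = (group == 1) & (rng.random(n) < 0.4)  # ...but often withheld it from group B
historical_rx[undertreated] = 0

X = pd.DataFrame({"pain": pain, "group": group})
model = LogisticRegression().fit(X, historical_rx)

# For identical pain, the model now recommends treatment less often for group B,
# because the label encoded a decision, not a need.
probe = pd.DataFrame({"pain": [6.0, 6.0], "group": [0, 1]})
print(model.predict_proba(probe)[:, 1])
```

Nothing in the data tells the model that the label encodes a past decision rather than a genuine clinical need; it simply learns the pattern it is given.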

A model's predictions can also become skewed by confounding variables—things that influence both the input and output, making it hard to understand what is truly driving an outcome. For example, it might appear that patients from certain neighborhoods have worse outcomes. But it’s not their geography causing poor health—it’s linked to unmeasured factors like poverty, pollution, or lack of access to care. If the model isn’t adjusted for those, it might treat postcode as a risk factor, leading to biased predictions.
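
The same idea in miniature (synthetic data again, with a hypothetical `deprivation` score standing in for the unmeasured social factors): when the true driver is left out, the neighborhood variable absorbs its effect.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

# Deprivation drives both where people live and how likely a poor outcome is.
deprivation = rng.normal(0.0, 1.0, n)
deprived_postcode = (deprivation + rng.normal(0.0, 0.5, n) > 0).astype(int)
poor_outcome = (rng.random(n) < 1 / (1 + np.exp(-1.5 * deprivation))).astype(int)

df = pd.DataFrame({"deprived_postcode": deprived_postcode,
                   "deprivation": deprivation,
                   "poor_outcome": poor_outcome})

# Without the confounder, postcode looks like a strong risk factor in its own right...
unadjusted = LogisticRegression().fit(df[["deprived_postcode"]], df["poor_outcome"])
# ...once deprivation is included, most of that apparent effect vanishes.
adjusted = LogisticRegression().fit(df[["deprived_postcode", "deprivation"]], df["poor_outcome"])

print("Postcode coefficient, unadjusted:", round(unadjusted.coef_[0][0], 2))
print("Postcode coefficient, adjusted:  ", round(adjusted.coef_[0][0], 2))
```

Whether to adjust, and for what, is itself a clinical judgement; the sketch only shows how easily a proxy variable can masquerade as a cause.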

So, while AI is expected to revolutionize healthcare, it also risks anchoring the field to decades of clinical bias—a dead weight dragging down what should be a pivotal era. As Dr Ted James noted in an article for Harvard Medical School, bias in AI is “not an AI problem per se, but a human one,” reflecting the flawed system it's built on. It's a system, he notes, that should "compel us to look deeper" at the very foundations of clinical practice. Without intervention, these biases could become embedded in digital infrastructure, locking in old injustices under the guise of innovation.

Worse still, the impact could ripple beyond health outcomes. If not addressed, AI biases could "poison" public and institutional perceptions of digital health innovations; patients and clinicians may become skeptical of tools if they are seen as inaccurate or unfair—pushing them away from new technologies that could potentially improve healthcare for all.

Hidden Bias in Trusted Medical Devices

And these problems aren't exclusive to "shiny" new AI systems—even traditional medical devices, often seen as reliable and trustworthy, are affected. In 2021, experts warned that minority ethnic people were "at risk of poorer healthcare" after pulse oximeters were found to overestimate blood oxygen levels in darker-skinned patients—an issue brought to light by the devices' widespread use during the COVID-19 pandemic.

On the other side of the coin, spirometers—another tool thrust into the COVID-19 spotlight—often apply “race correction” formulas when measuring lung function. These formulas stem from decades-old studies which observed that, on average, Black patients had 10–15% lower lung capacity than their White counterparts—a difference long attributed to innate biological variation rather than to environmental or social factors.

But, while these formulas were grounded in scientific rationale at the time, they apply a sweeping assumption to an entire population—one that risks obscuring real, treatable conditions. If a Black patient has lower lung function, race correction might say, "That's normal for your race," and the issue ends there. But maybe they shouldn't have lower lung function—maybe there's an underlying issue that goes undiagnosed.
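
A simplified, purely illustrative calculation shows how that plays out. The 12% adjustment and the 80% "normal" threshold below are stand-ins chosen for illustration, not any particular device's formula:

```python
# Illustrative numbers only; real spirometry reference equations are more complex,
# and the 0.88 factor here is a stand-in for a 10-15% "race correction".
predicted_fev1_litres = 4.0   # reference value for this patient's age, height and sex
measured_fev1_litres = 3.0    # the patient's actual measurement

uncorrected = measured_fev1_litres / predicted_fev1_litres
race_corrected = measured_fev1_litres / (predicted_fev1_litres * 0.88)

print(f"Percent of predicted, no correction:  {uncorrected:.0%}")     # 75%: below a typical 80% cut-off
print(f"Percent of predicted, race-corrected: {race_corrected:.0%}")  # ~85%: read as broadly normal
```

A genuine 25% shortfall against the reference value is reclassified as unremarkable purely because of the patient's recorded race, which is exactly the kind of masking described above.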

This underlines an important question: How can we ensure that the digital health wave tackles these injustices, rather than exacerbating them?

Efforts to Build More Equitable AI Systems

That said, several initiatives are already trying to address the problem.

  1. IQVIA has created a three-step framework to make sure its AI tools are fair. First, they run audits before deployment to check for bias in the training data and model results—looking specifically at race, gender, and age. If they find any issues, they apply solutions like rebalancing the data, using algorithms designed for fairness, and involving experts from different fields to improve equity. Once the model is in use, IQVIA builds in real-time monitoring to catch any new problems, like performance dips or emerging disparities. This whole system is backed by one of the world’s largest collections of real-world clinical data, helping the AI work better for all kinds of patients.
  2. IBM, meanwhile, has developed the AI Fairness 360 Toolkit. This open-source platform offers 70 ways of measuring bias and 10 ways of addressing it across different development stages, including data preparation, model training, and post-model testing (see the sketch after this list). With educational resources, tutorials, and built-in workflows, IBM hopes this will equip data scientists and healthcare analysts to embed fairness directly into model development pipelines.
  3. Harvard Medical School is developing ethical frameworks to guide AI adoption in clinical care, emphasizing bioethics, equity, and patient empowerment. Its approach promotes transparency, inclusive model governance, and respect for diverse patient experiences. Recognizing that bias cannot be solved by technical fixes alone, Harvard calls for deeper reflection on data practices, decision-making power, and community inclusion in shaping AI tools for healthcare.
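
To give a flavour of what a toolkit like AI Fairness 360 provides, the sketch below measures one simple group-fairness metric and applies the toolkit's reweighing step to a toy dataset (hypothetical column names; this is one of many metrics and mitigations the library offers, not a recommended pipeline):

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Toy training table: a binary outcome plus a binary protected attribute.
df = pd.DataFrame({
    "label":       [1, 1, 1, 0, 1, 0, 0, 0],
    "age_over_65": [0, 0, 0, 0, 1, 1, 1, 1],
    "feature_x":   [0.2, 0.4, 0.1, 0.9, 0.8, 0.7, 0.6, 0.3],
})

dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["age_over_65"])

privileged = [{"age_over_65": 0}]
unprivileged = [{"age_over_65": 1}]

# How much less often does the unprivileged group get the favourable outcome?
metric = BinaryLabelDatasetMetric(dataset, privileged_groups=privileged,
                                  unprivileged_groups=unprivileged)
print("Statistical parity difference:", metric.statistical_parity_difference())

# Reweighing assigns instance weights that balance outcomes across groups;
# a downstream model can then be trained using these sample weights.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
reweighted = rw.fit_transform(dataset)
print("Instance weights:", reweighted.instance_weights[:4])
```

None of this removes the need for representative data, but it makes disparities measurable and gives teams a concrete lever to pull before a model reaches patients.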

Foresight: The Largest National-Scale Initiative for AI in Healthcare

And, in the UK, research institutions are taking strides towards beating bias. Just last week, University College London and King’s College London announced a “groundbreaking” initiative for AI-driven prediction in healthcare—and it’s all in the name of bigger and better data.

They will channel anonymized NHS records from 57 million people across England and Wales into the pre-trained AI model, “Foresight.” First introduced in a 2024 paper in The Lancet Digital Health, Foresight has been described as a medical ChatGPT. While it shares a similar foundation with popular large language models, its purpose is different. Rather than predicting the next word in a sentence, Foresight forecasts a patient’s clinical future—using past medical events to anticipate what might happen next.

To do so, it combines free text from doctors’ notes (the kind of input ChatGPT handles) with more structured data, like family history, to construct a digital twin of the patient: one that not only flags disease risk but lets doctors test different ‘what-ifs’, such as how effective a treatment might be. It’s like looking into a crystal ball, except it doesn’t show just one future: it lets you model different scenarios and tells you the most likely outcome.
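
The article's own framing, a model built like a large language model but predicting the next clinical event rather than the next word, can be illustrated with a deliberately simple toy. The event codes below are made up, and the frequency table is a crude stand-in for Foresight's far more sophisticated sequence model:

```python
from collections import Counter, defaultdict

# Toy patient timelines: ordered lists of made-up clinical event codes.
timelines = [
    ["hypertension", "statin_rx", "chest_pain", "angiogram", "stent"],
    ["hypertension", "chest_pain", "angiogram", "stent", "cardiac_rehab"],
    ["type2_diabetes", "metformin_rx", "retinopathy_screen"],
    ["hypertension", "statin_rx", "chest_pain", "angiogram", "medical_mgmt"],
]

# Count which event tends to follow each event: the "next token" here is the
# next thing that happens to the patient, not the next word in a sentence.
next_event = defaultdict(Counter)
for history in timelines:
    for current, following in zip(history, history[1:]):
        next_event[current][following] += 1

def most_likely_next(event):
    """Return the most frequently observed follow-on event, if any."""
    counts = next_event.get(event)
    return counts.most_common(1)[0][0] if counts else None

# "What happens next?" for a patient whose last recorded event is an angiogram,
# and a simple what-if: what tends to follow if a stent is placed?
print(most_likely_next("angiogram"))
print(most_likely_next("stent"))
```

Swap the frequency table for a large generative model and the toy codes for millions of real, linked records, and the "crystal ball" analogy starts to look less like science fiction and more like engineering.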

Now, with over 10 billion health records available for training, researchers hope the model will make more powerful—and fairer—predictions. Dr Chris Tomlinson, Honorary Senior Research Fellow at University College London, told the press:

“AI models are only as good as the data they are trained on. To benefit all patients, the AI must be trained on data that represents everyone.”

Data Equity Initiatives Leading by Example

The INSIGHT Eye Health Data Research Hub, part of the UK’s national Health Data Research initiative, exemplifies how representative datasets can strengthen algorithm development. By aggregating and anonymizing ophthalmic data from patients of varying ethnicities, ages, and socioeconomic backgrounds, INSIGHT enables the creation of AI tools that are both clinically sound and demographically inclusive.

The hub focuses on diseases like diabetic retinopathy, glaucoma, and macular degeneration, all of which disproportionately impact certain minority groups. Working with Moorfields Eye Hospital—one of the world's leading ophthalmology institutions—the initiative ensures these tools are validated across the populations they intend to serve.

The Algorithmic Justice League (AJL), founded by Joy Buolamwini, blends technical research, policy advocacy, and creative storytelling to highlight algorithmic bias. AJL’s work has exposed serious inaccuracies in facial recognition systems, which often misidentify women and people of color, prompting major companies and institutions to reevaluate their AI governance.

In the US, the Agency for Healthcare Research and Quality (AHRQ) and the National Institute on Minority Health and Health Disparities (NIMHD) have issued fairness guidelines that promote transparency in data sourcing, inclusive development teams, stakeholder consultation, and demographic performance evaluations. These principles mark a significant step toward framing algorithmic bias as a public health concern.

Accuray, a radiotherapy technology firm, conducts demographic audits of its AI models to account for anatomical differences across populations, ensuring treatment plans are safe and effective for all patients.

Deloitte takes a broader approach, advising healthcare providers to institutionalize equity across hiring, data governance, and community engagement. Their framework aims to embed fairness into both AI solutions and the structures that support them, promoting trust and accountability throughout the healthcare ecosystem.

Conclusion

As AI continues to transform diagnostics, treatment pathways, and access to care, it must be held to the highest standards of equity. True progress demands that data diversity and fairness are not just considerations, but foundational principles—woven into every stage of development, from initial design to real-world deployment.

Equity cannot be retrofitted. It must be intentionally built into the system from the outset. This will require structural reform, sustained cross-sector collaboration, and an unwavering commitment to inclusivity. Only then can healthcare AI truly fulfill its promise—delivering care that is not only innovative and precise, but fair, trustworthy, and just for all.