The potential of AI in advancing genomics: an interview with Avantika Lal


Date: March 24, 2023

Author: Avantika Lal (Insitro)

The Language of Genomes

In this interview, we speak with Avantika Lal from the pharmaceutical startup Insitro (CA, USA) about how AI can be applied to genomic data from patients to help us better understand the molecular processes underlying complex human diseases. Lal also explains how she uses deep learning to identify drug targets from single-cell genomic data, and how she would like to see this area of AI advance over the next few years.

What sparked your interest in AI in genomics, and how has the field progressed since then?

Human cells have a complex, multifaceted set of mechanisms to control the activation and inactivation of the genes in our genome. The interplay between the sequence of our genome and these regulatory mechanisms acts to maintain the state of each cell, control the development of the human body, and respond to changes in the environment such as infection or drugs.

Disease frequently upsets these regulatory mechanisms. For example, mutations in the genome may lead to disease by preventing genes from being activated or inactivated at the right time. But to understand this regulatory network and develop effective, personalized treatments for such diseases, we need computational systems that can integrate massive, complex sequencing datasets. When I became interested in this problem, AI had begun to revolutionize other fields built on complex datasets, such as image, language and video analysis, and I wondered how similar strategies could be adapted to genomics.

Since then, many developments in modern AI have indeed been adapted to genomics, with remarkable successes across the genomics industry. These range from primary analysis (improving the accuracy of sequencing) to secondary analysis (improving the identification of genomic features, such as mutations) and, more recently, tertiary analysis (interpreting genomic data to model biological phenomena such as disease). This progress has been driven both by computational advances and by advances in experimental technology that allow us to generate more data at a lower cost. Today, AI shows promise for connecting a patient’s genotype to their phenotype, i.e., predicting how changes in the genome will lead to changes in the behavior of a human cell or of the human body as a whole.

How do you use deep learning to identify novel drug targets from single-cell genomic data in your work?

Thanks to the growth in sequencing data, thousands of genomic variants (individual differences in the sequence of our genomes) have been associated with diseases such as heart disease, Alzheimer’s and Parkinson’s. However, it is difficult to pinpoint which variants actually cause disease. Further, the vast majority of these variants are regulatory variants, which act by changing the activity of genes in a specific context. From the genome sequence alone, we can rarely tell which gene they regulate, in which type of cell, or at what stage of development; in other words, we cannot see the mechanism by which they cause disease.

This problem is easy to understand if we think of the human genome as a book written in an unknown language. Without understanding the language, it is difficult to know whether a genetic variant (a change in a single letter somewhere in the middle of the book) would significantly change the book’s meaning. Once we understand the language, however, we can predict what effect a single change might have on the biological function of the genome, and thereby on our health.

In my research, I train models that learn the relationships and structure of sequence motifs in the genome, much as deep learning models for natural language learn the relationships between words and sentences. Such models can predict the effect of sequence changes, for example whether a change from A to G would disrupt the binding of a protein that promotes gene expression and thereby reduce the expression of a critical gene. By training these models on single-cell genomic data, we can predict a variant’s effect in specific cell types, conditions and time points, for example whether it will affect the function of astrocytes in the developing brain but not that of neurons.
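
To make the variant-effect idea concrete, here is a minimal Python sketch under stated assumptions. A hand-written position weight matrix for a hypothetical activating motif (`TGAG`) stands in for a trained deep learning model, and two hypothetical per-cell-type weights stand in for cell-type-specific outputs learned from single-cell data. The predicted effect of a variant is the model's output for the alternate sequence minus its output for the reference sequence, computed separately for each cell type. None of the names or values come from the interview or from Insitro's models.

```python
from typing import Dict, List

BASES = "ACGT"

# Hypothetical position weight matrix for an activating motif "TGAG".
# Columns follow BASES order (A, C, G, T); rows are motif positions.
MOTIF_PWM: List[List[float]] = [
    [-1.0, -1.0, -1.0, 2.0],  # position 1 prefers T
    [-1.0, -1.0, 2.0, -1.0],  # position 2 prefers G
    [2.0, -1.0, -1.0, -1.0],  # position 3 prefers A
    [-1.0, -1.0, 2.0, -1.0],  # position 4 prefers G
]

# Hypothetical cell-type-specific output weights: the bound factor drives
# expression in astrocytes but has no effect in neurons.
CELL_TYPE_WEIGHTS: Dict[str, float] = {"astrocyte": 1.0, "neuron": 0.0}


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as one length-4 vector per base (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_score(seq: str) -> float:
    """Best PWM match over all windows: a crude stand-in for binding strength."""
    x = one_hot(seq)
    width = len(MOTIF_PWM)
    best = float("-inf")
    for start in range(len(x) - width + 1):
        score = sum(
            x[start + i][j] * MOTIF_PWM[i][j] for i in range(width) for j in range(4)
        )
        best = max(best, score)
    return best


def predict_expression(seq: str) -> Dict[str, float]:
    """Toy per-cell-type expression prediction from sequence."""
    binding = max(motif_score(seq), 0.0)  # treat negative scores as no binding
    return {cell: w * binding for cell, w in CELL_TYPE_WEIGHTS.items()}


def variant_effect(ref_seq: str, pos: int, alt: str) -> Dict[str, float]:
    """Predicted effect of a single-base substitution (alt minus ref) per cell type."""
    if not 0 <= pos < len(ref_seq):
        raise ValueError("variant position is outside the sequence")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    ref_pred = predict_expression(ref_seq)
    alt_pred = predict_expression(alt_seq)
    return {cell: alt_pred[cell] - ref_pred[cell] for cell in ref_pred}


# The A at index 4 of "CCTGAGCC" sits inside the TGAG motif; change it to G.
print(variant_effect("CCTGAGCC", 4, "G"))  # {'astrocyte': -3.0, 'neuron': 0.0}
```

Here the A-to-G change weakens the motif match, so the sketch predicts reduced expression in astrocytes and no change in neurons, where the hypothetical factor has no effect. A real model would replace `motif_score` and `CELL_TYPE_WEIGHTS` with a trained network, but the reference-versus-alternate comparison works the same way.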

Together, these predictions produce novel hypotheses that link specific genes to a disease through specific mechanisms. These hypotheses can then be tested experimentally to validate new drug targets.

Have there been any recent developments in this field that you have found particularly exciting?

The last year or two have seen several exciting developments that have pushed the boundaries of this field. In 2021, DeepMind published a paper describing Enformer, a model that can predict the impact of genomic variants on the expression of genes located as far as 100,000 base pairs away. Enformer is an early success in applying transformer models to the genome, and there is hope that these architectures will prove as successful in genomics as they have been in other domains.
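
Why might transformers help with distant regulatory elements? A self-attention layer lets every position in a sequence draw on every other position in a single step, whereas a convolution only mixes nearby positions. The sketch below illustrates this in plain Python under simplifying assumptions: each row is a hypothetical embedding of one sequence bin, the query, key and value projections are the identity (a real transformer learns them), and a three-bin moving average stands in for a convolutional layer. It is a generic illustration of attention, not Enformer's actual architecture.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    Assumption: query, key and value projections are the identity, so the
    attention weights come directly from dot products between bin embeddings.
    """
    d = len(x[0])
    scores = matmul(x, transpose(x))  # every bin compared with every other bin
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x)


def local_conv(x: Matrix, radius: int = 1) -> Matrix:
    """Moving average over a +/- radius window: a stand-in for a convolution."""
    n = len(x)
    out = []
    for i in range(n):
        window = x[max(0, i - radius): i + radius + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


# 50 hypothetical bin embeddings; the "distal" copy differs only in the last bin.
bins = [[1.0, 0.0] for _ in range(50)]
distal = [row[:] for row in bins]
distal[-1] = [0.0, 5.0]

print(local_conv(bins)[0] == local_conv(distal)[0])          # True: bin 0 cannot see bin 49
print(self_attention(bins)[0] == self_attention(distal)[0])  # False: attention reaches bin 49
```

Changing only the last of 50 bins leaves the convolution's output at bin 0 untouched, but it alters the attention output at bin 0. Deep convolutional stacks can also widen their reach, but attention connects distant positions directly in one layer.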

Several recent papers have also successfully used encoder-decoder models to predict how a cell will respond to changes in its environment, such as drugs or genetic changes. These models can predict a cell’s response to perturbations that have not been experimentally tested, including combinations of perturbations. These lines of research could reduce the ballooning cost of drug discovery and help us find effective drug candidates sooner.
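
One way an encoder-decoder model can predict untested perturbations is latent-space arithmetic, the idea behind published models such as scGen and the compositional perturbation autoencoder (CPA): encode a cell, add a learned shift for each perturbation, and decode. The following is a generic illustrative sketch rather than a reproduction of any particular paper. The linear encoder and decoder, the two-gene toy data and the perturbation names `drug_A` and `knockout_B` are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical linear encoder/decoder for 2 genes and a 2-D latent space.
# DECODER is the exact inverse of ENCODER, so decode(encode(x)) == x.
ENCODER = [[1.0, 1.0], [0.0, 1.0]]
DECODER = [[1.0, -1.0], [0.0, 1.0]]


def apply(matrix: List[Vector], v: Vector) -> Vector:
    """Multiply a small matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in matrix]


def encode(cell: Vector) -> Vector:
    return apply(ENCODER, cell)


def decode(z: Vector) -> Vector:
    return apply(DECODER, z)


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def learn_shift(control: List[Vector], perturbed: List[Vector]) -> Vector:
    """Latent shift of a perturbation: mean perturbed latent minus mean control latent."""
    ctrl = mean([encode(c) for c in control])
    pert = mean([encode(c) for c in perturbed])
    return [p - c for p, c in zip(pert, ctrl)]


def predict(cell: Vector, perturbations: Sequence[str], shifts: Dict[str, Vector]) -> Vector:
    """Predict expression after perturbations by adding their latent shifts."""
    z = encode(cell)
    for name in perturbations:
        z = [zi + si for zi, si in zip(z, shifts[name])]
    return decode(z)


# Toy expression profiles (gene 1, gene 2); each perturbation was measured alone.
control = [[1.0, 2.0], [3.0, 2.0]]
drug_a = [[2.0, 2.0], [4.0, 2.0]]      # drug_A raises gene 1
knockout_b = [[1.0, 3.0], [3.0, 3.0]]  # knockout_B raises gene 2

shifts = {
    "drug_A": learn_shift(control, drug_a),
    "knockout_B": learn_shift(control, knockout_b),
}
# Predict the never-measured combination for a control cell.
print(predict([1.0, 2.0], ["drug_A", "knockout_B"], shifts))  # [2.0, 3.0]
```

Because each perturbation is represented only as a shift in latent space, shifts learned from separate experiments can be summed to predict a combination that was never measured. With this linear toy model the prediction is purely additive; published models use nonlinear networks, which can yield non-additive combined effects in gene space.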

What are the main challenges to research into AI in genomics and how can we overcome them?

Data is still a major limiting factor in applying AI to genomics. As more human genomes are sequenced around the world and technologies such as single-cell and spatial genomics continue to improve, the amount of data should increase, but this must be coupled with better access to data for researchers around the world. Publication standards that ensure papers are reproducible and that data and models are shared will also accelerate progress in the field.

Finally, in biology and medicine, it is often critical not only to develop a highly accurate model but also to understand the biological mechanisms and patterns the model has learned. Rather than treating deep learning models as black boxes, we need better tools for model interpretation that can help us understand disease biology.
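
One widely used way to look inside a sequence model is in silico saturation mutagenesis: mutate every position to each alternative base, rerun the model, and record how much the prediction changes. Positions whose mutation changes the output most are the ones the model relies on. The sketch below applies this to a hypothetical black-box function that simply counts `TATA` motifs; in practice the function would be a trained network, and gradient-based attribution methods such as DeepLIFT or integrated gradients are common alternatives.

```python
from typing import Callable, List

BASES = "ACGT"


def toy_model(seq: str) -> float:
    """Hypothetical black-box model: counts (overlapping) TATA motifs."""
    return float(sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA"))


def ism_importance(model: Callable[[str], float], seq: str) -> List[float]:
    """In silico saturation mutagenesis.

    For each position, substitute every alternative base, record the change in
    the model's output, and return the mean change per position.
    """
    ref_score = model(seq)
    importance = []
    for pos, ref_base in enumerate(seq):
        deltas = []
        for alt in BASES:
            if alt == ref_base:
                continue  # skip the reference base itself
            mutant = seq[:pos] + alt + seq[pos + 1:]
            deltas.append(model(mutant) - ref_score)
        importance.append(sum(deltas) / len(deltas))
    return importance


print(ism_importance(toy_model, "GCTATAGC"))
# [0.0, 0.0, -1.0, -1.0, -1.0, -1.0, 0.0, 0.0]: only the TATA positions matter
```

The resulting per-position scores form a simple importance map: here they recover the motif the toy model depends on, which is the kind of learned pattern an interpretation tool aims to surface from a real model.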

What advice would you give to young scientists interested in pursuing research into AI in genomics?

With the rapid growth in the depth and quantity of available data, AI in genomics is just getting started, and it’s an excellent time for young scientists to join the field and make an impact. I would urge even biologists without a programming background to explore this field, as many online machine learning courses make it accessible to everyone.

Interviewee profile:

My name is Dr. Avantika Lal. I work as a Senior Data Scientist at Insitro, a pharmaceutical startup founded in 2018 by machine learning pioneer Daphne Koller. The premise behind Insitro’s approach is that many challenges in drug discovery and development could be mitigated if we could predict earlier in the process which drugs are likely to work, and for which patients. We are therefore applying machine learning throughout the pharmaceutical value chain to enable better predictions and bring new medicines to the patients who need them.

My research focuses on applying AI to genomic data from patients in order to understand the molecular processes underlying complex human diseases. After completing a PhD in genetics in India, I moved to the USA for a postdoctoral fellowship at Stanford University (CA, USA), where I developed successful machine learning models to predict cancer patients’ response to treatment based on genomic data from their tumors. Later, I became a Senior Scientist at NVIDIA (CA, USA), where I developed tools based on deep learning and accelerated computing to improve the accuracy of genomic data, enabling more accurate biomedical analyses.

The opinions expressed in this feature are those of the interviewee/author and do not necessarily reflect the views of Future Medicine AI Hub or Future Science Group.
