AI model to predict drug-related birth defects

DATE
July 27, 2023

Researchers from the Icahn School of Medicine developed an AI ‘knowledge graph’ to better predict birth defects associated with certain medications and pre-clinical compounds.

An AI model developed by data scientists from the Icahn School of Medicine at Mount Sinai (NY, USA) could improve predictions of whether certain existing medications that are not currently considered harmful might be linked to congenital disabilities. The knowledge graph model, termed ReproTox-KG, can also predict the impact of pre-clinical compounds that could pose risks to the developing fetus. This study is the first of its kind to use knowledge graphs, which integrate diverse data types, to explore the underlying factors behind congenital disabilities.

Approximately 1 in 33 births in the United States is affected by birth defects, which can manifest as functional or structural abnormalities and are thought to be influenced by diverse factors, including genetics. Despite ongoing research, the root causes of many of these disabilities remain unknown. Certain substances present in medications, cosmetics, food and environmental pollutants are known to have the potential to cause birth defects when the fetus is exposed to them during pregnancy.

Avi Ma’ayan, professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics at Icahn Mount Sinai, stated, “We wanted to improve our understanding of reproductive health and fetal development, and importantly, warn about the potential of new drugs to cause birth defects before these drugs are widely marketed and distributed.
Although identifying the underlying causes is a complicated task, we offer hope that through complex data analysis like this that integrates evidence from multiple sources, we will be able, in some cases, to better predict, regulate, and protect against the significant harm that congenital disabilities could cause.”

The researchers gathered information from multiple datasets on associations between birth defects, genes, and drugs documented in published studies, including data from NIH Common Fund programs. In doing so, they demonstrated how integrating information from diverse resources can lead to synergistic discoveries. Specifically, the combined data encompass knowledge about the genetics of reproductive health, the classification of medicines according to their pregnancy-related risks, and the effects of drugs and pre-clinical compounds on biological mechanisms within human cells. The data included genetic association studies, drug- and compound-induced changes in gene expression in cell lines, known drug targets, genetic burden scores for human genes, and placental crossing scores for small molecule drugs.

Using ReproTox-KG, the research team applied semi-supervised learning (SSL) to prioritize 30,000 pre-clinical small molecule drugs by their potential to cross the placenta and cause birth defects. SSL is a branch of machine learning that uses a small set of labeled data to guide predictions for a much larger set of unlabeled data.

By analyzing the topology of ReproTox-KG, the researchers also identified more than 500 birth-defect/gene/drug cliques, which offer insights into the molecular mechanisms underlying drug-induced birth defects. In graph theory, a clique is a subset of nodes in which every pair of nodes is directly connected by an edge.
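To make the two graph ideas described above more concrete, here is a minimal Python sketch under stated assumptions; it is a toy illustration, not the ReproTox-KG pipeline. All node names and edges are hypothetical placeholders rather than study data. The first part treats a birth-defect/gene/drug clique as a triangle with one node of each type (an assumption; the article does not specify clique size) and finds such triangles by intersecting neighbor sets. The second part uses label propagation, one common SSL technique chosen here purely for illustration (the article does not say which SSL method the researchers used): a few drugs carry fixed labels (1.0 for a known teratogen, 0.0 for a drug considered safe), and each unlabeled drug repeatedly takes the mean score of its neighbors in a hypothetical similarity graph.

```python
# Toy sketch only: every node name below is a hypothetical placeholder.

# Typed knowledge graph: each node is a (type, name) pair; edges are undirected.
EDGES = [
    (("defect", "defect_A"), ("gene", "gene_1")),
    (("gene", "gene_1"), ("drug", "drug_X")),
    (("drug", "drug_X"), ("defect", "defect_A")),
    (("defect", "defect_A"), ("gene", "gene_2")),
    (("gene", "gene_2"), ("drug", "drug_X")),
    (("gene", "gene_2"), ("drug", "drug_Y")),
]


def build_adjacency(edge_list):
    """Map each node to the set of nodes it shares an edge with."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def typed_triangles(adj):
    """Return sorted (defect, gene, drug) triples whose nodes are pairwise connected."""
    cliques = set()
    for d in adj:
        if d[0] != "defect":
            continue
        for g in adj[d]:
            if g[0] != "gene":
                continue
            # A drug linked to both the defect and the gene closes the triangle.
            for x in adj[d] & adj[g]:
                if x[0] == "drug":
                    cliques.add((d[1], g[1], x[1]))
    return sorted(cliques)


# Hypothetical drug-similarity graph and seed labels for label propagation.
SIMILARITY = {
    "known_teratogen": {"candidate_1"},
    "candidate_1": {"known_teratogen", "candidate_2"},
    "candidate_2": {"candidate_1", "known_safe"},
    "known_safe": {"candidate_2"},
}
SEEDS = {"known_teratogen": 1.0, "known_safe": 0.0}


def propagate_labels(neighbors, seeds, iterations=100):
    """Seeds stay fixed; each unlabeled node takes the mean of its neighbors' scores."""
    scores = {n: seeds.get(n, 0.5) for n in neighbors}
    for _ in range(iterations):
        updated = {}
        for n, nbrs in neighbors.items():
            if n in seeds:
                updated[n] = seeds[n]
            elif nbrs:
                updated[n] = sum(scores[m] for m in nbrs) / len(nbrs)
            else:
                updated[n] = scores[n]
        scores = updated
    return scores


print(typed_triangles(build_adjacency(EDGES)))
# [('defect_A', 'gene_1', 'drug_X'), ('defect_A', 'gene_2', 'drug_X')]
scores = propagate_labels(SIMILARITY, SEEDS)
print(round(scores["candidate_1"], 3), round(scores["candidate_2"], 3))
# 0.667 0.333
```

With the toy data, both cliques share `drug_X`, while `drug_Y` is excluded because it has no edge to `defect_A`; the candidate closer to the known teratogen in the similarity graph receives the higher score. On a graph of realistic size, a library routine such as NetworkX's `find_cliques` would likely be a more practical choice than hand-written loops.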
However, the study's findings are preliminary, and further experiments are needed to validate the results. Professor Ma’ayan added, “We hope that our collaborative work will lead to a new global framework to assess potential toxicity for new drugs and explain the biological mechanisms by which some drugs, known to cause birth defects, may operate. It’s possible that at some point in the future, regulatory agencies such as the US Food and Drug Administration and the US Environmental Protection Agency may use this approach to evaluate the risk of new drugs or other chemical applications.”

Following this study, the investigators intend to apply a similar graph-based methodology to other projects exploring the connections between genes, drugs, and diseases. They also plan to use the processed dataset as educational material for bioinformatics analysis courses and workshops. In addition, they plan to expand the research to incorporate more complex data, such as gene expression profiles from specific tissues and cell types collected at multiple developmental stages, with the aim of gaining deeper insight into the mechanisms involved in birth defects and a more comprehensive understanding of the underlying biological processes.