The convergence of two engineering disciplines at the University of Florida is building a valuable tool that will allow researchers to diagnose human cancers, potentially leading to earlier clinical interventions.
Kiley Graim, Ph.D., an assistant professor in the Department of Computer & Information Science & Engineering, is leading a $1.5 million National Institutes of Health (NIH) National Cancer Institute study with co-investigator James Cahill, Ph.D., an assistant instructional professor in the Department of Environmental Engineering Sciences. Together with Ji-Hyun Lee, DrPH, a professor in the Department of Biostatistics, the team aims to create a pan-mammalian tumor atlas that will aid in developing artificial intelligence (AI) tools to model how cancer evolves across hundreds of mammalian species and how that evolution relates to cancer in humans.
As with any reliable AI, such a tool needs large amounts of clinical data to be viable. For rare cancers, enough samples are especially hard to come by.
“We’re not getting enough people in clinical trials because their diseases are too rare, and people aren’t lining up as donors of healthy bone tissue,” Dr. Graim said. “That prevents us from getting traction from machine learning, which requires big data. We might only have a few hundred samples and a few billion base pairs in each of their genomes. Building a model from those leaves far too many variables; I have more genes that are going to change in each patient than I have patients.”
She noted that dogs are 75 times more likely than humans to develop osteosarcoma, and UF sees a large number of these cases every year, yielding valuable data. "I started looking at other species because we can compare bone cancer in mice, dogs and humans," Dr. Graim said. "They're very closely related, and we can look at the ancestral networks: how the genes interact with each other. It works really well."
A novel search for big data to feed machine learning
To build a large enough data set, Dr. Graim enlisted Dr. Cahill's expertise in evolutionary biology and comparative genomics, drawing on millions of years of evolutionary data to compare species that develop cancer with those that don't.
"We're looking at all cancers, but we're trying to identify good model organisms for rare cancers, as the data set from which to draw those is so much smaller," Dr. Cahill said. "We're looking at a wide array of mammals, quantifying their relationship to each other in a complex way. The machine-learning models can leverage both closely and distantly related species, so there's a hierarchy of the different parts of the genomes and how similar they are across all of those."
Studies have indicated that breast cancer in dogs and humans is very similar, with the same genomic subtypes responding to the same treatments. While most cancer research is done on either humans or mice, this NIH R01 grant for early-stage investigators is allowing the UF team to study several hundred species at once. Dr. Graim noted that this approach has two advantages: it yields far more samples, and the species are closely related, with her team having done extensive work to understand the genomic differences between them.
HiPerGator makes shorter work of gene analysis
“We couldn’t do this just a few years ago because we didn’t have enough diversity,” Dr. Graim said. “Now, we’ve got more than 200 mammalian species. But mapping the genomes is incredibly computationally complex. And I’m taking that data and developing AI/machine learning tools to analyze it. Neither of those tasks would be conceivable without the power of UF’s HiPerGator supercomputer.”
Once the genes are sequenced, their expression levels reveal complex correlations that help identify which genes are responsible for which tumor types. These patterns can also indicate how effective a cancer treatment may be in humans when that treatment has succeeded in other species with similar genomic signatures.
"One of the big questions of evolutionary biology is how exactly animals are related," Dr. Graim said. "Evolutionary analysis provides a strong foundation for understanding species-specific cancer differences, and our AI models will leverage that to learn how small changes in human cancer patients' genomes affect how their cancer spreads and responds to treatment. We are treating cancer completely differently than we did just 10 years ago. Our hope is to create a diagnostic with a predictive marker that will allow the doctor to proactively treat a cancer that previously avoided detection until it was too late."
Shawn Jenkins
Herbert Wertheim College of Engineering