Machine Learning Predicts Cancer Origins for Precision Treatment
Written by Susi, Arushi Sharma
Machine learning is ushering in a new era in cancer treatment, one characterized by precision and personalization. The technology is now being used to predict where a cancer first arose, information that can determine which targeted treatments a patient is eligible for.
Researchers at MIT and the Dana-Farber Cancer Institute have developed a machine-learning approach that predicts where an enigmatic cancer originated by analyzing the sequences of approximately 400 genes.
In a dataset of about 900 patients with tumors of unknown origin, the method classified at least 40% of the tumors with high confidence, which could increase the number of patients eligible for targeted treatments matched to their cancer's origin by a factor of 2.2.
“That was the most important finding in our paper, that this model could be potentially used to aid treatment decisions, guiding doctors toward personalized treatments for patients with cancers of unknown primary origin,” says Intae Moon, an MIT graduate student in electrical engineering and computer science who is the lead author of the new study.
The study's senior author is Alexander Gusev, an associate professor of medicine at Harvard Medical School and the Dana-Farber Cancer Institute. The paper was published in Nature Medicine.
In 3 to 5% of cancer cases, especially when tumors have metastasized, the site where the cancer began is difficult to determine. These are known as cancers of unknown primary (CUP). Without this information, doctors often cannot prescribe precision drugs, which are typically approved for specific cancer types in which they have proven effective, so CUP patients tend to receive broader treatments with more side effects.
“A sizeable number of individuals develop these cancers of unknown primary every year, and because most therapies are approved in a site-specific way, where you have to know the primary site to deploy them, they have very limited treatment options,” Gusev says.
Moon, who is affiliated with MIT's Computer Science and Artificial Intelligence Laboratory and co-advised by Gusev, set out to predict cancer type from genetic data that Dana-Farber already collects routinely.
Using the sequences of about 400 cancer-related genes, the team trained a machine-learning model, called OncoNPC, on data from about 30,000 patients diagnosed with one of 22 known cancer types.
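For readers curious about what such a model does with gene data, here is a deliberately simplified stand-in, not OncoNPC itself. It assumes each tumor is reduced to the set of panel genes carrying a mutation, encodes that set as a binary vector, and scores each known cancer type with a nearest-centroid classifier whose distances are turned into probabilities with a softmax. The five-gene panel, the GENE_n and TYPE_n labels, and the function names are all hypothetical; the real model uses about 400 genes, 22 cancer types, and its own design.

```python
import math
from collections import defaultdict

# Hypothetical 5-gene panel standing in for the ~400 genes in the real study.
GENE_PANEL = ["GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"]


def encode(mutated_genes):
    """Binary feature vector: 1.0 if the panel gene is mutated in the tumor, else 0.0."""
    return [1.0 if g in mutated_genes else 0.0 for g in GENE_PANEL]


def fit_centroids(tumors, labels):
    """Average the feature vectors of training tumors for each known cancer type."""
    sums = defaultdict(lambda: [0.0] * len(GENE_PANEL))
    counts = defaultdict(int)
    for genes, label in zip(tumors, labels):
        vec = encode(genes)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def predict_proba(centroids, mutated_genes, temperature=1.0):
    """Softmax over negative squared distances from the tumor to each type's centroid."""
    x = encode(mutated_genes)
    scores = {
        lab: -sum((a - b) ** 2 for a, b in zip(x, c)) / temperature
        for lab, c in centroids.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}


if __name__ == "__main__":
    # Toy training data with hypothetical labels (not real clinical data).
    train = [{"GENE_1", "GENE_2"}, {"GENE_1"}, {"GENE_4", "GENE_5"}, {"GENE_5"}]
    labels = ["TYPE_A", "TYPE_A", "TYPE_B", "TYPE_B"]
    centroids = fit_centroids(train, labels)
    probs = predict_proba(centroids, {"GENE_1"})
    print(round(probs["TYPE_A"], 3))  # prints 0.881
```

A nearest-centroid rule was chosen here only because it fits in a few lines of standard-library Python; a production classifier would likely use a more expressive model and calibrated probabilities.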
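When tested on about 7,000 tumors of known origin that it had not seen during training, the model identified the site of origin with about 80% accuracy, rising to about 95% for predictions made with high confidence. Applied to roughly 900 CUP tumors, it predicted an origin with high confidence for about 40% of them.
The model's predictions were consistent with the patients' inherited (germline) mutations and with their survival outcomes, and the researchers estimate that acting on them could make 2.2 times as many patients eligible for targeted treatments.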
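The gap between about 80% overall accuracy and about 95% on high-confidence predictions reflects a common pattern: a classifier can decline to commit when its top probability falls below a cutoff, trading coverage for accuracy. The sketch below assumes the model outputs one probability per cancer type and uses a hypothetical cutoff of 0.9; the study's actual confidence criterion is not described in this article, and the function name is illustrative.

```python
def selective_accuracy(probas, true_labels, threshold=0.9):
    """Return (overall accuracy, coverage, accuracy on the confident subset).

    probas: list of dicts mapping cancer type -> predicted probability.
    true_labels: known primary sites, in the same order as probas.
    threshold: hypothetical cutoff on the top-class probability.
    """
    correct_all = 0
    confident = 0
    correct_confident = 0
    for p, truth in zip(probas, true_labels):
        top = max(p, key=p.get)  # most probable cancer type
        hit = top == truth
        correct_all += hit
        if p[top] >= threshold:  # prediction counts as high-confidence
            confident += 1
            correct_confident += hit
    n = len(true_labels)
    overall = correct_all / n
    coverage = confident / n
    conf_acc = correct_confident / confident if confident else float("nan")
    return overall, coverage, conf_acc


if __name__ == "__main__":
    # Toy predictions with hypothetical labels.
    probas = [
        {"TYPE_A": 0.95, "TYPE_B": 0.05},
        {"TYPE_A": 0.60, "TYPE_B": 0.40},
        {"TYPE_A": 0.30, "TYPE_B": 0.70},
        {"TYPE_A": 0.02, "TYPE_B": 0.98},
    ]
    truth = ["TYPE_A", "TYPE_B", "TYPE_B", "TYPE_B"]
    print(selective_accuracy(probas, truth))  # prints (0.75, 0.5, 1.0)
```

For CUP tumors, where the true origin is unknown, only the coverage term can be computed; the share of CUP tumors that received a high-confidence prediction is analogous to that coverage.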
“That potentially makes these findings more clinically actionable because we’re not requiring a new drug to be approved. What we’re saying is that this population can now be eligible for precision treatments that already exist,” Gusev says.
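The 2.2-fold figure is a ratio of eligible patients with and without a predicted origin. The following sketch illustrates that arithmetic under heavily simplified assumptions: a patient is eligible if a mutated gene has a tissue-agnostic therapy, or if the (cancer type, gene) pair has a site-specific one, and a CUP patient without a confident prediction can only use the former. The lookup tables, labels, and function names are hypothetical placeholders, not clinical data or the study's actual eligibility rules.

```python
# Hypothetical (cancer type, mutated gene) pairs with an approved site-specific therapy.
SITE_SPECIFIC_TARGETS = {("TYPE_A", "GENE_1"), ("TYPE_B", "GENE_2")}
# Hypothetical genes with a tissue-agnostic therapy, usable without a known site.
TISSUE_AGNOSTIC_TARGETS = {"GENE_3"}


def is_eligible(cancer_type, mutated_genes):
    """True if any mutated gene matches a therapy usable for this (possibly unknown) type."""
    if mutated_genes & TISSUE_AGNOSTIC_TARGETS:
        return True
    if cancer_type is None:  # no confident prediction: site-specific drugs are off the table
        return False
    return any((cancer_type, g) in SITE_SPECIFIC_TARGETS for g in mutated_genes)


def eligibility_fold_increase(patients):
    """patients: list of (predicted type or None, set of mutated genes)."""
    before = sum(is_eligible(None, genes) for _, genes in patients)
    after = sum(is_eligible(pred, genes) for pred, genes in patients)
    if before == 0:
        raise ValueError("no eligible patients at baseline; fold change undefined")
    return after / before


if __name__ == "__main__":
    patients = [
        ("TYPE_A", {"GENE_1"}),
        ("TYPE_B", {"GENE_2"}),
        (None, {"GENE_3"}),
        ("TYPE_A", {"GENE_3"}),
        ("TYPE_B", {"GENE_1"}),
    ]
    print(eligibility_fold_increase(patients))  # prints 2.0
```

As Gusev notes, no new drug is involved in this calculation: the gain comes entirely from patients whose predicted origin unlocks a therapy that already exists for that site.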
The researchers now aim to improve the model by incorporating other data, such as pathology and radiology images, to produce more comprehensive predictions and better guide treatment. The research was funded by the National Institutes of Health, the Louis B. Mayer Foundation, the Doris Duke Charitable Foundation, the Phi Beta Psi Sorority, and the Emerson Collective.