Researchers Demonstrate Capabilities of Machine Learning to Identify Cancer Risk Factors

March 11, 2021

Shannon Lynch, PhD, MPH, co-author of the study and an assistant professor in the Cancer Prevention and Control Program

PHILADELPHIA (March 11, 2021)—Scientists have learned that the socioeconomic circumstances of a neighborhood, including housing and employment, are related to cancer risk and outcomes. But to date, only a handful of measures, such as poverty, have been analyzed.

A new study by Fox Chase Cancer Center scientists has shown how machine learning can mine large volumes of social and environmental data to potentially identify new neighborhood risk factors related to cancer. Machine learning uses computer algorithms to find patterns in large volumes of data.

“We have a general idea that living in neighborhoods with lower socioeconomic conditions is often associated with poor health outcomes, but in terms of how to measure that, there isn’t a consensus,” said Elizabeth Handorf, PhD, an associate professor in the Biostatistics and Bioinformatics Facility at Fox Chase. “We’re trying to find an objective way to determine what the most helpful measures are to identify people who are at higher risk for cancer because of their environment.”

Elizabeth Handorf, PhD, co-author of the study and an associate professor in the Biostatistics and Bioinformatics Facility

Handorf co-authored the study with Shannon Lynch, PhD, MPH, an assistant professor in the Cancer Prevention and Control Program. The research team tested different popular machine learning methods to determine which worked best to analyze the links between socioeconomic data and cancer risk.

They linked prostate cancer patients from the Pennsylvania Cancer Registry to the socioeconomic circumstances of the neighborhoods where they lived. The comprehensive study included more than 14,000 neighborhood variables related to housing quality, education level, median household income, marital status, and renting versus owning a home. The goal was to identify which variables were associated with a diagnosis of advanced prostate cancer.

Census data has previously been used to study disease risk, but data sets are so large that scientists rarely use all possible variables. “In the data set we worked with, there are tens of thousands of variables to investigate,” Handorf said. “Prior studies tended to only select a handful of these variables based on their assumptions about how social factors might be impacting health, and different research studies would choose different variables. We were missing an objective way to select the best factors.”

Of the different models they tested, Handorf and her colleagues found that a method called penalized regression, or a “lasso” model, worked the best at identifying key variables and eliminating false positives. This is a type of regression analysis that assigns a penalty to the estimate of each variable’s effect. Because the penalty shrinks weak effects all the way to zero, the model can automatically pick the variables that best predict an outcome. “The biggest takeaway was how well the lasso-type regressions worked,” she said.
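
For readers curious how a lasso penalty selects variables, the idea can be sketched in a few lines of Python. This is an illustration only, not the study's code or data: the data set is synthetic, the outcome is continuous and fitted with squared-error loss (an analysis of a yes/no outcome such as advanced diagnosis would more likely use a penalized logistic model), and the function names, penalty value, and variable counts are all assumptions chosen for the example.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; anything inside the band becomes 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/(2n)) * sum((y - Xb)^2) + penalty * sum(|b|), one coefficient at a time."""
    n, p = len(X), len(X[0])
    coefs = [0.0] * p
    residual = list(y)  # equals y - Xb while every coefficient is zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Average product of column j with the residual that ignores column j's own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * coefs[j]) for i in range(n)) / n
            new_coef = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_coef - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                coefs[j] = new_coef
    return coefs


# Synthetic stand-in for neighborhood data: 10 candidate variables, of which
# only variables 0 and 3 truly affect the outcome (hypothetical values).
rng = random.Random(0)
n_obs, n_vars = 200, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_obs)]
y = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0.0, 0.5) for row in X]

coefs = lasso_coordinate_descent(X, y, penalty=0.25)
selected = [j for j, c in enumerate(coefs) if c != 0.0]
print("variables kept by the lasso:", selected)
```

The penalty controls how aggressive the selection is: a larger value zeroes out more variables, a smaller one keeps more. In practice the penalty is usually chosen by cross-validation rather than fixed by hand as it is here.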

Handorf added that the findings demonstrated that machine learning can be used to identify which variables in a large set of socio-environmental data have a measurable effect on health outcomes in cancer and other diseases. She said she hopes the study will help scientists select variables more objectively. “Considering the social environment is important, but you have to think about how to measure that well.”

Fox Chase Cancer Center (Fox Chase), which includes the Institute for Cancer Research and the American Oncologic Hospital and is a part of Temple Health, is one of the leading comprehensive cancer centers in the United States. Founded in 1904 in Philadelphia as one of the nation’s first cancer hospitals, Fox Chase was also among the first institutions to be designated a National Cancer Institute Comprehensive Cancer Center in 1974. Fox Chase is also one of just 10 members of the Alliance of Dedicated Cancer Centers. Fox Chase researchers have won the highest awards in their fields, including two Nobel Prizes. Fox Chase physicians are also routinely recognized in national rankings, and the Center’s nursing program has received the Magnet recognition for excellence five consecutive times. Today, Fox Chase conducts a broad array of nationally competitive basic, translational, and clinical research, with special programs in cancer prevention, detection, survivorship, and community outreach. It is the policy of Fox Chase Cancer Center that there shall be no exclusion from, or participation in, and no one denied the benefits of, the delivery of quality medical care on the basis of race, ethnicity, religion, sexual orientation, gender, gender identity/expression, disability, age, ancestry, color, national origin, physical ability, level of education, or source of payment.


For more information, call 888-369-2427
