Graduate student Ram Joshi analyzed data to predict the most common factors that result in a Type 2 diabetes diagnosis.
“I earned my master's degree in statistics from Texas Tech,” he said, “so I like to analyze data.”
Joshi's love of data analysis led him to conduct research using an existing dataset on the Pima tribe of Native Americans. The Pima live near Phoenix, Arizona, and they have a relatively high prevalence of Type 2 diabetes.
“I chose this dataset to analyze the risk factors and causes,” Joshi said. “Why is diabetes higher in this population group and not others? Based on the dataset, I wanted to know what the reason and factors could be. I found there are some variables that highly contribute to the occurrence of diabetes in the Pima.
“To improve the understanding of risk factors, we predicted Type 2 diabetes for Pima women utilizing a logistic regression model and decision tree – in other words, a machine learning algorithm. Our analysis found five main predictors of Type 2 diabetes: glucose, pregnancy, body mass index (BMI), diabetes pedigree function and age.”
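As a rough illustration of the logistic regression approach Joshi describes, the sketch below fits a two-predictor model by gradient descent. It is not the study's code: the data are synthetic, the "true" coefficients are invented, and the function names (`fit_logistic`, `predict`) are hypothetical. Only two standardized stand-ins for glucose and BMI are used to keep the example short.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    # Classify as diabetic (1) when the predicted probability is at least 0.5.
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= 0.5 else 0

# Synthetic data only: standardized stand-ins for glucose and BMI.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    glucose, bmi = rng.gauss(0, 1), rng.gauss(0, 1)
    p_true = sigmoid(2.0 * glucose + 1.0 * bmi)  # invented relationship
    X.append([glucose, bmi])
    y.append(1 if rng.random() < p_true else 0)

w, b = fit_logistic(X, y)
train_accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"weights={w}, bias={b:.3f}, training accuracy={train_accuracy:.2%}")
```

A positive fitted weight means that higher values of that predictor raise the predicted probability of a diagnosis, which is how a model of this kind can point to risk factors.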
Joshi and his co-author, Chandra Dhakal from the University of Georgia, recently had their findings, “Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches,” published in the International Journal of Environmental Research and Public Health.
“Our preferred specification yields a prediction accuracy of 78.26% and a cross-validation error rate of 21.74%,” Joshi said. “We argue that our model can be applied to make a reasonable prediction of Type 2 diabetes and could potentially be used to complement existing preventive measures to curb the incidence of diabetes and reduce associated costs.”
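The two figures Joshi cites sum to 100%, which is consistent with the error rate being one minus the accuracy. The sketch below shows how k-fold cross-validation can produce such a pair of numbers. It makes several assumptions: the article does not say how many folds the authors used, so `k=10` is a guess; the data are synthetic; and the midpoint-threshold classifier is a toy stand-in, not the model from the paper.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy classifier: the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_validate(xs, ys, k=10):
    """Return (accuracy, error_rate) pooled over all held-out folds."""
    correct = 0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        # Each observation is scored only by a model that never saw it.
        correct += sum((xs[i] >= t) == (ys[i] == 1) for i in fold)
    accuracy = correct / len(xs)
    return accuracy, 1.0 - accuracy

# Synthetic data only: one standardized predictor, higher on average for cases.
rng = random.Random(1)
ys = [1 if rng.random() < 0.35 else 0 for _ in range(300)]
xs = [rng.gauss(1.5 if y == 1 else 0.0, 1.0) for y in ys]

accuracy, error_rate = cross_validate(xs, ys, k=10)
print(f"accuracy={accuracy:.2%}, cross-validation error={error_rate:.2%}")
```

Because every observation is predicted by a model fitted without it, the resulting accuracy is a fairer estimate of performance on new patients than accuracy measured on the training data.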
Joshi notes that this kind of research is also applicable to economics.
“Economics is not that narrow of a field now,” he said. “Economics has a wide range of scope and applications. Basically, everything is related to economics these days. If you look at the cost, the expense that a country accrues in one year just due to diabetes is significant. In 2017, the U.S. spent around $327 billion in the diagnosis and treatment of diabetes, and this has been a rising trend.”
While Joshi is satisfied with the outcome of his research based on the dataset he used, he said more data needs to be analyzed to enhance his predictions.
“If I had data on the other variables, like smoking, obesity, physical activity, food habits and those kinds of things about the community, it would help me to better estimate the exact predictors for the causation of the diabetes,” Joshi said. “Unfortunately, we don't have any of that data available. But, in the future, if that is available and I get enough support funding, then I'm very much interested in doing some more work on this area.”