What is the best predictor, or the most important risk factor, of type 2 diabetes?

Abstract

Background:

Type 2 diabetes [T2D] accounts for ~90% of all cases of diabetes, resulting in an estimated 6.7 million deaths in 2021, according to the International Diabetes Federation. Early detection of patients at high risk of developing T2D can reduce the incidence of the disease through changes in lifestyle, diet, or medication. Since populations of lower socio-demographic status are more susceptible to T2D and might have limited resources or limited access to sophisticated computational tools, there is a need for accurate yet accessible prediction models.

Methods:

In this study, we analyzed data from 44,709 nondiabetic UK Biobank participants aged 40–69, predicting the risk of T2D onset within a selected time frame [mean of 7.3 years with an SD of 2.3 years]. We started with 798 features that we identified as potential predictors of T2D onset. We first analyzed the data using gradient boosting decision trees, survival analysis, and logistic regression methods. We devised one nonlaboratory model accessible to the general population and one more precise yet simple model that uses laboratory tests. We simplified both models to an accessible scorecard form, tested them on normoglycemic and prediabetes subcohorts, and compared the results with those obtained for the general cohort. We established the nonlaboratory model using the following covariates: sex, age, weight, height, waist size, hip circumference, waist-to-hip ratio, and body mass index. For the laboratory model, we used age and sex together with four common blood tests: high-density lipoprotein [HDL], gamma-glutamyl transferase, glycated hemoglobin, and triglycerides. As an external validation dataset, we used the electronic medical record database of Clalit Health Services.
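The abstract does not detail how the fitted models were simplified into scorecards. One common approach, sketched below as an assumption rather than the authors' procedure, divides each logistic regression coefficient by a fixed number of log-odds per point and rounds to an integer. The bin names, coefficients, intercept, and `unit` value are all hypothetical.

```python
import math

# Hypothetical logistic regression coefficients (log-odds added when a binary
# bin applies). These are NOT the values fitted in the study.
COEFFICIENTS = {
    "age_60_69": 1.0,
    "male": 0.5,
    "bmi_30_plus": 1.5,
    "waist_hip_ratio_high": 0.75,
}
INTERCEPT = -5.0  # hypothetical


def to_points(coefficients, unit):
    """Convert log-odds coefficients to integer points (1 point = `unit` log-odds)."""
    return {name: round(beta / unit) for name, beta in coefficients.items()}


def scorecard_total(active_bins, points):
    """Hand-countable score: sum the points of every bin that applies."""
    return sum(points[name] for name in active_bins)


def logistic_risk(active_bins):
    """Unrounded model risk, for comparison with the scorecard total."""
    z = INTERCEPT + sum(COEFFICIENTS[name] for name in active_bins)
    return 1.0 / (1.0 + math.exp(-z))


points = to_points(COEFFICIENTS, unit=0.25)
print(points)  # {'age_60_69': 4, 'male': 2, 'bmi_30_plus': 6, 'waist_hip_ratio_high': 3}
print(scorecard_total(["male", "bmi_30_plus"], points))  # 8
```

Because every coefficient in this toy example is an exact multiple of `unit`, the point total times `unit` equals the model's linear predictor minus the intercept. With real coefficients, rounding costs a little resolution in exchange for a card that can be totaled by hand.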

Results:

The nonlaboratory scorecard model achieved an area under the receiver operating characteristic curve [auROC] of 0.81 [95% confidence interval [CI] 0.77–0.84] and an odds ratio [OR] between the upper and fifth prevalence deciles of 17.2 [95% CI 5–66]. Using this model, we classified participants into three risk groups, with a 1% [0.8–1%], 5% [3–6%], and 9% [7–12%] risk of developing T2D, respectively. We further analyzed the contribution of laboratory tests and devised a blood test scorecard model based on age, sex, glycated hemoglobin [HbA1c%], gamma-glutamyl transferase, triglycerides, and HDL cholesterol. This model achieved an auROC of 0.87 [95% CI 0.85–0.90] and a deciles' OR of ×48 [95% CI 12–109], and classified the cohort into four risk groups with the following risks of developing T2D: 0.5% [0.4–7%]; 3% [2–4%]; 10% [8–12%]; and a high-risk group of 23% [10–37%]. When applied to the external validation cohort [Clalit], the blood test model achieved an auROC of 0.75 [95% CI 0.74–0.75]. We also analyzed several more comprehensive models that included genotyping data and other environmental factors, and found that they did not provide cost-efficient benefits over the four blood test model. The commonly used German Diabetes Risk Score [GDRS] and Finnish Diabetes Risk Score [FINDRISC] models, trained using our data, achieved auROCs of 0.73 [0.69–0.76] and 0.66 [0.62–0.70], respectively, inferior to those of the four blood test model and the anthropometric models.
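The two headline metrics above, auROC and the OR between risk deciles, can both be computed from predicted risks and observed outcomes. The sketch below is illustrative and is not the authors' code. It computes auROC through the Mann-Whitney formulation and an odds ratio between predicted-risk deciles, assuming deciles are numbered 1 (lowest predicted risk) to 10 (highest), so that the "upper" decile is 10 and the "fifth" is 5. The data are toy values, and confidence intervals are not computed here.

```python
def auroc(labels, scores):
    """auROC as the probability that a random case outscores a random non-case.

    Ties count as half a win. O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def decile_odds_ratio(labels, scores, upper=10, reference=5):
    """Odds ratio of observed prevalence between two predicted-risk deciles.

    Assumes at least 10 participants. Raises ZeroDivisionError if a decile
    has 100% prevalence or the reference decile has 0% prevalence.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = {d: [] for d in range(1, 11)}
    for rank, i in enumerate(order):
        deciles[rank * 10 // n + 1].append(labels[i])

    def odds(decile):
        prevalence = sum(deciles[decile]) / len(deciles[decile])
        return prevalence / (1.0 - prevalence)

    return odds(upper) / odds(reference)


scores = list(range(40))  # toy predicted risks, one per participant
labels = [1 if s in (17, 37, 38, 39) else 0 for s in scores]  # toy outcomes
print(auroc(labels, scores))  # ≈ 0.868 (125 of 144 case/non-case pairs ranked correctly)
print(decile_odds_ratio(labels, scores))  # ≈ 9 (odds 3 in decile 10 vs 1/3 in decile 5)
```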

Conclusions:

The four blood test and anthropometric models outperformed the commonly used nonlaboratory models, FINDRISC and GDRS. We suggest that our models be used as tools for decision-makers to identify populations at elevated T2D risk and thus improve medical strategies. These models might also serve as a personal catalyst for lifestyle, diet, or medication changes that lower the risk of T2D onset.

Funding:

The funders had no role in study design, data collection, interpretation, or the decision to submit the work for publication.

Editor's evaluation

The authors have used the UK Biobank with sophisticated statistical modeling to predict the risk of developing type 2 diabetes mellitus. Prognosis and early detection of diabetes are key factors in clinical practice, and the current data suggest a new machine-learning-based algorithm that further advances our ability to prevent diabetes.

https://doi.org/10.7554/eLife.71862.sa0


Introduction

Diabetes mellitus is a group of diseases characterized by chronic hyperglycemia and is becoming one of the world’s most challenging epidemics. The prevalence of type 2 diabetes [T2D] increased from 4.7% in 1980 to 10% in 2021, and T2D is considered the cause of an estimated 6.7 million deaths in 2021 [International Diabetes Federation - Type 2 diabetes, 2022]. T2D is characterized by insulin resistance, resulting in hyperglycemia, and accounts for ~90% of all diabetes cases [Zimmet et al., 2016].

In recent years, the prevalence of diabetes has been rising more rapidly in low- and middle-income countries [LMICs] than in high-income countries [Diabetes programme, WHO, 2021]. In 2019, Eberhard Standl et al. estimated that half of all people with diabetes worldwide are undiagnosed [Standl et al., 2019]. An estimated 83.8% of all cases of undiagnosed diabetes are in LMICs [Beagley et al., 2014], and according to the IDF Diabetes Atlas, over 75% of adults with diabetes live in LMICs [IDF Diabetes Atlas, 2022], where laboratory diagnostic testing is limited [Wilson et al., 2018].

According to several studies, a healthy diet, regular physical activity, maintaining normal body weight, and avoiding tobacco use can prevent or delay T2D onset [Home, 2022; Diabetes programme, WHO, 2021; Knowler et al., 2002; Lindström et al., 2006; Diabetes Prevention Program Research Group, 2015]. A screening tool that can identify individuals at risk would enable timely lifestyle or medication intervention. Ideally, such a screening tool should be accurate, simple, and low-cost. It should also be easy to obtain and use, including by populations with limited computer access or skills.

Several such tools are in use today [Noble et al., 2011; Collins et al., 2011; Kengne et al., 2014]. The Finnish Diabetes Risk Score [FINDRISC], a commonly used, noninvasive T2D risk-score model, estimates the 10-year risk of developing T2D for individuals aged 35–64. The FINDRISC was derived from two prospective cohorts of 4746 and 4615 individuals in Finland, surveyed in 1987 and 1992, respectively. The FINDRISC model uses sex, age, body mass index [BMI], use of blood pressure medication, a history of high blood glucose, physical activity, daily consumption of fruits, berries, or vegetables, and family history of diabetes as its parameters. The FINDRISC can be used as a scorecard model or as a logistic regression [LR] model [Bernabe-Ortiz et al., 2018; Lindström and Tuomilehto, 2003; Meijnikman et al., 2018].

Another commonly used scorecard prediction model is the German Diabetes Risk Score [GDRS], which estimates the 5-year risk of developing T2D. The GDRS is based on 9729 men and 15,438 women aged 35–65 from the European Prospective Investigation into Cancer and Nutrition [EPIC]-Potsdam study [EPIC Centres - GERMANY, 2022]. The GDRS is a Cox regression model using age, height, waist circumference, prevalent hypertension [yes/no], smoking behavior, physical activity, moderate alcohol consumption, coffee consumption, intake of whole-grain bread, intake of red meat, and parental and sibling history of T2D [Schulze et al., 2007; Mühlenbruch et al., 2014].
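The text above describes the GDRS as a Cox regression model. To show how such a model turns covariates into an absolute 5-year risk, the sketch below uses the standard Cox formulation, risk = 1 − S0(5)^exp(Σ β·(x − mean)). It combines a hypothetical baseline survival S0(5) with hypothetical coefficients and covariate means. All numbers are invented for illustration and are not the published GDRS parameters.

```python
import math

# Hypothetical values for illustration only; the published GDRS uses its own
# covariates, coefficients, and baseline survival.
BASELINE_SURVIVAL_5Y = 0.99  # S0(5): T2D-free probability at the covariate means
COEFFICIENTS = {"age_years": 0.05, "waist_cm": 0.04, "height_cm": -0.02, "hypertension": 0.4}
MEANS = {"age_years": 50.0, "waist_cm": 90.0, "height_cm": 170.0, "hypertension": 0.3}


def five_year_risk(person):
    """Cox proportional-hazards risk: 1 - S0(5) ** exp(sum(beta * (x - mean)))."""
    linear_predictor = sum(
        beta * (person[name] - MEANS[name]) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 - BASELINE_SURVIVAL_5Y ** math.exp(linear_predictor)


average = dict(MEANS)
print(five_year_risk(average))  # ≈ 0.01, the baseline risk at the covariate means
print(five_year_risk({**average, "waist_cm": 110.0}))  # higher: a larger waist raises the hazard
```

A scorecard version of such a model typically rounds the β·x terms to points, in the same spirit as the logistic scorecard described in the Methods. The mapping from total points back to a 5-year risk still goes through the baseline survival.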

Barbara Di Camillo et al. reported in 2018 the development of three survival analysis models using the following features: background and anthropometric information, routine laboratory tests, and results from an oral glucose tolerance test [OGTT]. The cohorts consisted of 8483 people from three large Finnish and Spanish datasets. They reported auROC scores of 0.83, 0.87, and 0.90, outperforming the FINDRISC and Framingham scores [Di Camillo et al., 2018]. In 2021, Lara Lama et al. reported using a random forest classifier on 7949 participants from the greater Stockholm area to investigate the key features for predicting prediabetes and T2D onset. They found that BMI, waist–hip ratio [WHR], age, systolic and diastolic blood pressure, and a family history of diabetes were the most significant predictive features for T2D and prediabetes [Lama et al., 2021].

The goal of the present research is to develop easy-to-use, clinically usable models that are highly predictive of T2D onset. We developed two simple scorecard models and compared their predictive power with that of the established FINDRISC and GDRS models. We trained both models using a subset of data from the UK Biobank [UKB] observational study cohort and reported the results using holdout data from the same study. We based one of the models on easily accessible anthropometric measures and the other on four common blood tests. Because we trained and evaluated our models using the UKB database, they are most relevant for the UK population aged 40–69 or for populations with similar characteristics [as presented in Table 1]. As an external test case for the four blood test model, we used the Israeli electronic medical record database of Clalit Health Services [Artzi et al., 2020].

Table 1. Cohort statistical data.

Characteristics of this study’s cohort population and the UK Biobank [UKB] population. A ‘±’ sign denotes the standard deviation. While type 2 diabetes [T2D] prevalence in the UKB participants is 4.8%, it is 1.79% in our cohort because we screened the cohort at baseline based on HbA1c% levels.
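The gap between the two prevalence figures above follows from the baseline HbA1c% screening. A minimal sketch of such a filtering step is shown below. It assumes the standard 6.5% HbA1c diagnostic cut-off for diabetes, although the exact screening criterion is not stated in this section. The records are toy values, not UKB data.

```python
# Assumption: 6.5% is the standard HbA1c diagnostic threshold for diabetes;
# the criterion actually applied to this cohort may differ.
HBA1C_DIABETES_THRESHOLD = 6.5


def screen_baseline(participants, threshold=HBA1C_DIABETES_THRESHOLD):
    """Keep only participants whose baseline HbA1c% is below the threshold."""
    return [p for p in participants if p["hba1c"] < threshold]


def t2d_prevalence(participants):
    """Fraction of participants recorded with T2D."""
    return sum(p["t2d"] for p in participants) / len(participants)


# Toy records, not UKB data: "t2d" is 1 if the participant has a T2D diagnosis.
toy = [
    {"hba1c": 5.2, "t2d": 0},
    {"hba1c": 5.9, "t2d": 1},
    {"hba1c": 6.8, "t2d": 1},
    {"hba1c": 7.1, "t2d": 1},
    {"hba1c": 5.5, "t2d": 0},
]
print(t2d_prevalence(toy))  # 0.6
print(t2d_prevalence(screen_baseline(toy)))  # ≈ 0.333
```

Removing participants whose baseline HbA1c% already indicates diabetes removes most prevalent cases, which lowers the measured prevalence in the remaining cohort.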
