Introduction

Immunoglobulin A nephropathy (IgAN) is currently the most common primary glomerular disease worldwide and accounts for more than 40% of primary glomerular diseases in China1,2. IgAN has diverse clinical manifestations and complicated pathogenesis. The most common clinical presentations are asymptomatic hematuria, proteinuria, and renal insufficiency. Epidemiological studies have found that patients with IgAN can progress to end-stage renal disease (ESRD) within 10–20 years after the first detection3. The diagnosis and treatment of IgAN have become a prominent problem in the field of nephrology, and early diagnosis has great clinical significance for delaying the progression of the disease.

Renal biopsy is the gold standard for diagnosing IgAN. However, it is an invasive procedure that carries a risk of complications such as bleeding and infection4. Because primary hospitals often lack advanced equipment, the procedure is difficult to perform in some less-developed areas, making the best treatment inaccessible to many patients. There is an urgent need for a simple and noninvasive diagnostic model for IgAN. In the era of big data, computational models known as artificial neural networks (ANNs) use nonlinear functions to capture complex relationships between inputs and outputs. By learning from examples, an ANN model can analyze the interactions among medical risk factors more clearly than traditional statistical approaches5,6. ANNs have been widely used to predict, diagnose, and classify various diseases in the life sciences7,8,9. The backpropagation ANN (BP-ANN), one of the most popular neural network models, is a multilayer feed-forward network trained with the error backpropagation algorithm10. The risk prediction of IgAN prognosis has been widely studied both domestically and abroad11,12. However, studies on early risk prediction models for IgAN are still limited. Therefore, we constructed and compared noninvasive predictive models based on BP-ANN and traditional logistic regression.

The present study aimed to retrospectively analyze the biological parameters closely related to the presence of IgAN and to screen out an optimized set of parameters. The resulting noninvasive predictive model may be used not only for early warning of IgAN but also to help improve disease prognosis.

Methods

Study subjects

Patients first diagnosed with primary glomerular diseases via renal biopsy at the First Hospital of Jilin University between November 1, 2010, and November 1, 2020, were selected as the research subjects and were divided into two groups: the IgAN group and the non-IgAN group (those diagnosed with other primary glomerular diseases). The inclusion criteria were as follows: (1) patients who consented to undergo a renal biopsy during their hospital admission; (2) those who had complete clinical information; and (3) those aged above 18 years. The exclusion criteria were as follows: (1) patients who received corticosteroids or immunosuppressive treatment before their condition was diagnosed; (2) those with a history of secondary kidney disease; (3) those aged under 18 years; (4) pregnant women; and (5) patients diagnosed with an infection, tumor, or autoimmune disease. Based on the above criteria, 730 cases were finally enrolled (212 patients with IgAN and 518 patients without IgAN), and they were randomly divided into a training cohort (n = 511) and a validation cohort (n = 219) in a 7:3 ratio. The training cohort was used to build the prediction models, and the validation cohort was used to test their predictive performance.
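
The 7:3 random split described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the function name `split_cohort`, the fixed seed, and the placeholder integer identifiers are assumptions made only for illustration.

```python
import random


def split_cohort(ids, train_fraction=0.7, seed=0):
    """Shuffle the identifiers and split them into training and validation lists."""
    shuffled = list(ids)
    # A seeded generator makes the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers 0..729 stand in for the 730 enrolled cases.
patient_ids = range(730)
train_ids, valid_ids = split_cohort(patient_ids)
print(len(train_ids), len(valid_ids))  # 511 219
```

With 730 cases, a 70% training fraction yields the 511 and 219 cases reported for the two cohorts.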

This study was approved by the ethics committee of the First Hospital of Jilin University, Changchun, China (2021–036).

Clinical parameters

Clinical information such as age, sex, blood pressure, laboratory tests (including blood biochemistry, immunology, and urinalysis) before renal biopsy, and renal pathological data were collected. Then, the serum IgA/C3 ratio was calculated for each patient.

Algorithm of BP-ANN

We explored the relationship between risk factors and IgAN using the BP-ANN model. The BP-ANN was composed of three layers: the input layer, hidden layer, and output layer. The input layer of the ANN consisted of the variables showing statistical significance in the logistic regression analysis. The output layer consisted of one neuron representing the presence of IgAN (valued as end = 1 for IgAN, and end = 0 for non-IgAN). The entire group was divided into a training group (70%) and a validation group (30%) using a random number generator. Backpropagation of the error was used to iteratively adjust the network weights until the error fell to an acceptable level.
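
To make the three-layer structure and the error backpropagation step concrete, the following is a minimal pure-Python sketch. It is not the MATLAB implementation used in the study. The class name `TinyBPNet`, the number of hidden units, the learning rate, the stopping tolerance, and the toy two-input data set are all assumptions chosen for illustration, and no patient data are involved.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNet:
    """Feed-forward network: input layer, one hidden layer, one output neuron."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden unit holds one weight per input plus a trailing bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        # The output neuron holds one weight per hidden unit plus a trailing bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out[:-1], hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Error signal at the output (squared error, sigmoid activation).
        delta_out = (out - target) * out * (1.0 - out)
        # Propagate the error back to the hidden units using the current weights.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                ws[i] -= lr * delta_hidden[j] * v
            ws[-1] -= lr * delta_hidden[j]
        return 0.5 * (out - target) ** 2

    def fit(self, samples, lr=0.5, max_epochs=5000, tol=1e-3):
        """Train until the mean error is at most tol or max_epochs is reached."""
        for _ in range(max_epochs):
            error = sum(self.train_step(x, t, lr) for x, t in samples) / len(samples)
            if error <= tol:
                break
        return error


# Toy data (logical OR of two binary inputs); purely illustrative.
samples = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
net = TinyBPNet(n_inputs=2, n_hidden=3)
final_error = net.fit(samples)
predictions = [round(net.forward(x)[1]) for x, _ in samples]
```

In a setting like the one described here, the input vector would hold the selected clinical variables and the single output would be read as the predicted probability of IgAN.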

Statistical analysis

Statistical analysis was performed using SPSS version 19.0. Normally distributed data were expressed as mean ± standard deviation and compared using the unpaired Student’s t-test. Non-normally distributed data were expressed as medians with their corresponding interquartile ranges and compared using the Mann–Whitney U-test. Categorical variables were expressed as proportions (percentages) and compared using Chi-square tests. A value of P < 0.05 was considered to indicate a statistically significant difference. Statistically significant indicators from the univariate analysis were used as independent variables in the logistic regression model. Receiver operating characteristic (ROC) curves were then plotted, and the area under the curve (AUC) was calculated. The ANN models were developed using MATLAB 7.4.0. The predictive performance of each model was evaluated based on the AUC, sensitivity, and specificity values.
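
As an illustration of the three evaluation measures, the sketch below computes sensitivity, specificity, and AUC from predicted scores. It is not the SPSS or MATLAB procedure used in the study. The function names, the 0.5 decision threshold, and the six toy labels and scores are assumptions made for the example.

```python
def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = IgAN, 0 = non-IgAN; the scores are made-up model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
sens, spec = sensitivity_specificity(labels, scores)
area = auc(labels, scores)
```

In this toy example, two of three positives and two of three negatives are classified correctly at the 0.5 threshold, and eight of the nine positive/negative pairs are ranked correctly.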

Consent to participate

Written informed consent was obtained from all participants.

Consent for publication

Consent for publication can be obtained from participants.

Statement of methods

All methods were carried out in accordance with relevant guidelines and regulations.

Results

Clinical characteristics

A total of 730 patients with primary glomerular diseases were enrolled in this study. The pathological types of the non-IgAN cases included membranous nephropathy, mesangial capillary glomerulonephritis, focal segmental glomerulosclerosis, and minimal change disease. The flow diagram of subject screening and grouping is shown in Fig. 1. The training cohort consisted of 511 patients (310 with IgAN and 201 with non-IgAN), of whom 45.5% were male, with an average age of 39 years (range, 29.0–51.8 years). The validation cohort consisted of 219 patients (127 with IgAN and 92 with non-IgAN), of whom 47% were male, with an average age of 40 years (range, 30.0–52.0 years). As shown in Table 1, there were no statistically significant differences in any clinical characteristics between the training and validation cohorts.

Figure 1 The flow diagram of subject screening and grouping.

Table 1 Clinical characteristics of the training and validation groups.

As presented in Table 2, compared with the non-IgAN group patients in the training cohort, the IgAN group patients were significantly younger on average, had a higher incidence of hematuria and hypertension, and had higher levels of serum albumin, urea nitrogen, creatinine, uric acid, IgA, and IgG and a higher IgA/C3 ratio (P < 0.01). The IgAN group patients also had significantly lower levels of serum IgM, complement C3, total cholesterol, triglycerides, high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, and hemoglobin, as well as a significantly lower estimated glomerular filtration rate (eGFR) and 24-h urine protein (P < 0.01).

Table 2 Differences in the clinical parameters between IgAN and non-IgAN in the training set.

Univariate and multivariate logistic regression

Univariate analysis of the training cohort revealed that the following 19 items were significantly related to IgAN (all P < 0.05): age, hypertension, hematuria, eGFR, 24-h urine protein, serum albumin, urea nitrogen, creatinine, uric acid, IgA, IgG, IgM, complement C3, IgA/C3 ratio, total cholesterol, triglycerides, HDL cholesterol, LDL cholesterol, and hemoglobin. After considering collinearity among these factors, urea nitrogen, creatinine, and hemoglobin were excluded. The other 16 significant variables were included in the multivariable logistic analysis. Our results showed that age, serum IgA/C3 ratio, serum albumin, serum IgA, serum IgG, eGFR, and hematuria were independent risk indicators of the occurrence of IgAN (Table 3). These seven factors were selected as parameters to establish a regression equation for the diagnosis of IgAN, expressed as P = exp(z)/[1 + exp(z)], where z = − 0.808 − 0.051 × age + 0.099 × albumin + 0.766 × hematuria + 0.349 × IgA + 0.134 × IgG + 0.709 × IgA/C3 ratio − 0.028 × eGFR. The ROC curve was plotted, and the AUC, sensitivity, and specificity in the training cohort were estimated to be 0.92, 84.1%, and 91.4%, respectively (Fig. 2A). When applied to the validation cohort, the logistic regression model showed an AUC of 0.839, a sensitivity of 81.9%, and a specificity of 83.7% (Fig. 2B).
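
The regression equation can be evaluated as in the following sketch, which uses the coefficients reported above with the intercept taken as − 0.808 (an assumption about its sign). The dictionary keys, the function name `igan_probability`, the coding of hematuria as 1 (present) or 0 (absent), and the measurement units of the inputs are assumptions, as they are not specified in the text. The all-zero input is a purely numerical check, not a realistic patient.

```python
import math

# Coefficients as reported for the seven-variable logistic regression model.
COEFFICIENTS = {
    "age": -0.051,
    "albumin": 0.099,
    "hematuria": 0.766,   # assumed coding: 1 = present, 0 = absent
    "iga": 0.349,
    "igg": 0.134,
    "iga_c3_ratio": 0.709,
    "egfr": -0.028,
}
INTERCEPT = -0.808  # assumed sign


def igan_probability(patient):
    """Return P = exp(z) / (1 + exp(z)) for a dict holding the seven predictors."""
    z = INTERCEPT + sum(coef * patient[name] for name, coef in COEFFICIENTS.items())
    # Use the algebraically equivalent form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


# Numerical check with every predictor set to zero: z equals the intercept.
baseline = {name: 0.0 for name in COEFFICIENTS}
p_baseline = igan_probability(baseline)

# Raising the IgA/C3 ratio (positive coefficient) raises the predicted probability.
p_higher_ratio = igan_probability(dict(baseline, iga_c3_ratio=3.0))
```

Positive coefficients (albumin, hematuria, IgA, IgG, IgA/C3 ratio) raise the predicted probability of IgAN, whereas negative coefficients (age, eGFR) lower it.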

Table 3 Multivariate logistic regression analysis for IgAN.

Figure 2 ROC curve of the logistic regression model for predicting IgAN. (A) The area under the ROC curve was 0.92 in the training set. (B) The area under the ROC curve was 0.839 in the validation set.

BP-ANN model prediction of IgAN

A BP-ANN model was constructed using the training data. Based on the multivariable logistic regression results, seven significant factors were chosen as independent variables. The structure of the BP-ANN model and the network training process are shown in Fig. 3. The ROC curve was then obtained (Fig. 4A), and the BP-ANN model was found to provide good predictive performance, with an AUC, sensitivity, and specificity of 0.965, 84.78%, and 94.53%, respectively. The predictive efficacy of the model was further evaluated using the validation set. In the validation cohort, the AUC, sensitivity, and specificity of the model were 0.881, 82.68%, and 84.78%, respectively (Fig. 4B).

Figure 3 The structure of the artificial neural network model and the BP-ANN training process.

Figure 4 ROC curve of the BP-ANN model for predicting IgAN. (A) The area under the ROC curve was 0.965 in the training set. (B) The area under the ROC curve was 0.881 in the validation set.

Comparison of the BP-ANN and logistic regression models

The evaluation indexes of the BP-ANN and logistic regression models were compared using the AUC values obtained on the validation set for IgAN prediction. The AUC of the BP-ANN model was 0.881, which was higher than that of the logistic regression model (0.839), indicating the superior performance of the constructed neural network in IgAN prediction.

Discussion

The clinical manifestations of IgAN vary from asymptomatic hematuria or proteinuria in the early stages to rapid-onset ESRD in the late stages. IgAN is generally immune-mediated by increased aberrantly glycosylated IgA1 and subsequent complement C3 deposits in the glomerular mesangium13. Although renal biopsy is the gold standard for diagnosing IgAN, its clinical application is limited in less-developed areas in China and by some patients’ insufficient awareness of its necessity. Galactose-deficient IgA1 and a peptide mass fingerprint have been reported as new specific indicators for the diagnosis of IgAN14. However, they are expensive to detect and technically demanding, which makes them difficult to apply in clinical practice. Therefore, exploring the clinical and laboratory indicators related to IgAN and constructing noninvasive prediction models to screen high-risk patients are of great significance. Through a retrospective cohort study, we identified the risk factors related to IgAN, built the diagnostic models, and evaluated the predictive ability of different modeling algorithms.

In this study, IgAN patients, compared with non-IgAN patients, were usually young or middle-aged and were more likely to have hematuria, proteinuria, and hypertension. Most patients had elevated serum immunoglobulins (especially IgA), decreased complement C3, and renal injury, which are well-known features of IgAN15. Most scholars now agree that the serum IgA/C3 ratio is more valuable than serum IgA and C3 for the diagnosis and monitoring of IgAN16,17,18. Therefore, the serum IgA/C3 ratio was included in this study. In addition, serum IgG levels in the IgAN group were significantly higher than those in the non-IgAN group, which is consistent with the findings of previous literature19. In the training dataset, 70% of patients in the non-IgAN group had nephrotic syndrome, while only 22% of patients in the IgAN group had this syndrome. Because of the low proportion of nephrotic syndrome among patients with IgAN, it is speculated that indicators related to nephrotic syndrome may be helpful for the differential diagnosis of IgAN and non-IgAN20, which also explains the high lipid and proteinuria levels and low serum albumin levels in the non-IgAN group. Logistic regression analysis was used to control for confounding factors, and seven variables (age, serum IgA/C3 ratio, serum albumin, serum IgA, serum IgG, eGFR, and the presence of hematuria) were found to be independent predictors of IgAN. Of these, the finding that the serum IgA/C3 ratio can help diagnose IgAN is in line with previous related studies21,22.

Originally, Maeda reported that the serum IgA/C3 ratio, combined with microscopic hematuria and/or proteinuria and high serum IgA levels, can be used to distinguish IgAN from other primary renal diseases23. In 2012, Gao’s team24 used logistic regression analysis for the differential diagnosis of IgAN and non-IgAN by incorporating three factors (serum IgA, fibrinogen, and clinical presentation), obtaining an AUC of 0.838. However, the sample size of their study was small, and therefore, the reliability of the results needs further verification. Later, Han QX incorporated age, serum IgA, total cholesterol, D-dimer, and fibrinogen into a logistic regression model for the noninvasive differential diagnosis of IgAN. However, the model was not validated, and therefore, its accuracy remains unverified25. In contrast, our study enrolled a large number of patients for the combined diagnosis of IgAN through the multiple predictors mentioned above, with an AUC of 0.92, a sensitivity of 84.1%, and a specificity of 91.4% in the training set. We tested the model on the validation set and obtained an AUC of 0.839 (above 0.7), a sensitivity of 81.9%, and a specificity of 83.7% (both above 70%). Our results indicated that the multifactor-based logistic regression model can effectively predict the risk of IgAN.

However, logistic regression models cannot handle complex nonlinear relationships between inputs and outputs, nor can they detect all possible interactions between predictors. A logistic regression model can only work if the states of all the variables are known, which is often difficult to achieve in clinical practice. In contrast, ANNs have strong nonlinear mapping capability and can handle complex intrinsic relationships among variables even when some data are missing. Furthermore, ANN models have been successfully used for prediction and classification in different areas, including informatics and medicine26,27,28. In this study, an ANN model for the early screening of IgAN was constructed and validated for the first time based on routine clinical and serum markers. A ROC curve was used to assess the efficacy of the model in predicting the risk of IgAN. The AUCs of the ANN model in the training and validation cohorts (0.965 and 0.881) were both higher than those of the logistic regression model (0.92 and 0.839), indicating that the ANN model has better diagnostic performance in differentiating IgAN from non-IgAN. These results suggest that the ANN model is more suitable than the logistic regression model for predicting the risk of IgAN. Thus, the ANN model may have better clinical usability as an auxiliary tool for early discovery and timely treatment.

This study developed and validated a predictive model for screening patients at high risk of IgAN, with the following advantages: (1) all patients enrolled had primary glomerular diseases confirmed by renal biopsy; (2) combining the serum IgA/C3 ratio with age, serum albumin, serum IgA, serum IgG, eGFR, and hematuria to establish a predictive model reduced the limitations of using only the serum IgA/C3 ratio as the differential indicator; and (3) with the same modeling variables, a simple, safe, and accurate predictive model for IgAN was developed that has good prospects for clinical application.

However, this study has some limitations: (1) there could have been selection bias and information bias owing to the retrospective nature of the study design; (2) the small size of the cohort could have influenced the model performance to some degree, and our research objective would be better addressed using a larger validation cohort in a multicenter study; (3) the model cannot determine the grade of IgAN, which has a certain impact on diagnosis and treatment; and (4) the model has not been validated in an external independent cohort.

In conclusion, the established multifactor diagnostic model could effectively distinguish IgAN patients from non-IgAN patients with good specificity. The noninvasive ANN diagnostic model can predict IgAN better than logistic regression and may have good clinical applicability. This model can be helpful for the early detection of high-risk IgAN patients, especially in less-developed regions.