Interpretation of a Quantitative Diagnosis Model of Traditional Chinese Medicine Syndromes Based on Computer Adaptive Testing

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Associated Data

Supplementary Materials: Figure 1. The receiver operating characteristic (ROC) curve of every syndrome. Clinicians' syndrome differentiation results were divided into syndrome element forms, according to the theory of syndromes, and used as state variables. The CAT model diagnosis results were used as test variables. Table 1. ROC curve analysis results of the CAT model at baseline. Table 2. ROC curve analysis results of the CAT model at follow-up. According to the results, the area under the curve (AUC) for the CAT model was >0.8 at both baseline and follow-up, which is indicative of high model accuracy.

The data used to support the findings of this study are available from the corresponding author upon request. Requests for data (6/12 months after publication of this article) will be considered by the corresponding author.

Abstract

Objectives

The aim of this study is to interpret a quantitative diagnosis model of traditional Chinese medicine (TCM) syndromes based on computer adaptive testing (CAT), from the perspective of both patients and clinicians.

Methods

In this cross-sectional study, patients with postprandial distress syndrome completed the CAT model of TCM syndromes and the Chinese version of the Quality of Life Questionnaire for Functional Digestive Disorders (Chin-FDDQL); the clinicians' diagnosis was concurrently recorded. The patients completed this questionnaire again after 14 ± 2 days. The kappa test and paired chi-square test were used to evaluate the consistency between the CAT model and clinical diagnosis. Minimal clinically important differences (MCID) of the Chin-FDDQL scores were used to assess clinical efficacy from the patients' perspective. Logistic regression was used to examine the association between changes in the CAT model syndrome domain scores and changes in clinical outcomes.

Results

Conclusions

This study proposes a method for the clinical interpretation of the CAT model of TCM syndromes, including evidence derived from the application. It may provide a reference for future interpretation of other CAT models.

1. Introduction

Accurate syndrome diagnosis is the foundation of effective management and treatment [1]. However, traditional approaches to syndrome diagnosis rely on clinicians' experience; at present, there is a lack of objective traditional Chinese medicine (TCM) syndrome detection protocols [2]. Therefore, the use of statistical models and artificial intelligence, among other methods, has increased in recent years, aiming to make TCM syndrome differentiation objective [3, 4].

We have previously introduced syndrome elements based on the theory of TCM and established a quantitative diagnosis model of TCM syndromes in functional gastrointestinal diseases, specifically functional dyspepsia and irritable bowel syndrome, using classical statistical theory and modern measurement theory [2, 5]. The model allows patients to input their symptoms and obtain a score for each syndrome domain, which helps to quantify the syndrome. The model was subsequently combined with computer technology: the computer adaptive test (CAT) model [6] streamlines the process of patients inputting their symptoms while maintaining accuracy in syndrome differentiation. It is a novel and feasible tool for the quantification of TCM syndromes.

However, the clinical interpretation of the model has not been examined to date. To our knowledge, a clinical interpretation of the quantitative CAT model of TCM syndromes has not been established for any specific disease. In practice, the accuracy of syndrome differentiation is often judged solely against a clinician's opinion, which is unsatisfactory because patients are the main recipients of syndrome differentiation. TCM theory holds that changes in any of the syndrome domains may change patients' symptoms and outcomes. Therefore, we examined the changes in patients' symptoms and clinical outcomes to assess whether the CAT model-based syndrome differentiation is accurate, which supports the quantification and objective assessment of TCM syndromes.

In this study, we used a postprandial distress syndrome (PDS) model. PDS is among the most common functional gastrointestinal diseases observed in clinical practice, and its incidence is increasing. PDS is not life-threatening; however, it is associated with a long disease course and recurring symptoms, which may affect patients' quality of life. PDS is also associated with a high economic burden on patients and healthcare systems [7]. Under the Rome IV criteria, functional dyspepsia is divided into two subtypes: PDS and epigastric pain syndrome (EPS). Impaired gastric accommodation is more prevalent in PDS than in EPS [8]. PDS belongs to the TCM category of gastric stuffiness. TCM has been reported as an effective complementary and alternative approach in the treatment of PDS [9, 10].

There are no laboratory indicators for evaluating the clinical outcomes of PDS. TCM tends to account for patients' subjective symptoms; therefore, we converted patients' symptoms into numeric values using the Chinese version of the Quality of Life Questionnaire for Functional Digestive Disorders (Chin-FDDQL) [11], which is a commonly used patient-reported outcome scale. Minimal clinically important differences (MCID) were calculated to relate changes in the Chin-FDDQL scores to clinically meaningful outcomes for patients. The MCID may help make symptom and outcome reporting more objective [12]. It is commonly used in the clinical interpretation of patient-reported outcomes and can be calculated using anchor- and distribution-based methods [13–16].

2. Methods

This cross-sectional study included patients who attended the outpatient clinic at the study site. This work was approved by the Clinical Research and Ethics Committee at the First Affiliated Hospital of the Guangzhou University of Chinese Medicine (No. K (2019) 074), and all patients were diagnosed by senior clinicians according to the Rome IV classification criteria for PDS.

Patients were eligible for the present study if they met the following criteria: aged ≥16 years, met the Rome IV PDS criteria, and agreed to study participation. Patients were excluded if they had other digestive system diseases; cognitive or other impairments (including mental illness and visual impairment, among others) that affected their ability to complete self-reports; or diagnoses of cardiovascular or cerebrovascular diseases, renal insufficiency, hematopoietic system diseases, or another serious primary disease. Pregnant women were also excluded. Further, data from patients who met any of the following criteria were considered “invalid” and were excluded from analysis: misdiagnosis, another diagnosis or a major accident experienced during the study period, loss to follow-up, or missing ≥20% of data.

2.1. Data Collection

The CAT model is an adaptive quantitative evaluation system (patent no.: 2017 sr559575) for TCM syndromes of functional gastrointestinal disorders (FGIDs), covering three TCM diseases: stomachache, gastric stuffiness, and diarrhea. It integrates the TCM syndrome differentiation diagnosis tree, artificial intelligence, computer engineering, and multivariate statistical models that account for syndrome domains and other aspects of TCM theory. Development, simulation, and verification of the CAT model have been previously described [6, 17–19]. In this study, we selected PDS, a common disease that belongs to the gastric stuffiness category of TCM, to explore methods for the clinical interpretation of the CAT model.

The gastric stuffiness CAT model had 39 items extracted from a bank of 215 items. The next test item was selected by maximizing the determinant of the information matrix, and ability levels were estimated by the maximum a posteriori (MAP) method. The test terminated once 20 items had been answered. We asked patients to input data on their symptoms and experiences into the CAT evaluation system. Finally, the patients' scores per syndrome domain were displayed in the form of a radar chart.
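The determinant-based item selection rule can be illustrated with a minimal sketch. The source does not state the item response model behind the CAT system, so the code assumes a dichotomous, two-dimensional compensatory 2PL model purely for illustration; the item names, parameters, and the identity prior precision matrix are hypothetical.

```python
import math

def item_information(a, d, theta):
    """Fisher information matrix of one item under an ASSUMED two-dimensional
    compensatory 2PL model: P(endorse) = logistic(a . theta + d)."""
    z = a[0] * theta[0] + a[1] * theta[1] + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * a[0] * a[0], w * a[0] * a[1]],
            [w * a[1] * a[0], w * a[1] * a[1]]]

def add_matrices(m1, m2):
    """Element-wise sum of two 2x2 matrices."""
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def select_next_item(bank, administered, theta, prior_precision):
    """Pick the unused item that maximizes the determinant of the accumulated
    information matrix (prior precision + administered items + candidate)."""
    info = prior_precision
    for name in administered:
        a, d = bank[name]
        info = add_matrices(info, item_information(a, d, theta))

    def criterion(name):
        a, d = bank[name]
        return det2(add_matrices(info, item_information(a, d, theta)))

    candidates = [name for name in bank if name not in administered]
    return max(candidates, key=criterion)

# Hypothetical three-item bank: name -> ((discrimination per domain), intercept)
bank = {
    "item_A": ((1.5, 0.1), 0.0),
    "item_B": ((0.1, 1.5), 0.0),
    "item_C": ((1.4, 0.2), 0.0),
}
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed standard-normal prior on both domains
next_item = select_next_item(bank, ["item_A"], (0.0, 0.0), identity)
print(next_item)
```

In this toy bank, item_A mainly informs the first domain, so the rule favors item_B, which informs the second domain. In a full CAT loop this selection would be repeated, with the ability estimate updated after each answer, until 20 items have been answered.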

The Chin-FDDQL [11] was translated by our team from the original version, which was designed to measure the pathology and symptom scores of functional dyspepsia (FD) and irritable bowel syndrome across eight domains (daily activity, anxiety, diet, sleep, discomfort, health perceptions, stress levels, and total scores) and 43 items [20]. It is a useful health assessment instrument for Chinese patients with FD; it is associated with good reliability, validity, responsiveness, item test function, differential item functioning characteristics, and interpretation systems [19, 20].

Outcome assessment and follow-up protocols were as follows. First, the investigators presented the study aims to eligible patients; subsequently, the patients completed the Chin-FDDQL, using the Wen Juan Xing application, and the CAT model system; the questionnaires were completed again after 14 ± 2 days. The clinicians' diagnoses were recorded at the same time; for patients unable to attend follow-up assessments on schedule, we provided a link to the electronic version of the scale via WeChat or we collected their answers via phone interviews, subsequently requesting that the participating clinicians make a diagnosis based on the patient's statement.

2.2. Statistical Methods

The CAT model and Chin-FDDQL data were exported to and sorted in Excel. To standardize the evaluation of syndrome domain, the CAT model scores were transformed, according to the distribution characteristics of the full-sample computer adaptive test scores. The conversion formula was as follows:

\[ \mathrm{Score}_{\mathrm{standard}} = \frac{\mathrm{Score}_{\mathrm{CAT}} - \mathrm{Score}_{\min}}{\mathrm{Score}_{\max} - \mathrm{Score}_{\min}} \times 100. \]
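As a minimal sketch of this conversion, the function below rescales raw CAT scores to a 0 to 100 range using the sample minimum and maximum. The function name and the example scores are hypothetical and are not taken from the study data.

```python
def standardize_scores(cat_scores):
    """Rescale raw CAT domain scores to 0-100 using the sample minimum and maximum."""
    lowest, highest = min(cat_scores), max(cat_scores)
    if highest == lowest:
        raise ValueError("all scores are identical, so the range is zero")
    return [(score - lowest) / (highest - lowest) * 100 for score in cat_scores]

# Hypothetical raw scores for one syndrome domain
raw = [-1.2, 0.0, 0.6, 2.4]
standard = standardize_scores(raw)
print(standard)
```

Because the minimum and maximum come from the full sample, a patient's standardized score depends on the sample against which it is scaled.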

The clinicians' syndrome differentiation results were divided into syndrome element forms, according to the theory of syndromes, and used as state variables. The CAT model diagnosis results were used as test variables to draw the receiver operating characteristic (ROC) curve for every syndrome element. The area under the curve (AUC) was used to verify the accuracy of model diagnosis; AUC values of >0.8 were considered indicative of high model accuracy. The Youden Index was used as a reference parameter; when the Youden Index reached its maximum value, the score corresponding to the cut-off point was regarded as the diagnostic threshold of an element.
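The ROC analysis and threshold selection described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the software used in the study: the clinician labels and standardized scores are invented, and a case is treated as model-positive when its score is at or above the threshold.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score.
    A case is called positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points

def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(labels, scores):
    """Threshold that maximizes the Youden Index J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)[0]

# Hypothetical data: 1 = clinician diagnosed the syndrome element, 0 = did not
labels = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [10, 20, 35, 40, 50, 60, 75, 90]  # standardized CAT domain scores

model_auc = auc(labels, scores)
cutoff = youden_threshold(labels, scores)
print(model_auc, cutoff)
```

With these invented values the AUC exceeds the 0.8 criterion, and the cut-off is the score at which sensitivity plus specificity is largest.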

We examined the CAT model diagnosis from the physician's perspective, according to the diagnostic threshold of every syndrome domain. We then used the kappa test and paired chi-square test to analyze the consistency between the CAT model and expert diagnoses. Kappa values of ≥0.75 indicated excellent consistency; those 0.40–0.75 and
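The two agreement statistics named above can be sketched as follows. The code assumes binary diagnoses per syndrome element and implements the paired chi-square test as McNemar's test without continuity correction, which is a common choice but is not confirmed by the source; the diagnosis vectors are hypothetical.

```python
def cohen_kappa(model, clinician):
    """Cohen's kappa for two binary ratings of the same patients."""
    n = len(model)
    observed = sum(1 for m, c in zip(model, clinician) if m == c) / n
    p_model = sum(model) / n
    p_clinician = sum(clinician) / n
    expected = p_model * p_clinician + (1 - p_model) * (1 - p_clinician)
    return (observed - expected) / (1 - expected)

def mcnemar_chi2(model, clinician):
    """McNemar chi-square statistic (no continuity correction) from discordant pairs."""
    b = sum(1 for m, c in zip(model, clinician) if m == 1 and c == 0)
    c_count = sum(1 for m, c in zip(model, clinician) if m == 0 and c == 1)
    if b + c_count == 0:
        return 0.0
    return (b - c_count) ** 2 / (b + c_count)

# Hypothetical diagnoses for one syndrome element (1 = present, 0 = absent)
model_dx = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
clinician_dx = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

kappa = cohen_kappa(model_dx, clinician_dx)
chi2 = mcnemar_chi2(model_dx, clinician_dx)
print(kappa, chi2)
```

Kappa measures chance-corrected agreement, whereas the paired test checks whether the two raters disagree systematically in one direction, so the two statistics answer different questions.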

To account for the patients' perspective, we used the paired sample t-test or Wilcoxon signed-rank test to measure the responsiveness of the Chin-FDDQL scores to time-dependent changes. We then calculated the associated MCID. To reduce bias associated with using a single method, we obtained averages of the MCID values by anchor-based and distribution-based methods; these values were used as final estimates. Anchor-based methods rely on an external measure of change as the standard, and distribution-based methods are based on a statistical measure of variability.

Because PDS has no objective index for clinical efficacy evaluation, we chose the most widely applied patient self-assessment method, adding an item as an anchor at the end of the Chin-FDDQL. This item was captured during the follow-up period to determine the MCID [21]. The item was “how do you feel now compared with last time?,” with the following response options: obviously worse, somewhat worse, no change, somewhat better, and obviously better; the corresponding scores were set to −2, −1, 0, 1, and 2 points, respectively. We identified patients who reported having experienced a change and then calculated the difference between their baseline and follow-up Chin-FDDQL scores (total and domain-specific). If the score differences followed a normal distribution, their mean was used as the MCID value; if they followed a skewed distribution, their median was used.
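A minimal sketch of the anchor-based calculation follows. Which anchor responses count as "a change" is not specified in the text, so the `changed` parameter is an assumption (here, "somewhat better" and "obviously better"), and the normality decision is passed in by the analyst rather than tested in code. All scores are invented.

```python
from statistics import mean, median

def anchor_based_mcid(baseline, follow_up, anchor, changed=(1, 2), normal=True):
    """Anchor-based MCID: mean (normal) or median (skewed) of the score
    differences among patients whose anchor response is in `changed`."""
    diffs = [f - b for b, f, a in zip(baseline, follow_up, anchor) if a in changed]
    if not diffs:
        raise ValueError("no patient reported a change on the anchor item")
    return mean(diffs) if normal else median(diffs)

# Hypothetical Chin-FDDQL total scores and anchor responses (-2 to 2)
baseline = [50, 60, 55, 70, 65]
follow_up = [58, 66, 55, 83, 63]
anchor = [1, 1, 0, 2, -1]

mcid_mean = anchor_based_mcid(baseline, follow_up, anchor, normal=True)
mcid_median = anchor_based_mcid(baseline, follow_up, anchor, normal=False)
print(mcid_mean, mcid_median)
```

The same function can be applied to each domain score by passing that domain's baseline and follow-up values.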

For the distribution-based estimate, this study used the common effect size (ES) method [21]; the MCID was estimated by multiplying the baseline standard deviation of the Chin-FDDQL scores by the ES. Some studies in China have proposed an ES value of 0.5 [22], whereas an ES value of 0.2 has been recommended for evaluating the MCID in the Western context [23]. Therefore, we used both ES values to estimate the MCID and combined these estimates with expert opinion to obtain an MCID that reflected clinical practice.
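The distribution-based estimate reduces to a single multiplication, sketched below with hypothetical baseline scores. The use of the sample standard deviation (n − 1 denominator) is an assumption, and the simple average in `combined_mcid` only illustrates the averaging of anchor-based and distribution-based values mentioned earlier; the expert-opinion step is not modeled.

```python
from statistics import stdev

def distribution_based_mcid(baseline, effect_size):
    """Distribution-based MCID: effect size times the baseline sample SD."""
    return effect_size * stdev(baseline)

def combined_mcid(anchor_mcid, distribution_mcid):
    """Simple average of an anchor-based and a distribution-based estimate."""
    return (anchor_mcid + distribution_mcid) / 2

# Hypothetical baseline Chin-FDDQL total scores
baseline = [50, 60, 55, 70, 65]

mcid_es_05 = distribution_based_mcid(baseline, 0.5)
mcid_es_02 = distribution_based_mcid(baseline, 0.2)
final_estimate = combined_mcid(9.0, mcid_es_05)  # 9.0 is an invented anchor-based value
print(mcid_es_05, mcid_es_02, final_estimate)
```

The two ES values give estimates that differ by a factor of 2.5, which is why the choice of ES materially affects how many patients are classified as having changed.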

To explore the clinical value of the CAT model, we compared changes (d) in the Chin-FDDQL total and domain scores with the corresponding MCID; d ≥ MCID represented a clinical benefit from the patients' perspective. We then classified patient outcomes into “change” and “no change” groups. Finally, we performed logistic regression analysis to explore the association between syndrome element score changes in the CAT model (independent variable) and clinical outcomes (dependent variable; 1 = clinically significant change, 0 = no clinically significant change). Figure 1 presents a schematic of the approach to the CAT model exploration.
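The classification and regression steps can be sketched as follows. This is a simplified illustration, not the analysis used in the study: it fits a logistic regression with a single predictor by Newton-Raphson in plain Python, whereas the study may have entered several syndrome elements together in statistical software. All values, including the MCID of 4.0, are hypothetical.

```python
import math

def classify_outcome(change, mcid):
    """1 = clinically significant change (d >= MCID), 0 = no such change."""
    return 1 if change >= mcid else 0

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the information matrix X'WX
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: change in one syndrome element score (follow-up minus baseline)
element_change = [-12, -8, -5, -3, 0, 2, 4, 6]
# Hypothetical change d in the Chin-FDDQL total score for the same patients
fddql_change = [9, 7, 2, 6, 1, 5, 0, 2]
mcid = 4.0

outcome = [classify_outcome(d, mcid) for d in fddql_change]
b0, b1 = fit_logistic(element_change, outcome)
odds_ratio = math.exp(b1)
print(outcome, b1, odds_ratio)
```

In this invented example the slope is negative, so the odds ratio per one-point increase in the syndrome element score is below 1; that is, a larger decrease in the element score goes with higher odds of a clinically significant change.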

Figure 1: Schematic diagram of exploring the clinical explanation of the CAT model.