Pain, 30 (1987) 177-189
Elsevier

PAI 01074

IASP taxonomy of chronic pain syndromes: preliminary assessment of reliability

Dennis C. Turk and Thomas E. Rudy *

Department of Psychiatry and * Department of Anesthesiology, Center for Pain Evaluation and Treatment, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213 (U.S.A.)

(Received 21 October 1986, revised received 5 January 1987, accepted 8 January 1987)

Correspondence to: Dr. Dennis C. Turk, Center for Pain Evaluation and Treatment, University of Pittsburgh School of Medicine, 230 Lothrop St., Pittsburgh, PA 15213, U.S.A.

0304-3959/87/$03.50 © 1987 Elsevier Science Publishers B.V. (Biomedical Division)

Summary

Communication, and consequently advancement of knowledge, in the understanding and treatment of chronic pain has been hindered by the absence of a taxonomy of chronic pain syndromes. Recently the IASP Subcommittee on Taxonomy proposed a classification method based on a multiaxial system. In the present study the interjudge reliability of 2 of the 5 axes, body location and presumed etiology, is evaluated. Overall, axis I demonstrated good reliability; however, the reliability of several categories contained within this axis was low enough to suggest that minor changes to this axis may increase its clinical utility. Axis V was found to have only fair reliability, and many of the categories comprising this axis were demonstrated to have reliabilities that are not clinically acceptable. The implications of these results for future development and refinement of the IASP taxonomy are discussed.

Key words: Chronic pain syndrome; Taxonomy

Introduction

Chronic pain is a complex phenomenon that has interested investigators from a diversity of backgrounds and countries for many years. A major factor inhibiting the advancement of knowledge of chronic pain, and consequently its treatment, has been the absence of any agreed upon and biometrically confirmed classification of pain syndromes that can be employed on a systematic basis [3,13]. The unavailability of any common system of classification has resulted in great confusion and an inability of investigators, not only across disciplines but within disciplines as well, to compare observations and results of research across studies. Bonica [3] referred to the state of affairs of chronic pain as 'the tower of Babel.' The IASP heeded the concern of Bonica and established a Subcommittee on Taxonomy
whose primary charge was to establish a working system of classification for chronic pain syndromes. Recently, the IASP Subcommittee on Taxonomy has published a Classification of Chronic Pain [11] that is designed to serve as a preliminary attempt to standardize descriptions of relevant pain syndromes and as a point of reference. The proposed classification system is multiaxial and consists of 5 discrete axes believed by the subcommittee to be relevant to the diagnosis of chronic pain states. The 5 axes include: (a) axis I, the body region or site affected by pain; (b) axis II, the body system whose abnormal functioning produces the pain; (c) axis III, the temporal characteristics of pain; (d) axis IV, the patient's statement of pain intensity and time of onset; and (e) axis V, the presumed etiology of the pain problem. Each patient is assigned a figure, ranging from 0 to 8 or 9 depending on the axis, corresponding to a specific category on each axis, with the final classification being a 5-digit numerical code that reflects the clinician's ratings on each axis. For example, it is suggested that carpal tunnel syndrome be coded 204.X6. In the classification of carpal tunnel syndrome, the first digit, 2, indicates that the body region is the upper shoulder and upper limbs (axis I); the second digit, 0, represents axis II and indicates that the relevant system is the nervous system; the third digit, 4, indicates that the patient reported that his or her pain recurs irregularly (axis III); the X following the decimal point indicates that the intensity and time of onset will vary with each patient (axis IV); and the last digit, 6, indicates that the presumed etiology (axis V) is degenerative or mechanical (a short sketch illustrating how such a code is read appears at the end of this introduction).

Demonstration that the taxonomy meets the minimum biometric criteria of reliability and validity will facilitate comparison across patients or across syndrome groups. The reliability of combining ratings on multiple scales to form a diagnostic impression will often be less than the reliability of the individual scales used to form a composite score [8]. For example, if one scale that has good reliability (e.g., 0.70) is used in combination with another scale that has poor reliability (e.g., 0.30), the classifications that would result from combining these two will have only a modest overall reliability (i.e., (0.7 + 0.3)/2 = 0.5). As a result, combining these scales to reach a diagnosis would be of limited clinical utility due to the sizable amount of measurement error inherent in the composite scale. The diagnostic sensitivity (i.e., the ability to correctly detect a specific abnormal condition) of the classification would be very low. In other words, the sensitivity and specificity of this classification system depend on the reliability of the components that are used to reach a diagnostic classification. Only after each axis is demonstrated to be acceptably reliable is the combining of axes into pain categories or diagnoses warranted.

In sum, research needs to be conducted to establish the reliability, validity, and utility of this taxometric system and, when indicated by research findings, suggestions are to be made for the refinement of this classification system [11]. In this study we chose to examine the interrater agreement on 3 of the proposed IASP axes: axis I, the body region or location of pain; axis II, the body system believed to be involved in pain; and axis V, the presumed etiology.
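To make the coding scheme concrete, the following short Python sketch decodes the carpal tunnel example, 204.X6, into its 5 axes. It is an illustration added here, not part of the original article, and the lookup tables contain only the category labels mentioned above rather than the full IASP axis listings.

```python
# Illustrative sketch only: decoding the IASP example code 204.X6 described
# above.  The dictionaries hold just the categories needed for this example,
# not the complete axis listings from the taxonomy.
AXIS_I = {"2": "upper shoulders and upper limbs"}           # body region
AXIS_II = {"0": "nervous system"}                           # body system
AXIS_III = {"4": "recurs irregularly"}                      # temporal pattern
AXIS_IV = {"X": "intensity and time of onset vary"}         # patient statement
AXIS_V = {"6": "degenerative or mechanical"}                # presumed etiology

def decode(code: str) -> dict:
    """Split a 5-character IASP code such as '204.X6' into its 5 axes."""
    digits, suffix = code.split(".")
    a1, a2, a3 = digits          # the three characters before the decimal point
    a4, a5 = suffix              # the two characters after the decimal point
    return {
        "axis I (region)": AXIS_I.get(a1, "not listed in this sketch"),
        "axis II (system)": AXIS_II.get(a2, "not listed in this sketch"),
        "axis III (temporal)": AXIS_III.get(a3, "not listed in this sketch"),
        "axis IV (intensity/onset)": AXIS_IV.get(a4, "not listed in this sketch"),
        "axis V (etiology)": AXIS_V.get(a5, "not listed in this sketch"),
    }

print(decode("204.X6"))
```

For the code 204.X6 this returns the axis I-V labels described above for carpal tunnel syndrome.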
Axis I includes a list of 9 regions (i.e., head, face, and mouth; cervical; upper shoulders and upper limbs; thoracic; abdominal; lower back, lumbar spine, sacrum, and coccyx; lower limbs;
pelvic; anal, perineal, and genital) and a tenth, optional category ('more than 3 major sites'). Axis II includes 7 body systems (e.g., musculoskeletal and connective tissue, respiratory and cardiovascular), an 'other organs or viscera' category, and a 'more than 1 system' category. Axis V includes 10 categories, namely, genetic or congenital; trauma; infective/parasitic; inflammatory; neoplasm; toxic/metabolic; degenerative/mechanical; dysfunctional, including psychophysiological; unknown/other; and psychological. We specifically attempted to examine axes I, II, and V because they are based on clinical judgment and not on patients' reports, memories, or interpretations (i.e., axis III, temporal characteristics and pattern of occurrence, and axis IV, patients' statements of intensity and time since onset). As we will describe, establishing the reliability or agreement rates for axes I, II, and V requires considerably different biometric procedures than assessing the reliability and biases of patient self-report data (axes III and IV). In sum, this study has 2 primary purposes: (1) to provide a preliminary assessment of the reliability of 3 of the 5 axes proposed by the IASP as components of the diagnosis of chronic pain syndromes, and (2) to describe an appropriate strategy for assessing the reliability of classifying clinical events.

Methods used to establish interrater reliability

A scale that uses nominal measures refers to a categorical or qualitative scale where numbers are used as a shorthand technique to indicate that an observation belongs to one of a number of mutually exclusive classes. Four of the 5 proposed IASP axes represent nominal measurement scales (axes I-III and V). Ordinal levels of measurement refer to the rank ordering of categories within a scale to indicate more or less of a certain characteristic (e.g., mild, moderate, severe). Axis IV of the proposed system represents an ordinal scale. Establishing the reliability of a clinical scale that is based on a nominal or ordinal level of measurement requires a unique statistical methodology.

Assessment of the reliability of the axes that comprise the IASP classification of pain syndromes relates to whether independent raters use these scales in consistent ways. In other words, if 2 physicians examined the same patients, reliability of an axis would be indicated if they both checked the same category. Thus, if 2 physicians evaluated a patient who is subsequently diagnosed as having carpal tunnel syndrome, both should code 200 on axis I, body region is upper shoulders and upper limbs. If there is 100% agreement, there would be little doubt that the scale is highly reliable and could be used by physicians with a high degree of accuracy. If, however, the 2 raters disagree on the coding of some cases, the difficulty is the determination of the extent of this disagreement and the computation of a reliability index that reflects their level of agreement.

All too often, the 'interrater reliability' reported by investigators (a) only reflects an association between raters rather than an accurate assessment of agreement between ratings (e.g., a correlation index, which can be very significant even when there is large or complete disagreement between raters [1]), (b) fails to take chance agreement into account (overall percentage of agreement), and (c) is reported
without an associated test of statistical significance. For example, reporting that judges agree on 60% of the cases provides no information as to whether this percentage agreement was significantly greater than would be expected by chance.

After reviewing 16 frequently used methods for calculating inter-rater reliability, Berk [2] concluded that there are only 2 basic approaches that are statistically valid: the kappa statistic and the intraclass or generalizability coefficients. The kappa statistic [7], which has had considerable success in other areas of clinical medicine [12], is particularly well suited for nominal and ordinal measurement, as is the case for the IASP axes, and corrects for the problems noted above. To measure reliability based on the kappa index (κ), which is scaled so that the index can range from -1.0 to 1.0, a reliability coefficient range analogous to the range of correlation coefficients, we need to calculate the values p_o, the proportion of agreement observed, and p_c, the proportion of agreement expected by chance. Specifically, the reliability coefficient κ = (p_o - p_c)/(1 - p_c).

To illustrate the use of the kappa statistic, we can consider the situation in which 2 physicians examined the spinal mobility of 100 low back pain patients and rated it as normal or restricted. Table I presents data that may result from these ratings. As can be seen in Table I, both raters indicated that spinal mobility was normal in 60 of the patients and restricted in 40 patients. To establish the observed proportion of agreement, p_o, the sum of the main diagonal cells, which represent the number of agreements between raters, is divided by the total number of patients (N) (i.e., p_o = (40 + 20)/100 = 0.60). Thus, these 2 raters are in agreement 60% of the time. However, a number of agreements can be expected by chance alone. The number of agreements that can be expected by chance is computed by multiplying the corresponding row and column marginal totals and dividing by N, which, as in the chi-square statistical approach, yields the expected frequencies displayed in Table I. The total percent agreement expected by chance can also be computed. In Table I, this chance expected value is p_c = (36 + 16)/100 = 0.52, i.e., 52% of the time we would expect these 2 physicians to be in agreement simply by chance alone. Based on these calculations, we can conclude that these raters have done 8% (p_o - p_c) better in rating spinal mobility than would be expected by chance.

TABLE I

EXAMPLE OF A KAPPA RELIABILITY ANALYSIS BASED ON CATEGORICAL DIAGNOSTIC DATA

Numbers in parentheses represent the frequency expected by chance.

                          Rater A
Rater B         Normal        Restricted    Total
Normal          40 (36)       20 (24)       60
Restricted      20 (24)       20 (16)       40
Total           60            40            100
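The Table I arithmetic can be reproduced directly. The short Python sketch below, an illustration added here rather than part of the original analyses, computes p_o, p_c, and kappa from the Table I frequencies; the z ratio discussed in the following paragraphs is approximated with a simple large-sample standard error under the null hypothesis, so it differs slightly from the standard deviation of 0.102 reported in the text.

```python
import math

# Table I frequencies: rows = Rater B (Normal, Restricted), columns = Rater A.
table = [[40, 20],
         [20, 20]]

n = sum(sum(row) for row in table)                      # 100 patients
row_totals = [sum(row) for row in table]                # Rater B marginals: 60, 40
col_totals = [sum(col) for col in zip(*table)]          # Rater A marginals: 60, 40

# Observed proportion of agreement: sum of the main diagonal divided by N.
p_o = sum(table[i][i] for i in range(len(table))) / n   # (40 + 20)/100 = 0.60

# Proportion of agreement expected by chance, from the marginal totals.
p_c = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2   # 0.52

kappa = (p_o - p_c) / (1 - p_c)                         # 0.167

# Approximate z test of H0: kappa = 0.  This uses a simple binomial
# approximation to the null standard error; the exact formula used in the
# article is not specified here, so the value differs slightly from 1.64.
se_null = math.sqrt(p_c / (n * (1 - p_c)))
z = kappa / se_null

print(f"p_o = {p_o:.2f}, p_c = {p_c:.2f}, kappa = {kappa:.3f}, z = {z:.2f}")
```

Running the sketch reproduces p_o = 0.60, p_c = 0.52, and kappa = 0.167, matching the values given in the text.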

Regardless of the clinical utility of the judgments described, we can ask: is this 8% difference statistically significant? In general, kappa is negative (p_o < p_c) if inter-rater agreement is worse than chance, zero if p_o = p_c, indicating only chance agreement, and positive if agreement is better than chance (p_o > p_c). If κ = 1.0, this would indicate that there is perfect agreement, and the off-diagonal cells, which represent disagreements between raters, in a table such as Table I would be zero. In the example in Table I, the kappa reliability coefficient is κ = (0.60 - 0.52)/(1 - 0.52) = 0.167.

When perfect agreement is not reached, statistical tests can be computed to assess whether a particular kappa value is significantly different from zero or whether one kappa value is significantly different from another. A z ratio, which approximates the standard normal distribution, is formed by dividing κ by its standard deviation. Computing the standard deviation for the data displayed in Table I indicated that this value was 0.102, and the z ratio was computed to be 1.64 (0.167/0.102). This z value can then be compared with the standard normal distribution, and if z > 1.96, one rejects the hypothesis that κ is equal to zero. In the example in Table I, this hypothesis cannot be rejected and we would conclude that these 2 raters did not agree significantly beyond what would be expected by chance. Thus, the dichotomized rating of spinal mobility in this example is not reliable. Consequently, any decisions that are based on this dichotomous rating of spinal mobility are likely to be erroneous.

Although a reliability index that is statistically significant is a prerequisite to its clinical utility, statistical significance should not be confused with clinical or practical significance. A scale that displays statistical significance may not be clinically useful if it does not display an adequate level of chance-corrected agreement. Guidelines for interpreting the clinical significance of the kappa coefficient have been proposed by several investigators [5,12]. These authors suggest that a kappa value less than 0.40 should be considered 'poor,' values between 0.40 and 0.59 are 'fair,' 0.60-0.74 is 'good,' and 0.75 or over can be considered 'excellent.' These criteria were adopted in the present study.

In sum, (a) the amount of interrater agreement without correcting for chance agreements can be quite misleading, (b) reporting chance-corrected agreement rates without statistically testing whether this agreement rate is significantly different from zero is only marginally better than reporting percent agreement, and (c) a scale that displays statistically significant interrater agreement is not guaranteed to be of clinical utility.

Method

Patients

The subjects who participated in the study were 115 consecutive chronic pain patients referred to either the Center for Pain Evaluation and Treatment at the University of Pittsburgh or the Pain Control and Rehabilitation Institute of Georgia. All patients were physician referrals who had pain that persisted past the
expected time of healing, with the mean duration of pain over 7 years. The patient sample was quite heterogeneous, representing the typical patients treated at comprehensive pain centers [11]. All patients were given a comprehensive evaluation that is standard practice for both of the pain centers. Because only the reliability of the IASP taxonomy was the focus of this study, none of the other evaluation data will be discussed.

Raters

Four physicians, each board certified in at least one specialty (e.g., anesthesiology, internal medicine), participated in the study. Additionally, their average length of time spent in the evaluation and treatment of chronic pain patients was over 8 years. Three registered physical therapists with an average of 10 years' experience also participated in this study.

In order to test the reliability of the IASP taxometric system [11] under a variety of clinical circumstances, 4 rater conditions with 2 raters within each condition were created. These 4 conditions were: (1) 2 physicians conducted chart reviews on 30 cases and independently rated each patient on axes I and V; (2) 2 physicians independently examined 35 patients and provided separate ratings on axes I and V; (3) 2 physical therapists independently evaluated 20 patients and provided separate ratings on axis I; and (4) a physical therapist and a physician independently examined the remaining 30 patients and provided separate ratings on axis I. All raters were blind to the other's rating on the same patient. Thus, the reliability of coding a patient's primary site of pain was evaluated for 115 cases using a combination of chart reviews vs. direct examination of the patient, and physicians vs. physical therapists. The coding of axis V, the hypothesized etiology of the patient's pain, was completed only by physicians. Thus, 65 cases were available to determine the interrater reliability of axis V under both chart review and direct examination conditions.

The inability to assess the reliability of axis II

Although the intent of this study was to include axis II, the body system involved in the patient's pain problem, reliability data on this axis will not be presented because preliminary screening of patient records at the Center for Pain Evaluation and Treatment indicated that a very large proportion of patients was diagnosed as having pain associated with the musculoskeletal system. There were not a sufficient number of patients whose pain was associated with other body systems for the reliability of this axis to be adequately assessed (insufficient variability). Determination of the reliability of axis II will have to await future studies.

Results

IASP axis I - primary site of pain

In order to test whether the primary site of pain could be reliably judged by different raters under different conditions (i.e., chart review vs. direct examination
of the patient, and physician vs. physical therapist), separate kappas were computed for each of the 4 rater pairings. All kappa analyses were calculated with a Fortran computer program developed by Cicchetti et al. [6]. These analyses can be summarized as follows: (1) The kappa analysis of the chart review data provided by 2 physicians indicated that they agreed 93.3% of the time as to the primary site of the patient's pain; kappa was computed to be 0.89, z = 9.006, P < 0.00001. (2) The 2 physicians who directly examined the patient were in agreement 83% of the time, and kappa was computed to be 0.79, z = 10.57, P < 0.00001. (3) The 2 physical therapists who also directly examined the patient were found to be in agreement 85% of the time, κ = 0.79, z = 6.54, P < 0.00001. (4) Finally, the physical therapist and the physician who directly examined the same patient were found to be in agreement 90% of the time, κ = 0.84, z = 7.72, P < 0.00001.

Additionally, the kappa values obtained using these 4 different rating methods were not significantly different, χ2 (3) = 0.798, P = NS, suggesting that neither the method of rating nor the raters produced differences in reliability, thus justifying the collapsing across raters and methods for subsequent analyses of axis I. The overall kappa value for IASP axis I was computed to be 0.796 (z = 17.01, P < 0.00001), indicating that this scale demonstrated excellent reliability. Thus, these findings provide strong support for the reliability of IASP axis I.

Although the above reliability findings are encouraging, the overall reliability of a scale does not indicate whether each of the categories within the scale is equally reliable. Specifically, the overall reliability of raters coding the primary site of pain does not indicate whether they were, for example, as reliable in judging the pain to be primarily in the cervical area as compared to their judgments that the primary site of pain was in the low back region. The overall reliability may be viewed as the average reliability of the categories that comprise the axis.

To assess the reliability of each primary site of pain contained within the axis I scale, the data from the 4 rater pairings were combined and separate kappas were computed for each category [9]. Table II, which is based on 230 rater judgments (2 ratings for each of the 115 patients), contains the frequency with which each primary site of pain category was selected as well as the results of the separate kappa analyses on each category. As can be seen in Table II, the low back region was coded most frequently (46.5% of the ratings) and the pelvic region occurred least frequently (0.4% of the ratings). Also contained in Table II is the index of rater agreement for each category in axis I. The separate indices for each pain site indicate: (a) the interrater agreement rate, i.e., the proportion of the time the raters agreed that the pain was at that specific site; (b) based on the probabilities for each site, the proportion of agreement that would be expected for that specific site by chance alone; and (c) the corresponding kappa reliability index for each site (note that for pain sites that had an extremely low incidence rate the obtained kappa values can only be viewed as suggestive). As can be seen in Table II, overall these raters were in agreement 85.2% of the time as to the patient's primary site of pain, in contrast to the 27.4% expected by
chance. In terms of specific pain sites, they displayed 100% agreement for the lower limbs, and also displayed good to excellent interrater agreement and kappa values for the head, cervical, abdominal, and low back sites [5]. These categorical kappa analyses indicated, however, that these raters had difficulty agreeing that the upper shoulders or limbs or the thoracic region was the patient's primary site of pain. For example, as displayed in Table II, there was only 33.3% agreement that the upper shoulders or limbs was the patient's primary pain site.

TABLE II

CATEGORY RELIABILITY OF IASP AXIS I - PRIMARY SITE OF PAIN

Category                   No. of times   Average    Index of rater agreement b              Level of clinical
                           selected       usage a    Obtained    Expected    Kappa           significance
Head                       34             0.148      0.824       0.147       0.793           Excellent
Cervical                   33             0.144      0.667       0.142       0.611           Good
Upper shoulders or limbs   12             0.052      0.333       0.039       0.306           Poor
Thoracic                   7              0.030      0.571       0.030       0.558           Fair
Abdominal                  9              0.039      0.889       0.039       0.884           Excellent
Lower back                 107            0.465      0.972       0.465       0.948           Excellent
Lower limbs                24             0.104      1.000       0.104       1.000           Excellent
Pelvic                     1              0.004 c    -           -           -               -
Anal or genital            3              0.013      0.667       0.012       0.663 d         -
Overall                    230            1.000      0.852       0.274       0.796           Excellent

a Proportion of the number of times the category was selected.
b Proportion of obtained interrater agreement, proportion expected by chance, and chance-corrected agreement (kappa).
c The computation of kappa is not possible for this category because it has an expectancy rate of zero.
d This kappa value is only suggestive due to the extremely low frequency of occurrence of this category.

In sum, overall the IASP axis I appears to have excellent reliability. Upon closer inspection, however, it appears that several categories within this axis are not very reliable and may need to be refined, or several categories may need to be combined, to establish adequate reliability. Finally, the incidence rate of several categories within this scale, the pelvic and anal or genital areas, was extremely low in these 2 pain populations, and thus the determination of the individual reliability of these categories needs further assessment.

IASP axis V - etiology of the pain problem

In order to test whether the etiology of the pain problem could be reliably rated under the 2 assessment conditions (i.e., chart review and examination of the patient), separate kappas were computed. The kappa analysis of the chart review data provided by 2 physicians indicated that they agreed 70% of the time as to the etiology of the patient's pain problem. The kappa for this analysis was 0.49, z = 4.39, P < 0.00001. The 2 physicians who directly examined the patient were in agreement 65.7% of the time and kappa was computed to be 0.50, z = 5.98,
P < 0.00001. As expected, these 2 kappa values were not significantly different, χ2 (1) = 0.721, P = NS, and the overall kappa in terms of the patient's pain etiology was found to be 0.50 (z = 7.63, P < 0.00001). Although these findings supported the hypothesis that pain etiology could be reliably rated at a level greater than chance would predict, the clinical significance of the reliability index can only be considered to be fair [5].

As with the primary site of pain data, separate kappas were computed for each etiology category to determine whether some etiological categories were more reliable than others. The results of these analyses are displayed in Table III. As can be seen in Table III, the overall observed rater agreement rate was found to be 67.7%, in contrast to the 35.3% agreement rate expected by chance. As displayed in Table III, traumatic etiology was selected most frequently (56.2% of the time). Inspection of the kappa values for each etiology indicates that only genetic, trauma, and infective were found to have at least good reliability, i.e., kappas greater than 0.60. On the other hand, inspection of the observed agreement rates and their corresponding kappa coefficients indicated that considerable interrater differences existed for the degenerative, dysfunctional, and unknown categories.

Examination of the frequency cross-tabulations for the degenerative, dysfunctional, and unknown categories indicated that the raters were in disagreement about coding a patient's pain etiology into one of these 3 categories. In addition to confusion among these categories, confusion between these etiologies and others, especially trauma, was observed. For example, 55.5% of the disagreements for the degenerative coding involved the trauma coding. In other words, these physicians

TABLE III

CATEGORY RELIABILITY OF IASP AXIS V - PRESUMED ETIOLOGY

Category        No. of times   Average    Index of rater agreement b              Level of clinical
                selected       usage a    Obtained    Expected    Kappa           significance
Genetic         4              0.031      1.000       0.031       1.000 c         -
Trauma          73             0.562      0.849       0.556       0.660           Good
Infective       4              0.031      1.000       0.031       1.000 c         -
Inflammatory    1              0.008 d    -           -           -               -
Neoplasm        1              0.008 d    -           -           -               -
Toxic           2              0.015      0.000       0.015       -0.016          -
Degenerative    15             0.115      0.400       0.090       0.341           Poor
Dysfunctional   20             0.154      0.500       0.152       0.410           Fair
Unknown         10             0.077      0.200       0.065       0.145           Poor
Psychological   0              0.000 d    -           -           -               -
Overall         130            1.000      0.677       0.353       0.500           Fair

a Proportion of the number of times the category was selected.
b Proportion of obtained interrater agreement, proportion expected by chance, and chance-corrected agreement (kappa).
c This kappa value is only suggestive due to the extremely low frequency of occurrence of this category.
d The computation of kappa is not possible for this category because it has an expectancy rate of zero.
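The per-category kappas reported in Tables II and III can be illustrated with the following Python sketch. It is an illustration added here, not the Fortran program of Cicchetti et al. [6] used in the study: each category is collapsed to a 'category vs. all others' dichotomy and kappa is computed on the resulting 2 x 2 table, which is one common way of obtaining category-level kappas. The rating codes in the example are hypothetical, not the study data.

```python
from collections import Counter

def kappa(pairs):
    """Cohen's kappa for two raters' paired categorical codes."""
    n = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    p_c = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n ** 2
    # Kappa is undefined when chance agreement equals 1 (e.g., a single
    # category used exclusively by both raters).
    return (p_o - p_c) / (1 - p_c) if p_c < 1 else float("nan")

def category_kappas(pairs):
    """Collapse each category to 'category vs. all others' and compute kappa."""
    categories = {c for pair in pairs for c in pair}
    return {c: kappa([(a == c, b == c) for a, b in pairs]) for c in categories}

# Hypothetical axis I codes for 6 patients (e.g., 5 = lower back region).
rater_1 = [5, 5, 0, 1, 5, 2]
rater_2 = [5, 5, 0, 2, 5, 2]
pairs = list(zip(rater_1, rater_2))

print("overall kappa:", round(kappa(pairs), 3))
for category, k in sorted(category_kappas(pairs).items()):
    print(f"category {category}: kappa = {k:.3f}")
```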
