Clinical and Diagnostic Laboratory Immunology, November 2003, p. 1029-1036, Vol. 10, No. 6
1071-412X/03/$08.00+0 DOI: 10.1128/CDLI.10.6.1029-1036.2003
Copyright © 2003, American Society for Microbiology. All Rights Reserved.
Department of Statistics and Institute for Health, Health Care Policy and Aging Research,1 Environmental and Occupational Health Sciences Institute, Rutgers University, Piscataway, New Jersey,7 Department of Environmental Health Sciences,3 Department of Molecular Microbiology and Immunology,5 Johns Hopkins Bloomberg School of Public Health, and MCS Referral and Resources, Baltimore, Maryland,2 Emmitsburg, Maryland,4 Departments of Laboratory Medicine and Medicine, University of Washington, Seattle, Washington,6 Department of Pathology, Scripps Research Institute, La Jolla, California,8 Division of Laboratory Sciences, Centers for Disease Control and Prevention, Atlanta, Georgia,9 FAST Systems, Inc., Gaithersburg, Maryland,10
Received 12 February 2003/ Returned for modification 10 April 2003/ Accepted 2 July 2003
| ABSTRACT |
|---|
3%. Interlaboratory differences were statistically significant for all T-cell subsets except CD4+ cells, ranging from minor to eightfold for CD25+ subsets. Within laboratories, the date of analysis was significantly associated with the values for all cellular activation markers. Although reproducibility of autoantibodies could not be precisely assessed due to the rarity of abnormal results, there were inconsistencies across laboratories. The effect of shipping on all measurements, while sometimes statistically significant, was very small. These results support the reliability of fresh and shipped samples for detecting large (but perhaps not small) differences between groups of donors in the T-cell subsets tested. When comparing markers that are not well standardized, it may be important to distribute samples from different study groups evenly over time.
| INTRODUCTION |
|---|
Hyperreactivity of the immune system to environmental stimuli could explain both the diversity of symptoms in MCS and the very low levels of chemical exposures with which those symptoms have been associated. This hypothesis has been investigated in case series and controlled studies (12, 14, 18, 20, 26-28, 32, 34, 36) (for a review, see reference 24), but many of these studies have been controversial and/or difficult to interpret, for at least two reasons. First, the reliability of many of the immunological methods and tests used has not been demonstrated by standard epidemiological and laboratory criteria. While markers used for the diagnosis or management of known immunological diseases such as human immunodeficiency virus infection are now routinely validated and quality controlled (13, 17), this is not true for many of the immunological markers studied for MCS (J. B. Margolick and R. F. Vogt, Letter, Ann. Intern. Med. 220:249, 1994), particularly those related to lymphocyte phenotype and function. For example, one study (32) that found no immunological abnormalities in people with MCS generated considerable controversy, in part because it used methods whose reproducibility was questionable (31; Margolick and Vogt, letter). Second, diagnostic criteria and epidemiological case definitions of MCS have been inconsistent across studies, and many studies did not consider the possibility that some of the controls could have had MCS; this could be important given that up to 16% of those surveyed in recent population studies indicated that they had some degree of hypersensitivity to environmental chemicals (3, 15).
For immunological testing to be useful either in understanding the pathogenesis of MCS or, potentially, in its diagnosis, tests that reliably distinguish immunological differences of the magnitudes that might exist between people with and without MCS, and laboratories which can perform these tests reliably, must be available. The goal of this study was to address the reliability of some immunological methods and tests that have been used to investigate and evaluate MCS. To this end, we conducted a multiple-laboratory comparison of immunological markers commonly cited in the MCS literature. In particular, we evaluated the magnitudes of between- and within-laboratory differences in these measures and whether differences between individuals with and without well-defined immunological diseases could be identified. We focused primarily on cellular immunity, using measurements that have been widely studied in both MCS and related diseases such as chronic fatigue syndrome (16, 32), i.e., the major T-cell subsets (T-helper [CD4+] and T-cytotoxic [CD8+] cells) and cellular activation markers expressed by these cells. We also studied certain autoantibodies that have been reported to be elevated in persons with MCS, namely, antibodies to smooth muscle, myelin, and thyroid antigens (12, 32).
To evaluate the reproducibility of the tests as they would likely be performed in the assessment of people for possible MCS, replicate blood samples were tested in four to six laboratories, to which they were shipped from Baltimore, Md., by overnight mail. In addition, the effect of the shipment process on the collective results was examined by analyzing fresh samples in Baltimore on the day of phlebotomy.
| MATERIALS AND METHODS |
|---|
Selection of tests and laboratories. The cellular activation markers studied included CD25, CD26, and HLA-DR, which have been analyzed in past studies of MCS (12, 32, 36), and CD38, an activation marker which has not previously been evaluated in MCS but which has been studied extensively in immune diseases, including AIDS (19) and chronic fatigue syndrome (16). Antibodies to smooth muscle, thyroid gland, and myelin (12, 32) were also studied. As previous studies of immunological testing in MCS used both commercial and research laboratories, we included two commercial laboratories that had been active in testing for MCS (Immunosciences, Inc., Los Angeles, Calif., and Specialty Laboratories, Inc., Los Angeles, Calif.) and four research-oriented laboratories, located at the Johns Hopkins School of Public Health (JHU) (Baltimore, Md.), Rutgers University (Piscataway, N.J.), Scripps Research Institute (La Jolla, Calif.), and the University of Washington (UW) (Seattle), that had longstanding interests in laboratory quality control for serological testing (4) and immunophenotyping. The JHU and UW laboratories participate in a flow cytometry proficiency control program sponsored by the National Institute of Allergy and Infectious Diseases, and the JHU and Rutgers laboratories had participated in the Immune Biomarkers Demonstration Project (35). One other commercial laboratory was approached but declined to participate.
Processing of specimens for laboratory testing. Approximately 50 ml of blood was drawn from each study subject and processed as follows: 18 ml was drawn into a heparinized syringe and divided into twelve 1.5-ml aliquots for lymphocyte subset analysis, and 30 ml was drawn into ten 3-ml serum separator tubes for autoantibody analysis. On the day the blood was drawn, two (duplicate) aliquots of heparinized blood from each person were sent to the flow cytometry laboratory at JHU for immediate (within 4 h) analysis of T-cell phenotypes, and two (duplicate) aliquots of serum were similarly sent to the autoantibody laboratory at JHU, according to protocols described below. Ten aliquots of heparinized blood and eight serum tubes were then transported by courier to FAST Systems, Inc. (Gaithersburg, Md.), for shipment in duplicate by overnight mail to all participating laboratories (including back to JHU), as illustrated schematically in Fig. 1. In one instance, difficult venous access limited the amount of blood that could be drawn, and only one specimen was sent to each laboratory.
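As a quick arithmetic check of the allocation just described, the sketch below recomputes the volumes and the number of shipping destinations implied by the stated counts. It assumes, as the text states, that two aliquots of each specimen type stayed at JHU for fresh analysis and that every shipping destination received exactly two; the constant names are illustrative only.

```python
# Counts and volumes as stated in the text; the names are illustrative.
HEPARIN_ALIQUOTS = 12        # 1.5-ml aliquots of heparinized blood
HEPARIN_ALIQUOT_ML = 1.5
SERUM_TUBES = 10             # 3-ml serum separator tubes
SERUM_TUBE_ML = 3
FRESH_AT_JHU = 2             # duplicate aliquots analyzed fresh at JHU
PER_DESTINATION = 2          # shipped samples went out in duplicate


def allocation():
    """Return (heparin ml, serum ml, T-cell destinations, serum destinations)."""
    heparin_ml = HEPARIN_ALIQUOTS * HEPARIN_ALIQUOT_ML
    serum_ml = SERUM_TUBES * SERUM_TUBE_ML
    tcell_destinations = (HEPARIN_ALIQUOTS - FRESH_AT_JHU) // PER_DESTINATION
    serum_destinations = (SERUM_TUBES - FRESH_AT_JHU) // PER_DESTINATION
    return heparin_ml, serum_ml, tcell_destinations, serum_destinations


print(allocation())  # (18.0, 30, 5, 4)
```

The 48 ml total is consistent with the approximately 50 ml drawn, and five duplicate destinations for heparinized blood match the five laboratories whose T-cell results are reported in the Results.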
For T-cell immunophenotypic analysis, the commercial laboratories (Immunosciences and Specialty Laboratories) used two-color analysis. The research laboratories (JHU, Rutgers, and UW) used a three-color antibody panel that was designed for this study and was configured by PharMingen Laboratories, San Diego, Calif. (now part of Becton-Dickinson Immunocytometry Systems, San Jose, Calif.). The panel included all antibodies studied here except anti-CD38. Antibodies were conjugated to either fluorescein isothiocyanate, phycoerythrin, or cychrome in the following combinations (fluorescein isothiocyanate/phycoerythrin/cychrome): HLA-DR/CD38/CD4, HLA-DR/CD38/CD8, CD25/CD26/CD4, and CD25/CD26/CD8. An anti-CD38 antibody from Becton Dickinson was added to appropriate antibody panel tubes when the specimen was stained, based on preliminary studies in which it gave superior staining compared to the PharMingen anti-CD38 antibody. The three-parameter data from JHU, Rutgers, and UW were reduced to the equivalent two-color analyses by using software supplied with the flow cytometers (ELITE cytometers and software; Beckman Coulter, Miami, Fla. [in all three laboratories]). As three-color data are not used in clinical practice, these data were not analyzed here. One laboratory measured expression of CD25 and CD26 only on CD3+ lymphocytes rather than on CD4+ or CD8+ lymphocytes. Data are reported as percentages of gated lymphocytes (identified by forward- and side-scatter gating at the three research laboratories).
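To make the reduction from three-color to two-color results concrete, the sketch below computes a two-color percentage from per-event data by ignoring the third marker. It assumes each event has already passed the forward/side-scatter lymphocyte gate and has been scored positive or negative for each marker. Real cytometer software works from fluorescence intensities and operator-set thresholds, so the function and data layout here are hypothetical simplifications, not the method of the ELITE software.

```python
def two_color_percent(events, marker_x, marker_y):
    """Percent of gated lymphocytes positive for both marker_x and marker_y.

    events: one dict per gated lymphocyte mapping marker name -> bool
    (hypothetical pre-scored positivity). Any third marker recorded in the
    dict is simply ignored, which collapses three-color data to two colors.
    """
    if not events:
        raise ValueError("no gated lymphocyte events")
    both = sum(1 for e in events if e[marker_x] and e[marker_y])
    return 100.0 * both / len(events)


# Four hypothetical events from a CD25/CD26/CD4 tube.
tube = [
    {"CD25": True, "CD26": True, "CD4": True},
    {"CD25": True, "CD26": False, "CD4": True},
    {"CD25": False, "CD26": True, "CD4": True},
    {"CD25": False, "CD26": False, "CD4": False},
]
print(two_color_percent(tube, "CD25", "CD4"))  # 50.0
```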
For analysis of the autoantibodies selected for study (antithyroid, anti-smooth muscle, and antimyelin), replicates were shipped and fresh and frozen samples were compared by procedures similar to those described above for T-cell subsets (Fig. 1). The only difference was that these measurements were batched in the research laboratories for both the fresh and frozen samples; i.e., samples were stored frozen until enough had accumulated for batch analysis. At the commercial laboratories, samples were tested as received. At JHU, the comparison of fresh versus shipped samples was made between samples that were shipped without being frozen and then frozen on receipt, and samples that were frozen in the processing laboratory before being shipped. At each laboratory, serum was incubated with appropriate tissues, according to that laboratory's standard methods, using either freshly frozen serum or serum frozen after shipping.
Data analysis. Data were summarized by medians, means, standard deviations, and ranges. Box plots compared results by laboratory for each test. Since the distributions of antibody measurements were extreme, with the vast majority of person-tests having no antibodies detected and a few person-tests having very large values, descriptive statistics and formal statistical comparisons of these measures were not practical. However, as the T-cell measures had more stable distributions, formal statistical comparisons of inter- and intralaboratory and intersubject variability were possible for each T-cell marker. The following potential effects on T-cell measures were tested with nested random/fixed-effects analysis of variance models: (i) interlaboratory variation, (ii) interday variation (i.e., between dates of analysis) within the same laboratory, (iii) the difference between fresh and shipped samples at JHU, (iv) variation due to disease group (healthy individuals, MCS, or other diagnosed autoimmune diseases), and (v) within-disease-group interperson variation. Specifically, two different models were fit, as described in the appendix.
As this was exploratory research, adjustments for multiple comparisons were not made (29), although for many of the comparisons reported, P values were so low (<0.0001) that this issue would not arise. Distributions of most of the laboratory measures were skewed to the right. However, logarithmic and square root transformations of these measures resulted in distributions that were skewed to the left. As residuals from the multivariate models fit to nontransformed measures were not skewed (or at least not more skewed than residuals of models fit to transformed measures), no transformations were made. In addition, the results of these analyses were not meaningfully affected by exclusion of extreme values.
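The skewness comparisons described above can be made with the sample (Fisher-Pearson) skewness coefficient, where positive values indicate a long right tail and negative values a long left tail. The text does not state which statistic was used, so the sketch below is an assumption about how skew might be quantified; it uses only the standard library and can be applied to raw values, to log- or square-root-transformed values, or to model residuals.

```python
import math


def sample_skewness(xs):
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


raw = [1.0, 1.5, 2.0, 2.5, 12.0]  # hypothetical right-skewed percentages
print(sample_skewness(raw) > 0)                     # True
print(sample_skewness([math.log(x) for x in raw]))  # skew after a log transform
```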
| RESULTS |
|---|
T-cell measures. (i) Reproducibility of replicates. Table 1 presents distributions of the absolute values of differences between T-cell measures among the replicate samples analyzed at the five laboratories, including the shipped samples (analyzed at all laboratories) and the fresh replicates (analyzed at JHU). The overall mean for all samples for each phenotype is also given, for comparison to the absolute differences. Except for one phenotype (CD26+ CD3+), the mean absolute differences between replicate samples analyzed at the same laboratory and condition were <2 percentage points, and the median differences were generally 1 percentage point or less. Moreover, as can be seen from the 90th percentile figures, for all phenotypes the vast majority of replicates analyzed at the same laboratory were within 3 percentage points of each other, which is considered to be acceptable variation among flow cytometric measurements of replicate specimens (5). This close agreement was observed even for dim markers such as CD25 and CD26, which are considered to be difficult to measure. The closeness of the replicates provides a firm basis for making the statistical inferences described below. No gross differences in replicate reproducibility among laboratories were evident, although we did not have the power (with only 40 subjects with replicates analyzed at each laboratory) to evaluate this statistically.
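The replicate summaries in Table 1 reduce to simple statistics on the absolute differences between duplicate measurements. The sketch below computes the mean, median, and 90th percentile of those differences, plus the fraction of pairs within a tolerance (3 percentage points by default, following the criterion cited above). The input layout and function name are hypothetical, and the 90th percentile uses the default method of Python's `statistics.quantiles`, which may differ slightly from the software that produced Table 1.

```python
from statistics import mean, median, quantiles


def replicate_agreement(pairs, tolerance=3.0):
    """Summarize |a - b| for duplicate measurements (percent of gated lymphocytes).

    pairs: list of (replicate_1, replicate_2) tuples from the same laboratory
    and condition; at least two pairs are needed for the percentile.
    """
    diffs = [abs(a - b) for a, b in pairs]
    return {
        "mean": mean(diffs),
        "median": median(diffs),
        "p90": quantiles(diffs, n=10)[-1],  # 9th decile cut point
        "within_tolerance": sum(d <= tolerance for d in diffs) / len(diffs),
    }


# Hypothetical duplicate CD4+ percentages.
summary = replicate_agreement([(42.0, 43.0), (35.5, 35.5), (50.0, 52.0), (28.0, 32.0)])
print(summary["mean"], summary["within_tolerance"])  # 1.75 0.75
```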
20% greater than the lowest laboratory mean. This level of agreement across all laboratories in measurement of CD26 expression was surprisingly good, as there are no reference standards for this measurement. For the CD38+ CD4+, DR+ CD4+, and CD26+ CD8+ phenotypes, the between-laboratory variation was somewhat greater, with differences of 34 to 51% between the highest and lowest laboratory means, and for the DR+ CD8+ and CD38+ CD4+ phenotypes, the ratio of largest to smallest laboratory means was >2. Generally, among all T-cell subsets, the ratios of largest to smallest laboratory means were highest for the activated subsets, especially CD8+ DR+ lymphocytes and CD25+ cells. Specifically, the ratio of largest to smallest laboratory mean was up to 11.4 for the CD25+ CD4+, CD25+ CD8+, and CD25+ CD3+ phenotypes. Much of this variation was due to one laboratory (laboratory A in Table 3), which obtained CD25+ values that were lower than those of the other laboratories and were implausible based on reported studies of T cells from healthy humans (25, 37). However, even if data from this laboratory were omitted, the largest laboratory means for phenotypes including CD25+ were 3.5- to 4.6-fold greater than the smallest laboratory means.
(iv) Disease group comparisons. Table 4 presents T-cell phenotype summary statistics for all samples tested (across days and laboratories) for healthy subjects and those in the group with immunological disease. We have not presented summary statistics for individuals identified with MCS because these cases were not clinically confirmed; a study with confirmed MCS cases will be reported separately. In Table 4, however, P values for differences among all three disease groups are reported (consistent with the models fit). Disease group differences (among all three groups) were statistically significant for the CD8+, CD25+ CD3+, CD26+ CD8+, CD38+ CD8+, and DR+ CD8+ phenotypes.
The magnitudes of the differences between healthy individuals and those with immunological diseases in Table 4 were generally far smaller than the magnitudes of laboratory differences in Table 3. Except for the DR+ CD8+ phenotype, where the mean for individuals with immunological disease (10.6%) was 2.7-fold greater than that for healthy individuals (3.9%), the ratio of the mean of the higher group to that of the lower group was never more than 2.
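Both the between-laboratory and the between-group comparisons above are ratios of the highest to the lowest mean. The minimal sketch below, with hypothetical laboratory labels and values, shows the calculation and the option of dropping an outlying laboratory, as was done for laboratory A; the only published figures it uses are the DR+ CD8+ group means from Table 4.

```python
def max_min_ratio(means, exclude=()):
    """Ratio of the largest to the smallest mean, optionally dropping some keys."""
    kept = [m for key, m in means.items() if key not in exclude]
    if len(kept) < 2 or min(kept) <= 0:
        raise ValueError("need at least two positive means")
    return max(kept) / min(kept)


# Hypothetical CD25+ CD4+ laboratory means (percent of lymphocytes).
lab_means = {"A": 1.0, "B": 4.0, "C": 6.0, "D": 8.0}
print(max_min_ratio(lab_means))                 # 8.0
print(max_min_ratio(lab_means, exclude={"A"}))  # 2.0

# DR+ CD8+ means from Table 4: immunological disease vs healthy.
print(round(max_min_ratio({"disease": 10.6, "healthy": 3.9}), 1))  # 2.7
```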
For all of the measures, both the nested patient effects (45 patients nested in three disease groups) and the date-of-analysis differences (85 dates nested in 17 laboratories) were highly statistically significant (P < 0.0001, with large F values), strongly suggesting that these effects were statistically real. However, the fact that the study design was unbalanced with respect to date-of-analysis and patient effects (because the same sampled patients were always nested together in the same analysis date across all five laboratories) made it impossible to completely separate, and quantify the magnitude of, the patient effect and the date of analysis effect.
Analysis of autoantibody data. In most cases, the antibodies tested were not detected or were within the laboratory's normal limits, with occasional large values for some subject-visits. The reproducibility of the tests was again very good for the replicates run by all laboratories. Results of antibody tests reported as titers (i.e., the highest serial dilution of the specimen at which reactivity can still be detected) are conventionally allowed to differ by up to two serial dilutions if twofold dilutions are done. The laboratories participating in this study had this level of reproducibility for almost all specimens (data not shown). However, the prevalences of detectable autoantibodies were much lower than anticipated in all study groups, even in the group with immune diseases, perhaps because these patients were generally being treated (with glucocorticoids and/or other immunosuppressive agents for systemic lupus erythematosus and with thyroid-suppressive therapy for Graves' disease). Overall, one laboratory had the greatest number of positive tests, detecting abnormal autoantibody levels in six subjects, mostly in the immunologically abnormal group. Only one laboratory found detectable antimyelin antibody in any of the specimens, and these results were highly reproducible in that laboratory. Still, virtually all of the results for this antibody were in the stated normal range. Any effects of shipment on the antibody titers or concentrations measured were very small and not clinically important, with titers reduced by at most one dilution in some tests.
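The reproducibility criterion for titers can be expressed by counting the twofold dilution steps between two endpoint titers, i.e., the base-2 logarithm of their ratio. The sketch assumes titers are reported as reciprocals of the last reactive dilution from the same twofold series (e.g., 40, 80, 160); the function names are illustrative, and the default of two steps follows the convention described above.

```python
from math import log2


def dilution_steps(titer_a, titer_b):
    """Number of twofold dilution steps separating two reciprocal endpoint titers."""
    if titer_a <= 0 or titer_b <= 0:
        raise ValueError("titers must be positive reciprocal dilutions")
    return round(abs(log2(titer_a / titer_b)))


def titers_agree(titer_a, titer_b, max_steps=2):
    """True if the two titers differ by no more than max_steps dilutions."""
    return dilution_steps(titer_a, titer_b) <= max_steps


print(dilution_steps(40, 160), titers_agree(40, 160))  # 2 True
print(dilution_steps(40, 320), titers_agree(40, 320))  # 3 False
```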
| DISCUSSION |
|---|
The basic findings for T cells were reassuring. Participating laboratories generally had very strong within-date-of-analysis reproducibility, as reflected in the close agreement of the vast majority of replicate samples. Similarly, shipping of blood by overnight express mail did not materially affect any of the measures; while some statistically significant differences between the means for fresh and shipped samples were observed, the magnitudes of these differences were small. The effect of shipping has not been well studied for most of these markers, although the major T-cell subsets (i.e., CD4+ and CD8+) are known not to be affected by shipping as long as the shipping time and temperature are not extreme. The variability of CD38 and HLA-DR expression by shipped CD8+ T cells analyzed in several laboratories was found to be substantial in one study (17), although fresh unshipped samples were not analyzed in that study. Those authors emphasized the need for quality control in the analysis of shipped specimens. The findings of the present study extend this by directly supporting the validity of using individual laboratories to analyze shipped samples of whole blood for the immunological markers assessed in this study, if the analysis is carefully done and validated, as described below.
The picture with respect to interlaboratory agreement was more mixed. Overall, there was good agreement on most of the measurements performed, including many which might have been expected to be more variable, such as those of CD26, HLA-DR, and CD38. However, variability was much greater across laboratories than within laboratory replicates for these markers, and for CD25 the between-laboratory variation was very wide indeed, due in part, although not entirely, to an apparently consistent falsely low determination of this marker in one laboratory. Nevertheless, taken as a whole, these data support the value of laboratory test validation and quality control in studies of MCS and other diseases. They also emphasize the need for the careful selection and validation of tests to be performed, since test validation may require substantial effort. In this connection, we have employed the markers that were validated for shipping and multilaboratory analysis in the present study in a separate, clinically rigorous study to determine if the T-cell subsets identified by these markers actually distinguish persons with and without MCS, in terms of either percentages of lymphocytes (as done in this study) or absolute cell counts. The results of that study will be reported separately (C. S. Mitchell et al., unpublished data).
For all measures, there was statistically significant within-laboratory date-of-analysis variation. While the magnitude of this variation was difficult to quantify, it appeared to be smaller than the between-laboratory differences. This source of variation can influence statistical comparisons among groups if it is not incorporated into the analysis and the groups are not balanced across dates. Indeed, Simon et al. (32) observed that a (systematic) laboratory measurement trend in interleukin-1 generation at a single laboratory in their study of MCS influenced comparisons of disease groups, because a higher proportion of one disease group was tested earlier in the study. Although the phenomenon of within-laboratory date-of-analysis variation has not been well studied, potential reasons for within-laboratory measurement differences by date of analysis include (i) day-to-day variation in sample processing, (ii) variation in laboratory practices by the technician (or by different technicians on different days), and (iii) changes in ambient temperatures and other conditions for shipping and storing samples. Moreover, consistent performance over one time period, even one as long as the 7 months of this study, does not guarantee long-term consistency.
This potential effect of date of test, together with potential intraday collinearities of samples processed and analyzed together, complicated the statistical analysis in this study. In particular, associations due to disease group (control, MCS, or autoimmune disease) could not be easily separated from associations due to date of analysis, because in many cases all individuals analyzed on a given day were from the same disease group. For the same reason, interaction between laboratory and disease group could not be tested, because higher measures for one disease group at a specific laboratory could have been mediated by date-of-analysis effects in ways that would be difficult to include in a linear model.
These considerations suggest that studies with several treatment or patient groups should maintain a temporal balance in the distribution of samples from the disease groups tested, with (roughly) the same proportion of samples from each disease group tested at each date of analysis. While this may be logistically difficult, other approaches, such as incorporating date of analysis into multivariate comparisons (as was done by Simon et al. [32]) or running known standards in laboratory testing at regular time intervals, may help to mitigate the influence of date-of-analysis effects on disease group comparisons. Further, given that laboratory performance may change over time, the participation of laboratories in ongoing quality assurance programs (13, 17) is important, and such participation should be described in studies of these markers. Where no quality assurance program is available (e.g., for exploratory studies of new markers or markers that have not been well standardized across laboratories), it may be desirable to have the measurements performed in more than one laboratory. If cost permits, duplicate samples from the same individual should be stored and tested on different days to improve the precision of estimates and identify components of variance. Further research to characterize and quantify the intralaboratory date-of-analysis effect that was observed in the present study may help in development of techniques to minimize its influence on study results.
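One simple way to approach the temporal balance recommended above is to deal each group's samples round-robin across the planned analysis dates. The sketch below is a hypothetical scheduling helper, not a procedure used in this study; it assumes every sample can be run on any date and that the number of dates is fixed in advance.

```python
def balanced_schedule(samples_by_group, n_dates):
    """Assign samples to analysis dates so each date gets a similar share of each group.

    Samples within each group are dealt round-robin across dates; each group
    starts where the previous one stopped, so the total daily load also stays
    even (date sizes differ by at most one sample).
    """
    schedule = [[] for _ in range(n_dates)]
    start = 0
    for group, samples in samples_by_group.items():
        for i, sample in enumerate(samples):
            schedule[(start + i) % n_dates].append((group, sample))
        start += len(samples)
    return schedule


groups = {
    "control": ["c1", "c2", "c3"],
    "MCS": ["m1", "m2", "m3"],
    "autoimmune": ["a1", "a2", "a3"],
}
for date, batch in enumerate(balanced_schedule(groups, 3)):
    print(date, sorted(g for g, _ in batch))
```

Dealing samples this way keeps each group's share of every date as equal as the sample counts allow, which is the property that protects group comparisons from date-of-analysis drift.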
The data presented here have some important limitations. Not all immunological markers that may be pertinent to MCS were studied. Although no major effects of shipping on autoantibody measurement were detected, the number of specimens with detectable levels of autoantibodies was smaller than expected. Therefore, further studies will be needed to determine the magnitude and importance of shipping and laboratory variations in these measurements. The fact that substantial variation was observed across laboratories even among this small number of positive specimens underscores this need.
Understanding whether immunological mechanisms play a role in MCS, or in subsets of individuals diagnosed with MCS, remains an important research goal. The results of this study support the potential ability of the immunophenotypic tests evaluated to detect strong associations between immunological parameters and diseases of interest, such as MCS. At the same time, however, they show that in order to have complete confidence in studies of this nature, attention must be paid to quality assurance for all tests conducted, including balanced testing of subjects in different disease groups across different dates.
| APPENDIX |
|---|
$\sigma_a^2$; $b_{j(i)}$ is the random effect of day $j$ nested within laboratory $i$, which has mean 0 and variance $\sigma_b^2$; $c_k$ for $k = 1, 2, 3$ is the effect of disease group $k$, with $\sum_k c_k = 0$; $d_{l(k)}$ is the random effect of patient $l$ nested within disease group $k$, which has mean 0 and variance $\sigma_d^2$; and $e_{ijklm}$ is the random effect of replicate $m$ nested within all of the other groups, which has mean 0 and variance $\sigma_e^2$. Note that $\sigma_e^2$ consists of both sampling error and within-day laboratory testing error.
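The model equation that these definitions describe did not survive extraction. Assuming the usual additive form for a nested mixed model, with $y_{ijklm}$ the measured percentage and $\mu$ an overall mean (both symbols are introduced here and are not taken from the original), the definitions correspond to:

$$
y_{ijklm} = \mu + a_i + b_{j(i)} + c_k + d_{l(k)} + e_{ijklm}, \qquad \sum_{k=1}^{3} c_k = 0 .
$$

Under this form, $\sigma_a^2$, $\sigma_b^2$, $\sigma_d^2$, and $\sigma_e^2$ are the laboratory, day-within-laboratory, patient-within-group, and replicate variance components, respectively.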
Since the design was not balanced (primarily because the date-of-analysis effect could not be separated from the disease group effect), hypotheses were tested by using type III sums of squares, which tend to conservatively exclude nonidentifiable variance from directed sums of squares. Hypotheses about the day effect ($\sigma_b^2 = 0$) and the patient effect ($\sigma_d^2 = 0$) were tested by F tests comparing the type III sums of squares for days and patients to the type III sums of squares for residual error, $\sigma_e^2$ (30). Hypotheses for the group effect being 0 ($c_k = 0$ for all $k$) were tested by F tests comparing type III sums of squares for groups to type III sums of squares for nested patients. Hypotheses for the laboratory effect being 0 ($\sigma_a^2 = 0$) were tested by F tests comparing type III sums of squares for laboratory to type III sums of squares for nested days.
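For intuition about why the laboratory effect is tested against nested days rather than against replicate error, the sketch below computes a balanced two-level nested ANOVA (laboratory, day within laboratory, replicate within day). This is a deliberate simplification: the study's design was unbalanced and used type III sums of squares plus patient and disease group terms, none of which are reproduced here. The data layout and function name are hypothetical.

```python
from statistics import mean


def nested_anova(data):
    """Balanced nested ANOVA; data[i][j][m] = replicate m, day j, laboratory i.

    Returns mean squares and F ratios. The laboratory F uses the
    day-within-laboratory mean square as its denominator, and the day F
    uses the replicate (error) mean square.
    """
    a = len(data)                      # laboratories
    b = len(data[0])                   # days per laboratory
    n = len(data[0][0])                # replicates per day
    if any(len(lab) != b or any(len(day) != n for day in lab) for lab in data):
        raise ValueError("design must be balanced")

    grand = mean(x for lab in data for day in lab for x in day)
    lab_means = [mean(x for day in lab for x in day) for lab in data]
    day_means = [[mean(day) for day in lab] for lab in data]

    ss_lab = b * n * sum((lm - grand) ** 2 for lm in lab_means)
    ss_day = n * sum((day_means[i][j] - lab_means[i]) ** 2
                     for i in range(a) for j in range(b))
    ss_err = sum((x - day_means[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in data[i][j])

    ms_lab = ss_lab / (a - 1)
    ms_day = ss_day / (a * (b - 1))
    ms_err = ss_err / (a * b * (n - 1))
    return {"ms": (ms_lab, ms_day, ms_err),
            "F_lab": ms_lab / ms_day,
            "F_day": ms_day / ms_err}


# Hypothetical data: 2 laboratories x 2 days x 2 replicates.
toy = [[[1, 3], [5, 7]], [[9, 11], [13, 15]]]
result = nested_anova(toy)
print(result["F_lab"], result["F_day"])  # 8.0 8.0
```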
A simpler model was fit to test the effect of fresh versus shipped samples evaluated at JHU. All terms were the same as in the previous model, except that $a_i$ corresponds to fresh ($i = 1$) versus shipped ($i = 2$) samples, with $\sum_i a_i = 0$. In order to obtain a balanced design, we excluded a disease group effect from the model and restricted inclusion to the 40 (of 45) patients who had both fresh and shipped samples tested for each laboratory parameter. We further reduced the F test to a paired t test of differences of corresponding date-of-analysis means to facilitate construction of confidence limits.
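The paired t test described above can be sketched with the standard library alone. The function below takes matched date-of-analysis means for fresh and shipped samples and returns the t statistic, its degrees of freedom, and a confidence interval for the mean shipped-minus-fresh difference. To stay dependency-free, it assumes the caller supplies the two-sided critical t value (for example, about 2.262 for 9 degrees of freedom at 95%); the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(fresh, shipped, t_crit):
    """Paired t test of shipped minus fresh means, with a confidence interval.

    t_crit: two-sided critical t value for len(fresh) - 1 degrees of freedom.
    Returns (t statistic, degrees of freedom, (lower, upper)).
    """
    if len(fresh) != len(shipped) or len(fresh) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [s - f for f, s in zip(fresh, shipped)]
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    if se == 0:
        raise ValueError("differences have zero variance")
    return d_bar / se, len(diffs) - 1, (d_bar - t_crit * se, d_bar + t_crit * se)


# Hypothetical fresh vs shipped date-of-analysis means (percent of lymphocytes).
t, df, ci = paired_t([10.0, 12.0, 14.0], [11.0, 14.0, 15.0], t_crit=4.303)
print(round(t, 2), df)  # 4.0 2
```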
| ACKNOWLEDGMENTS |
|---|
This study was funded by a grant from the Washington State Department of Labor and Industries. Immunosciences, Inc., and Specialty Laboratories, Inc., provided discounted rates for the tests performed in this research. D.R.H. was supported by NIH grant MH43450 and NSF grant EIA 02-05116.
| FOOTNOTES |
|---|
| REFERENCES |
|---|