Missing data are ubiquitous in clinical research.1 This is true in trauma research, whether in a prospective trial or a retrospective database study. Unfortunately, missing data are often handled improperly or ignored outright, with detrimental consequences for the accuracy of study results. Problems arise when the likelihood that key variables are missing is associated with other observed or unobserved variables. For example, if missing data are more common among patients with trauma at high risk of death, simple case deletion will introduce systematic error, or bias. Conversely, missingness might obscure important associations that support or refute the proposed causal mechanism under study.
Donohue et al directly confront this problem in their secondary analysis of the Study of Tranexamic Acid During Air Medical and Ground Prehospital Transport (STAAMP) trial.2 Tranexamic acid (TXA) administration is associated with improved survival in injured patients at risk of hemorrhage.3 4 However, no trials have found an association between TXA and improved coagulation as measured by thromboelastography (TEG)—the proposed mechanism for this effect. Donohue et al examine the role of missing TEG data. Patients with missing TEG were significantly more likely to present in severe shock (systolic blood pressure [SBP] <70 mm Hg) and to go on to die (36% vs 7%), reflecting the logistical challenges of sampling patients in extremis. Among patients in the severe shock subgroup who had TEG collected, TXA was significantly associated with decreased LY30 (percent clot lysis at 30 min), indicating improved clot stability. These findings support the proposed mechanism behind the survival benefit of TXA among patients with severe shock in the STAAMP trial. Moreover, they provide a case study of how missing data can hinder the interpretability of clinical studies.
When confronted with missing data, investigators should perform exploratory analyses to understand what pattern of censoring exists. There are typically three patterns.1 When data are missing completely at random (MCAR), the probability that a value is censored is completely independent of both observed and unobserved characteristics. Using the example of the STAAMP trial, this would occur if TEG parameters were missing due to sporadic laboratory errors. When data are missing at random (MAR), missing values are systematically related to observed, but not unobserved, characteristics. This would occur if the patient’s degree of extremis, rather than the TEG parameters themselves (eg, LY30), determined the probability that TEG was not collected. Finally, data missing not at random (MNAR) are systematically censored in a pattern related to unobserved characteristics, including the missing values themselves. This would occur if patients with abnormal LY30 (the unobserved characteristic) were less likely to undergo TEG collection due to the challenges of sampling in extremis, even after accounting for observed characteristics such as the degree of shock. Missing data are rarely MCAR. Therefore, case-wise deletion of observations with censored values is rarely appropriate. Multiple imputation (MI) is a methodology that estimates missing values from their correlations with other observed variables, generating several plausible completed datasets whose results are then pooled.5 This facilitates analyses using all known data points while also accounting for the uncertainty in unknown data points, thus avoiding overly precise estimates. When data are MAR or MNAR, MI is felt to introduce less bias than more naïve methods and is the approach used by the Trauma Quality Improvement Program of the American College of Surgeons.6
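To make the mechanics of MI concrete, the brief Python sketch below simulates a MAR scenario loosely modeled on the STAAMP setting: the probability that LY30 is missing depends on an observed variable (SBP), the incomplete data are imputed repeatedly with scikit-learn's IterativeImputer, and the per-imputation estimates are pooled with Rubin's rules. The variable names, effect sizes, and number of imputations are illustrative assumptions, not values from the trial or the authors' actual analysis.

```python
# Minimal sketch of multiple imputation under a MAR mechanism.
# All numbers (SBP distribution, LY30 model, missingness model) are hypothetical.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 1000

# Simulate an observed SBP and an LY30 value that worsens as SBP falls.
sbp = rng.normal(110, 25, n)
ly30 = 3.0 - 0.02 * (sbp - 110) + rng.normal(0, 1.0, n)

# MAR: the probability that LY30 is missing rises as SBP drops
# (missingness depends only on an observed characteristic).
p_miss = 1 / (1 + np.exp((sbp - 80) / 10))
ly30_obs = ly30.copy()
ly30_obs[rng.uniform(size=n) < p_miss] = np.nan

X = np.column_stack([sbp, ly30_obs])

# Multiple imputation: draw m completed datasets, estimate the mean LY30 in
# each, then pool with Rubin's rules (within- plus between-imputation variance).
m = 20
estimates, variances = [], []
for i in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=i)
    completed = imputer.fit_transform(X)
    ly30_i = completed[:, 1]
    estimates.append(ly30_i.mean())
    variances.append(ly30_i.var(ddof=1) / n)

q_bar = np.mean(estimates)               # pooled point estimate
u_bar = np.mean(variances)               # within-imputation variance
b = np.var(estimates, ddof=1)            # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b      # Rubin's rules

print(f"complete-case mean LY30: {np.nanmean(ly30_obs):.2f}")
print(f"MI pooled mean LY30:     {q_bar:.2f} (SE {np.sqrt(total_var):.2f})")
print(f"true mean LY30:          {ly30.mean():.2f}")
```

In this simulation, the complete-case mean tends to underestimate LY30 because the sickest patients are the ones most often missing, whereas the pooled MI estimate tracks the true mean while carrying appropriately wider uncertainty from the between-imputation variance.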
In summary, missing data are common in trauma research. Investigators should proactively seek to understand the pattern of missingness in their data and report their statistical approach to handling it in order to minimize bias. Thoughtful analyses, such as the one Donohue et al have performed, can help illuminate the impact of missing data on clinical observations.