Systematic Review

Accuracy of plain radiography in detecting fractures in older individuals after low-energy falls: current evidence

Abstract

Background Older individuals sustaining low-energy falls (LEF) and presenting to the emergency department (ED) demand straightforward diagnostic measures for injury detection. Plain radiography (XR) series for diagnosis of fall-related injuries are standard of care, but frequently subsequent CT examination is required for diagnostic assurance. A systematic database search of diagnostic accuracy of XR for detection of fractures in older LEF patients was performed.

Methods We searched PubMed, Embase, Cochrane Library, WHO International Clinical Trials Registry Platform, and ClinicalTrials.gov databases from inception to January 2020 for studies that included older patients (≥65 years) with LEF who underwent both CT and XR of the skeleton in an ED setting.

Results From 8944 references screened, 11 studies met the criteria for inclusion. Performance of XR for detection of fractures of the pelvic ring and hip was analyzed in nine studies, two studies investigated XR performance to detect rib fractures, and two studies compared diagnostic accuracy of thoracolumbar spine XR. Sensitivity estimates ranged from 10% to 58% and specificity estimates from 55% to 100%. Clinical and statistical heterogeneity was significant among included studies, with an overall considerable risk of bias.

Discussion High-quality evidence on accurate imaging strategies in older patients with LEF is lacking to date. XR misses a considerable number of fractures of the pelvic ring, rib cage, and thoracic and lumbar spine. However, the utility of first-line CT imaging and the benefit of diagnosing every fracture are unknown, which calls for high-quality prospective trials that also consider patient-oriented outcomes.

Background

Low-energy falls (LEF) are defined as a fall from standing height or less and include falls while transferring, sitting or from the bed. They are one of the most common reasons for emergency department (ED) presentation in older patients and visit rates are increasing.1 LEF are the leading mechanism of fatal and non-fatal injury in individuals aged 65 years or older in developed countries2 3 and are associated with significant morbidity and mortality that appear to increase with age.4 5

A recent comprehensive trauma registry data analysis emphasized that LEF is the predominant trauma mechanism in older individuals and leads to injuries as severe as those caused by high-energy mechanisms in younger patients.6 LEF is associated with significant morbidity and mortality, which appear to increase with age.4 These patients are jeopardized by underestimation of the trauma mechanism and by lack of early identification of potentially severe injuries during the course of clinical management.6

Liberal use of pan-scan CT for injury detection in older trauma patients has been recommended,7 but there is still a paucity of evidence to support this. Evidence regarding imaging strategies in older patients with LEF is even scarcer. Reports of serious mismanagement such as delayed diagnosis of entire injury patterns,8 life-threatening hemorrhage from missed low-energy fractures of the pelvic ring,9 10 or predisposition to highly unstable spine injuries due to ankylosing spondylitis,11 12 further urge a critical appraisal.

In standard practice, advanced imaging studies such as CT are mainly reserved for patients in whom plain radiography (XR) findings are equivocal or inconsistent with clinical suspicion. In general, the inclusion of imaging in the ED constitutes a major risk factor for prolonged ED length of stay (LOS) >4 hours for older individuals, which is in turn associated with an increased risk of hospital admission and 30-day readmission, increased hospital LOS, and increased in-hospital mortality.13

The objective of this systematic review was to provide a summary of the current evidence and an estimation of the accuracy of XR in fracture diagnosis in older LEF patients compared with CT examinations.

Methods

Search strategy

A systematic database search of PubMed (inception to May 2019, update search January 2020), Excerpta Medica dataBASE (EMBASE, inception to March 2019, update search January 2020), Cochrane Library, WHO International Clinical Trials Registry Platform (ICTRP), and ClinicalTrials.gov was conducted to identify studies that compared XR with CT for detection of fractures in older individuals (≥65 years) with LEF, in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement recommendations.14 The PICO (Population, Intervention, Comparison, Outcome)15 analysis was used to break down the objectives and define the search strategy (table 1). Search terms were combined using the Boolean operators ‘AND’ and ‘OR’. Detailed database search terms are provided in online supplemental table 1. Additional articles were identified by hand search of bibliographies of relevant studies.

Table 1
Research question according to PICO criteria

Retrieved records were merged and duplicates removed. We sought English and German language studies that evaluated emergency imaging techniques for detecting injuries of the head, cervical spine, axial skeleton (vertebral column, rib cage, sternum), and pelvic ring in older patients sustaining an LEF. Trauma registry data studies and conference abstracts were included. Studies with unspecified age or high-energy trauma mechanism of the targeted population, studies not including CT imaging in the ED or not comparing XR with CT in the same region, case reports, and narrative reviews were excluded. Abstracts and full-text articles were screened by two independent reviewers (VP and AL); discrepancies were resolved by discussion.

Data extraction and quality assessment

A data extraction form was developed to report a full description of each study, including study design, setting and duration, inclusion criteria of patients, trauma bay admission, sample size and baseline demographics of included patients (gender and mean/median age where available), imaging modalities under investigation, prevalence of injuries (including 95% CI where available), outcome measures, and authors’ conclusions. Primary data extraction was performed in duplicate (VP and AL) and finally merged by mutual agreement. The overall methodological quality of the included studies was assessed independently by two observers (VP and AL) using the Newcastle-Ottawa Scale Questionnaire16 for cohort studies. Studies were graded as ‘good’ if the total score was 8, ‘fair’ with total scores of 6–7 and ‘poor’ if the total score was 5 or less. Discrepancies were resolved by discussion. The validated Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool17 18 was additionally applied to assess potential biases of the included studies of diagnostic accuracy. Accordingly, we defined XR examination as the ‘index test’ and CT examination as the ‘reference test’.
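The grading thresholds above can be written down as a small lookup. This is only an illustrative sketch: the function name `nos_grade` is hypothetical, and treating a score of 9 (the maximum of the Newcastle-Ottawa Scale for cohort studies) as ‘good’ is an assumption, since the text only names a total score of 8.

```python
def nos_grade(total_score: int) -> str:
    """Map a Newcastle-Ottawa total score to the grade used in this review.

    Assumption: scores above 8 are also graded 'good'; the text only
    states 'good' for a total score of 8.
    """
    if not 0 <= total_score <= 9:
        raise ValueError("Newcastle-Ottawa cohort scores range from 0 to 9")
    if total_score >= 8:
        return "good"
    if total_score >= 6:
        return "fair"
    return "poor"


if __name__ == "__main__":
    for score in (4, 7, 8):
        print(score, nos_grade(score))
```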

Quantitative and statistical analysis

Measures of diagnostic accuracy were sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR−), and accuracy (diagnostic effectiveness), with CT set as the reference standard. Statistical heterogeneity among the studies was quantified with the χ²-based Cochran’s Q and Higgins I² statistics for each summary estimate (chest, pelvic ring, spine); significant statistical heterogeneity was defined as p<0.05 or I² >50%. The ProMeta V.3.0 software package (Internovi 2015) was used for summary estimates and summary statistics.
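For reference, the measures listed above follow from the 2×2 table of XR (index test) against CT (reference standard). The review does not spell out the formulas; the block below gives the standard textbook definitions, with TP, FP, FN, and TN as assumed symbols for true-positive, false-positive, false-negative, and true-negative XR readings.

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}, \\
\text{LR}^{+} &= \frac{\text{Sensitivity}}{1 - \text{Specificity}}, &
\text{LR}^{-} &= \frac{1 - \text{Sensitivity}}{\text{Specificity}}, \\
\text{Accuracy} &= \frac{TP + TN}{TP + FP + FN + TN}.
\end{align*}
```

An LR− close to 1 means that a negative XR barely lowers the probability of a fracture, which is why this measure matters most when asking whether XR can rule a fracture out.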

Results

Study selection and study characteristics

A total of 8944 records were identified by the systematic database search. After adjusting for duplicates, 6250 records were excluded by screening of titles and abstracts, independently reviewed by two reviewers (VP and AL). Forty-one studies were eligible for full-text review (figure 1). Thirty-nine studies were excluded when applying the PICO inclusion criteria. Formally, only two studies19 20 met all PICO criteria. Therefore, we decided to secondarily include all studies meeting the criteria of Intervention, Comparison and Outcome but with adapted criteria for Population. Thus, nine more studies were considered for extraction: studies with selected cohorts including patients <65 years of age (but with mean/median age ≥65 years or ≥50% of included patients ≥65 years of age) or including mechanisms of injury other than LEF (but LEF in ≥50% of patients).21–29 An overview of study characteristics and extracted data is provided in online supplemental table 2.

Figure 1

Flow diagram of studies identified and included according to PRISMA. PICO, Population, Intervention, Comparison, Outcome; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Study quality assessment

The overall methodological quality of the studies reviewed was ‘fair’, with a median score of 7 (range 4–7) according to the Newcastle-Ottawa Scale Questionnaire16 (online supplemental table 3). A summary of the QUADAS assessment is provided in online supplemental table 4, including the risk of bias of the individual studies included (according to Westwood et al18). Figure 2 illustrates the proportion of studies rated as ‘yes’ (low risk of bias), ‘no’ (high risk of bias) or ‘unclear’ (unclear risk of bias) for each of the QUADAS items. All of the studies provided sufficient details on the appropriate reference standard. The majority of studies used an inappropriate spectrum of patients20–24 26–29 and failed to report sufficient details on diagnostic review bias,20–29 uninterpretable results,19–22 24–29 and withdrawals.19–22 24 25 27–29 Almost half of the studies failed to report sufficient details to judge whether differential verification bias20 21 24 26 29 and test execution bias20–22 24 29 were avoided.

Figure 2

Proportion of studies rated as ‘low risk’, ‘high risk’ or ‘unclear risk’ for each of the QUADAS items for the 11 included studies for the diagnosis of fractures of the rib cage, thoracolumbar spine, and pelvic ring. QUADAS, Quality Assessment of Diagnostic Accuracy Studies; XR, plain radiography.

Heterogeneity statistics demonstrated significant heterogeneity (degrees of freedom (df): 8, Q: 101.1, I²: 92.1%, p<0.001) among the studies reporting on pelvic ring fractures20–27 29 and considerable heterogeneity (df: 1, Q: 3.9, I²: 74.1%, p=0.049) among the two studies reporting on thoracolumbar spine fractures.20 28 There was no significant heterogeneity (df: 1, Q: 0.61, I²: 0%, p=0.44) among the two studies reporting on fractures of the rib cage.19 20 Publication bias among studies reporting on pelvic ring fractures was significant (intercept: −3.70, t: −3.81, p=0.007), based on Egger’s linear regression test.
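The relation between Cochran’s Q and Higgins I² in the figures above can be checked with a few lines of code. This is a minimal sketch using the standard definition I² = max(0, (Q − df)/Q) × 100%; the function name `i_squared` is hypothetical, and it is not the ProMeta implementation. Because the reported Q values are rounded, recomputed I² values may differ slightly from the published ones (for example, Q = 3.9 with df = 1 gives a value a little above the reported 74.1%).

```python
def i_squared(q: float, df: int) -> float:
    """Higgins I^2 (in percent) from Cochran's Q and its degrees of freedom."""
    if df < 0:
        raise ValueError("degrees of freedom must be non-negative")
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # (Q, df) pairs as reported for pelvic ring, thoracolumbar spine, rib cage
    for label, q, df in [("pelvic ring", 101.1, 8),
                         ("thoracolumbar spine", 3.9, 1),
                         ("rib cage", 0.61, 1)]:
        print(f"{label}: I^2 = {i_squared(q, df):.1f}%")
```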

Measures of diagnostic accuracy

Table 2 summarizes the assessment of diagnostic accuracy of XR for fracture detection in the included studies. Four studies reporting on pelvic ring and hip fractures21 22 24 29 only included patients with negative XR, so measures of diagnostic accuracy were not calculated for these studies. See the section ‘Results of individual studies’ for further description of the individual results.

Table 2
Measures of diagnostic accuracy (CI 95%) of XR for fracture detection in the respective body regions calculated for the included studies

Results of individual studies

Thorax and rib cage

Two recently published studies assessed the diagnostic accuracy of XR of the chest/rib cage for detecting rib fractures in older adults after LEF,19 20 with chest CT as the reference standard. In total, 398 patients were examined by chest XR and subsequent chest CT (including contrast-enhanced CT20), independently of the findings of the chest XR. The reported prevalence of rib fractures after LEF ranged from 3% (86 of 2839)20 to 29% (96 of 330)19 of patients. False-negative chest XR was reported in 17 of 68 (25%)20 and 56 of 330 (17%)19 patients. The resulting sensitivities were 22.7%20 and 41.7%.19 PPVs were 71%20 and 100%,19 with no false-positive chest XR in the latter study. The LR− values were 0.8 in the former study20 and 0.6 in the latter.19 No differences in median hospital LOS (4 days (2–7) vs. 4 days (2–8), p=0.92), intensive care unit (ICU) admission rate (23% vs. 27%, p=0.62), median ICU LOS (2 (1–8) vs. 3 (1–5), p=0.54) or mortality (10.3% vs. 7.3%, p=0.45) were found between patients with and without rib fractures.19 In addition, effective dose estimates (in millisievert (mSv)) were calculated for chest XR (median: 0.02 mSv; IQR: 0.01–0.02 mSv) and for chest CT (including contrast-enhanced examinations) (median: 3.57 mSv; IQR: 3.52–5.18 mSv).20
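As a worked illustration, the rib fracture figures of the larger study19 can be reproduced from a 2×2 table. The cell counts used here are reconstructed, not quoted: 96 fractures among 330 patients with 56 false negatives imply 40 true positives, and no false positives implies 234 true negatives. The names `TwoByTwo` and `accuracy_measures` are hypothetical, and the sketch is not the analysis code used in the review.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # XR positive, CT positive
    fp: int  # XR positive, CT negative
    fn: int  # XR negative, CT positive
    tn: int  # XR negative, CT negative


def safe_div(num: float, den: float) -> float:
    """Divide, returning inf for x/0 (x > 0) and nan for 0/0."""
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def accuracy_measures(t: TwoByTwo) -> dict:
    sens = safe_div(t.tp, t.tp + t.fn)
    spec = safe_div(t.tn, t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": safe_div(t.tp, t.tp + t.fp),
        "npv": safe_div(t.tn, t.tn + t.fn),
        "lr_pos": safe_div(sens, 1 - spec),
        "lr_neg": safe_div(1 - sens, spec),
        "accuracy": safe_div(t.tp + t.tn, t.tp + t.fp + t.fn + t.tn),
    }


# Assumed counts, reconstructed from the figures reported for the rib study
rib_study = TwoByTwo(tp=40, fp=0, fn=56, tn=234)
measures = accuracy_measures(rib_study)

if __name__ == "__main__":
    for name, value in measures.items():
        print(f"{name}: {value:.3f}")
```

With these assumed counts, the sensitivity and LR− agree with the reported 41.7% and 0.6, and LR+ is unbounded because no false-positive XR occurred.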

Thoracic and lumbar spine

Two studies analyzed the diagnostic accuracy of biplane XR examination of the thoracic20 28 and lumbar spine20 in consecutive patients with LEF20 and in patients with minor trauma and suspected thoracic spine injury on physical examination,28 with CT examinations of the respective regions set as reference standard. In total, 140 patients were examined by biplane thoracic spine XR and 76 patients by biplane lumbar spine XR, followed by CT examinations of the respective spine regions. Thoracic spine fractures were found in 60.7% (65 of 107)28 and 2.2% (62 of 2839)20 of investigated patients, and lumbar spine fractures were found in 2.5% (71 of 2839)20 of patients. False-negative XR of the thoracic spine was reported in 30.7% (33 of 107)28 and 36% (12 of 33)20 of patients, and false-negative XR of the lumbar spine was reported in 25% (19 of 76) of patients.20 Sensitivities of thoracic spine XR were estimated at 49.2%28 and 40.0%,20 and sensitivity of lumbar spine XR was estimated at 57.8%.20 Estimated specificities of thoracic spine XR ranged from 54.8% to 100%, and specificity was 100% for lumbar spine XR. The estimated LR− for thoracic spine XR ranged from 0.6 in one study20 to 0.9 in the other,28 and was 0.4 for lumbar spine XR.20 Both studies further assessed radiation doses as dose length product (in milligray × centimeter, mGy cm) and effective dose (mSv). Accordingly, CT imaging resulted in a 26-fold28 to 55-fold20 increment of radiation dose at the thoracic spine, and in a 13-fold20 increment of radiation dose at the lumbar spine.
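Point estimates from small cohorts carry wide uncertainty, which is why table 2 reports 95% CIs. The sketch below shows one common way to obtain such an interval for a proportion, the Wilson score interval. The review does not state which interval method was used, so the choice of Wilson is an assumption, as is the function name `wilson_interval`. The example count of 32 detected fractures out of 65 is derived from the reported 33 false negatives among 65 thoracic spine fractures.28

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Thoracic spine XR sensitivity: 32 of 65 CT-confirmed fractures detected
low, high = wilson_interval(32, 65)

if __name__ == "__main__":
    print(f"sensitivity = {32 / 65:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

Other interval methods (for example exact binomial intervals) would give somewhat different bounds, so the output should not be expected to match table 2 exactly.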

Hip and pelvic ring

Nine studies analyzed the diagnostic performance of XR of the pelvic ring in a total of 1622 older patients, with LEF in the majority of patients. Four of these21 22 24 29 only included patients with negative XR of the pelvic ring for further CT examination, set as reference standard. In these studies, false-negative XRs of the pelvic ring were identified in 39 of 310 (12.6%),21 109 of 193 (56%),22 46 of 139 (33%)29 and 24 of 87 (28%)24 patients. Sensitivities of pelvic ring XR varied considerably, with estimates from 0% to 52%; specificities were estimated from 67% to 100% (see table 2). The assessed LR− ranged from 0.5 in one study25 to 1.5 in another.27 The estimated overall OR for a fracture of the pelvic ring detected by XR was 0.07 (CI 95% 0.03 to 0.16); however, these estimates should be interpreted with caution due to significant heterogeneity.

Effective dose estimates showed a median dose of 0.02 mSv (IQR: 0.02–0.03 mSv) for XR and a median dose of 3.16 mSv (IQR: 1.54–2.39 mSv) for CT,20 a 158-fold increment of radiation dose in this body region.

CT examination mainly increased the diagnosis of fractures of the dorsal pelvic ring, including sacral fractures.21 24 25 27 This increased the number of patients in whom surgical stabilization was indicated.21 In that study, the median hospital LOS of surgically treated patients was shorter in those who received primary CT examination (17 days (6–68) vs. 21.5 days (12–37)), with no difference in the average time to surgery (6.2±3.5 days vs. 6.8±7.1 days).21

Discussion

XR is currently the first-line diagnostic tool for detection of LEF-related injuries of the skeleton in older individuals presenting to the ED. XR findings are frequently equivocal, resulting in subsequent CT imaging for diagnostic assurance. To the best of our knowledge, this is the first systematic literature review aimed at assessing the diagnostic performance of XR in detecting skeletal injuries after LEF. Our search yielded relatively few observational, predominantly retrospective, studies. The studies included in our systematic analysis demonstrated considerable clinical and statistical heterogeneity, so a meta-analysis was not feasible. The assessment of test performance characteristics of the individual studies demonstrated that the diagnostic accuracy of XR was only moderate to poor, depending on the skeletal region under investigation. Estimated sensitivities were 52% or less, NPV ranged from 14% to 81%, and LR− was 0.4, at best, indicating that a negative XR does not safely rule out fractures of the rib cage, thoracic or lumbar spine, and pelvic ring. The clinical relevance of these missed fractures is currently unknown.

Four of the studies addressed this issue and reported on the clinical and surgical outcomes of the target population as secondary outcomes.19–21 24 More accurate detection of posterior pelvic ring fractures led to an increase in surgical therapy, and early CT examination shortened the hospital LOS in patients treated surgically.21 However, when the treatment policy of an institution for pelvic ring fractures obviates surgical treatment, an increase in CT-detected posterior pelvic ring fractures did not influence hospital admission rates or hospital LOS.24 Furthermore, accurate diagnosis of rib fractures was not associated with differences in hospital LOS, ICU admission rate or in-hospital mortality (7.3% without rib fractures vs. 10.3% with rib fractures), although this comparison was not adjusted for overall injury severity.19 The most comprehensive retrospective assessment of accurate fracture detection, including the spine, rib cage, and pelvic ring, demonstrated that the rate of surgical treatment and intervention did not differ when different imaging strategies (only XR, only CT, XR and CT) were compared.20 However, the retrospective design of all of these studies does not permit a conclusive determination of whether the accurate diagnosis of fractures significantly alters clinical or surgical outcomes. The current poor evidence calls for prospective randomized clinical trials to assess whether a safe diagnosis of fractures in older adults with LEF is beneficial for resource management (eg, ED LOS), clinical and surgical decision making, diagnostic and treatment costs, risk of radiation and, most importantly, patient-centered clinical outcomes.30

This systematic review has some strengths and limitations. Strengths of our study were a well-defined search protocol, a comprehensive search strategy across multiple databases, and strict adherence to the PRISMA guidelines. Furthermore, we focused only on studies that evaluated CT imaging as the gold standard for fracture diagnosis in the ED setting or within a short-term period after the initial fall incident. The major limitation of our study is the lack of available high-quality evidence on this subject. Our systematic database search did not retrieve randomized controlled trials or high-quality non-randomized trials; therefore, the evidence generated is considered weak, at best. By expanding the inclusion criterion ‘Population’ we were able to include studies with patients aged ≥55 years or all-comer populations with a majority (≥50%) of patients aged ≥65 years or all-trauma populations with a majority (≥50%) of patients who sustained an LEF. This strategy retrieved nine additional studies for review. Second, there was significant heterogeneity between the studies, due to variations in study quality, end points and outcomes as well as different inclusion criteria and patient selection. Therefore, we deemed a meta-analysis not feasible. Third, quality assessment using the QUADAS criteria revealed an overall high risk of bias within and across the studies, mainly concerning patient spectrum, test execution, and diagnostic review performance. Finally, the reviewers who assessed study quality and risk of bias were not blinded to the authors’ names, the institution in which the study was conducted, or the journal in which the study was published. This approach could potentially lead to bias in scoring the methodological quality of the studies. Therefore, the results of this study should be interpreted with these shortcomings in mind.

Conclusion

In conclusion, we found that high-quality evidence on accurate imaging of fractures in older adults with LEF in the ED is missing. Evidence from available studies indicates that XR lacks accuracy for the diagnosis of fractures of the pelvic ring, thoracic and lumbar spine, and rib cage. High-quality randomized prospective trials are warranted to provide conclusive information about the utilization of first-line CT examination in patients with low-energy trauma and the clinical suspicion of fractures. Since the benefit of diagnosing every fracture in the ED is currently unknown, future trials should consider patient-centered outcomes as well. Lastly, the benefits and the potential drawbacks of first-line CT imaging, such as overdiagnosis or incidental findings leading to further downstream testing and even surgical interventions, should be evaluated.