Discussion
A well-designed CAD algorithm can potentially reduce medical errors and facilitate accurate diagnoses.25 30 However, generalized and comprehensive algorithms for interpreting chest radiography in the trauma setting are currently lacking. Although DL algorithms have shown promise in detecting abnormalities on radiographs, a gap remains between developing scientifically sound algorithms and implementing them in real-world settings.31 In this study, we developed an algorithm based on a novel weakly supervised DL method that achieved high performance in identifying multiple trauma-related skeletal findings on CXRs, in line with clinical requirements. CXR-FxNet achieved an AUC of 91.2% on an independent data set and was able to localize rib and clavicle fractures on CXR.
Accurate diagnosis is essential, as a missed or delayed diagnosis can lead to a poor prognosis. This algorithm offers an opportunity to improve clinical performance and patient safety in a timely manner. Although DL has gained substantial traction in the medical field, its application to trauma assessment in real-world clinical scenarios remains limited.32–34 To generate meaningful clinical benefits, DL algorithms in the medical field must perform comparably to physicians.35 Currently available applications remain focused on detecting skeletal fractures of the pelvis and extremities.28 36 37 Previous studies have demonstrated that algorithms can perform similarly to physicians in detecting various fractures on radiographs, including proximal humerus fractures,38 wrist fractures,25 and hip fractures.39 These findings support the potential of CXR-FxNet to assist in identifying fractures on CXR. Indeed, the CXR-FxNet algorithm can provide real-time recommendations to front-line physicians managing multiple trauma patients in a chaotic emergency environment, where misdiagnoses can occur.40 In the case of rib fractures specifically, our algorithm can detect multiple rib and clavicle fractures, as shown in figure 5. This capability is particularly valuable in healthcare institutions that may lack access to consulting specialists or experienced medical staff.41 By providing timely and accurate insights, the algorithm can enhance the diagnostic capabilities of front-line physicians and contribute to improved patient care in such settings.
In contrast to extremity radiographs, CXRs show complex anatomy, often with multiple injury sites and pathologies. Soft tissue structures such as the mediastinum, and foreign bodies such as catheters and chest tubes, may lead to misdiagnosis. In the contemporary medical environment,28 developing a separate algorithm for each type of anomaly present in a single image is not feasible. Consequently, there is a pressing need for universal solutions tailored to specific clinical scenarios in emergency CXR. Because of this anatomical complexity, DL development in thoracic trauma remains rare, and most applications focus on chest CT algorithms for diagnosing rib fractures.12 19 42–48 Although models based on chest CT have shown commendable performance, they still have certain limitations. First, medical costs, availability, and radiation exposure limit the widespread use of CT in trauma evaluation, and CT is not typically employed as the primary survey tool in most parts of the world. Second, the large volume of images and data associated with CT poses challenges: when a DL algorithm is trained on CT images, the amount of data can be tens to hundreds of times larger than with CXR. Consequently, the computational complexity, high computing power requirements, and difficulty of integration into the clinical examination workflow prevent these algorithms from being used on a global scale. Unlike CT, CXR is far more readily available in any hospital and is regarded as the primary imaging modality for evaluating trauma patients. CXR-FxNet offers several advantages in this context. First, the CXR-FxNet algorithm can accurately identify and localize various trauma-related abnormalities.
Its ability to detect multiple categories of abnormalities simultaneously, across multiple locations within an image, can enhance physicians' confidence in the algorithm and facilitate its widespread adoption in clinical practice. Second, CXR-FxNet uses CXR instead of CT, which reduces computational demands and helps standardize image quality. This approach allows for consistent diagnostic capability even in hospitals with limited medical information resources. Compared with models relying on CT images, our DL model is more lightweight, accessible, and user-friendly, enabling a broader range of users to use it conveniently. For institutions that can afford DL computation servers and PACS, the information system requirements and costs are lower than those of computationally intensive systems. For those unable to afford this additional equipment, the model can be deployed in the cloud. We have also designed a website (http://140.129.68.84:8081/) that allows easy online access for public use and validation. Healthcare providers can upload CXR images taken with their cameras or mobile phones and receive DL model-assisted feedback within seconds. Consistent with previous research suggesting that DL algorithms may be beneficial for younger and less experienced physicians, we found that with the help of the DL algorithm, junior staff were able to locate fracture sites with performance comparable to that of experienced physicians.
The development of DL models in the medical field is often hindered by limited data size and a lack of clear labels. Image-level labels are relatively easy to acquire from medical records, but detailed expert annotations on the image are prohibitively expensive. Weakly supervised methods have emerged as a potential solution, achieving a reasonably high baseline performance with large but somewhat noisy data sets. In this study, we not only explored weakly supervised methods relying solely on image-level information but also assessed the impact of incorporating images annotated with bounding boxes on model performance. This evaluation aimed to determine whether adding high-quality, detailed annotations could further improve model accuracy compared with weakly supervised methods alone. We also applied a teacher-student knowledge distillation method to improve model performance with few expert annotations. We found that adding more detailed information to the model reduced the number of training images required and yielded better results.
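The distillation objective itself is not described here. As a minimal sketch only, assuming a common formulation in which a teacher (assumed here to be trained with the expert bounding box annotations) supplies softened per-finding probabilities and the student is trained on both the weak image-level labels and those soft targets, the combined loss could look like the following. The function names and the `alpha` and `temperature` values are hypothetical illustration choices, not CXR-FxNet's actual settings; per-finding sigmoid outputs are assumed because one radiograph can carry several findings at once.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft values in [0, 1]."""
    p = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))


def distillation_loss(student_logits, teacher_logits, image_labels, alpha=0.5, temperature=2.0):
    """Hypothetical teacher-student loss for multi-label findings.

    student_logits, teacher_logits: (batch, n_findings) raw scores
    image_labels: (batch, n_findings) weak image-level labels (0/1)
    alpha, temperature: illustrative hyperparameters, not taken from the study
    """
    # Supervised term: student vs. weak image-level labels
    hard = bce(sigmoid(student_logits), image_labels)
    # Distillation term: student's softened output vs. teacher's softened output
    soft_teacher = sigmoid(teacher_logits / temperature)
    soft_student = sigmoid(student_logits / temperature)
    soft = bce(soft_student, soft_teacher)
    # Weighted mix of the two signals
    return alpha * hard + (1.0 - alpha) * soft
```

In this sketch, raising `alpha` toward 1 leans on the noisy image-level labels, whereas lowering it shifts weight toward the teacher's soft targets, which is where the few expert annotations would exert their influence.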
Limitations
Our algorithm achieved excellent performance in detecting rib and clavicle fractures, and to the best of our knowledge, this is the first study to successfully develop an algorithm capable of detecting such fractures on CXR. However, the algorithm has limitations. The primary limitation is the scarcity of available training data. DL algorithms are data driven and rely on large data sets to address problems effectively. Despite our weakly supervised labeling approach, this limitation could not be entirely overcome. Because of time and cost constraints, radiologists were not involved in image review and labeling. Instead, two experienced trauma surgeons specializing in rib management performed this task, and their labels may not meet the standard of radiologist labeling. The absence of an inter-rater reliability assessment is another limitation of the data labeling in this study. A further limitation is the retrospective, single-institution design of this image review study. The population and image collection process were confined to a specific setting, potentially introducing biases that limit the direct applicability of our findings to other institutions with different population distributions. Moreover, the images were randomly selected based on the clinical diagnosis from the registry, so selection bias cannot be completely excluded.
DL algorithms are often referred to as ‘black boxes’ because they learn relationships between input data and outcomes without revealing how they reach their conclusions. To address this issue, recent research has focused on interpretable DL techniques. In our study, we incorporated a visual heatmap highlighting areas of possible abnormality to help physicians understand the algorithm’s decision-making process. However, in real-world scenarios, physicians make diagnoses based not only on radiographic findings but also on clinical information such as patient history and physical examination. The true benefit of this algorithm should be evaluated in a prospective randomized clinical trial that accounts for the full clinical environment.
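How the heatmap is generated is not stated here. One common technique for producing such maps is class activation mapping (CAM), in which the final convolutional feature maps are weighted by the classifier weights for a given finding. The sketch below assumes a network ending in global average pooling followed by a linear classifier; the function name, array shapes, and nearest-neighbour upsampling are illustrative assumptions rather than CXR-FxNet's documented implementation.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights, out_size):
    """Hypothetical CAM sketch.

    feature_maps:  (K, h, w) activations of the last convolutional layer for one image
    class_weights: (K,) linear-classifier weights for one finding (e.g. rib fracture)
    out_size:      (H, W) target heatmap size; assumed to be integer multiples of (h, w)
    """
    K, h, w = feature_maps.shape
    H, W = out_size
    if class_weights.shape != (K,):
        raise ValueError("need one weight per feature map")
    if H % h or W % w:
        raise ValueError("out_size must be an integer multiple of the feature map size")
    # Weighted sum over channels gives a coarse (h, w) relevance map
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    # Keep only regions that push the score for this finding upward
    cam = np.maximum(cam, 0.0)
    # Normalize to [0, 1] so the map can be rendered as a colour overlay
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    # Nearest-neighbour upsampling back to the radiograph resolution
    return np.repeat(np.repeat(cam, H // h, axis=0), W // w, axis=1)
```

In practice the upsampled map would be blended over the original radiograph; bilinear interpolation is a common alternative to the nearest-neighbour repeat used here for brevity, and gradient-based variants such as Grad-CAM remove the pooling-plus-linear-head assumption.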