Injury

Performance of artificial intelligence in addressing questions regarding management of clavicle fractures

Injury. 2026 Jan 22;57(3):113053. doi: 10.1016/j.injury.2026.113053. Online ahead of print.

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) has revolutionized public access to extensive information, with large language model (LLM)-based chatbots allowing users to receive comprehensive, individualized responses. In this study, we aimed to evaluate the quality of LLM responses to questions about common orthopedic conditions. We hypothesized that both ChatGPT and Gemini would provide high-quality, evidence-based responses across evaluation criteria.

METHODS: Responses from ChatGPT and Gemini to prompts based on the 14 AAOS Clinical Practice Guidelines for clavicle fracture management were evaluated on six criteria by seven fellowship-trained shoulder and trauma orthopedic surgeons. Mean scores and standard deviations were calculated, and two-sided t-tests were used to compare performance between ChatGPT and Gemini. Scores were then evaluated for inter-rater reliability (IRR).

RESULTS: Both ChatGPT and Gemini achieved overall mean scores greater than 3.5. Mean overall score for ChatGPT was highest in the evidence-based category (4.52 ± 0.16) and lowest in clarity (4.22 ± 0.19). Mean overall score for Gemini was highest in clarity (4.31 ± 0.17) and lowest in the evidence-based category (3.81 ± 0.22). ChatGPT performed significantly better than Gemini in the overall completeness category (4.50 ± 0.17 vs 4.11 ± 0.19, p < 0.005), but scores were otherwise not significantly different. Over 70 % of respondents rated the responses of ChatGPT as higher quality than those of Gemini.

CONCLUSIONS: ChatGPT and Gemini produced responses that were generally in line with the 2022 AAOS guidelines on the treatment of clavicle fractures. Scores were comparable in every overall category except completeness, in which ChatGPT outperformed Gemini. These results suggest that both LLMs are capable of providing clinically relevant responses to questions related to clavicle fracture management.

PMID:41621222 | DOI:10.1016/j.injury.2026.113053

Pilot validation study for a large image database of proximal femur fracture anteroposterior radiographs: Searching for the ground truth

Injury. 2026 Jan 22;57(3):113056. doi: 10.1016/j.injury.2026.113056. Online ahead of print.

ABSTRACT

PURPOSE: This pilot study aims to validate the "ground truth" accuracy and consistency of proximal femur fracture classification using a large radiographic image database. The project, a collaboration between expert groups from the University of Turin and the AO Foundation, seeks to ensure that expert consensus-based annotations are reliable for future artificial intelligence (AI) model development.

METHODS: A cross-sectional, diagnostic accuracy study was conducted using a randomly selected subset of 300 anteroposterior pelvic radiographs from a single-center image repository created at the University of Turin within the AO Innovation Translation Center framework. Fracture classification annotations were independently provided by the local clinical expert group (LC-EG) and by an independent AO expert group of surgeons (AO-EG). To assess interrater reliability between the two groups, Cohen's kappa coefficient was calculated for categorical agreement on the presence of a fracture and AO/OTA classification.

RESULTS: The comparison of annotations from LC-EG and AO-EG yielded a Cohen's kappa of 0.81 (95 % confidence interval: 0.75-0.87) and a percentage agreement of 87.67 % (95 % confidence interval: 87.63-87.70) for the classification of proximal femur fractures into three defined categories: no fracture, fracture type 31A, and fracture type 31B. These results confirm a high level of consistency between the two expert groups in annotating the image dataset.
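For readers less familiar with the agreement statistics reported above, the following is a minimal sketch of how percentage agreement and Cohen's kappa are computed from a square agreement table. The 3×3 counts are hypothetical, since the abstract does not report the cross-tabulation; they were chosen only so that 263 of 300 images agree, which corresponds to the reported 87.67 % agreement. The function name `cohen_kappa` is illustrative and is not the authors' code.

```python
def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square agreement table.

    table[i][j] = number of images placed in category i by group A
    and in category j by group B.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: share of images on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of the
    # two groups' marginal proportions.
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    return p_o, (p_o - p_e) / (1 - p_e)


# HYPOTHETICAL counts (not from the study): rows = LC-EG, columns = AO-EG;
# categories = no fracture, 31A, 31B.
hypothetical = [
    [88, 6, 6],
    [6, 88, 6],
    [7, 6, 87],
]
agreement, kappa = cohen_kappa(hypothetical)
print(f"agreement = {agreement:.2%}, kappa = {kappa:.3f}")
```

Kappa depends on the marginal distribution of categories as well as on the diagonal, so a different table with the same 263 agreements would generally give a different kappa.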

CONCLUSION: The observed interrater reliability between the LC-EG and AO-EG supports the credibility of the reference annotations, establishing a validated ground truth for proximal femur fractures. This evidence justifies using the radiographic image database as a benchmark for future studies and as a foundation for transparent, reproducible AI development and evaluation, thereby facilitating safer integration of decision support tools into orthopedic trauma workflows.

PMID:41616725 | DOI:10.1016/j.injury.2026.113056

Trends in geriatric ankle fractures in the United States: An 8-year analysis

Injury. 2026 Jan 22;57(3):113066. doi: 10.1016/j.injury.2026.113066. Online ahead of print.

ABSTRACT

INTRODUCTION: Ankle fractures are among the most common fractures in older adults, associated with substantial morbidity and healthcare burden. This study aimed to evaluate recent trends in incidence and injury characteristics of ankle fractures among adults aged ≥65 years presenting to United States emergency departments.

METHODS: The National Electronic Injury Surveillance System (NEISS) database was queried for ankle fractures in adults aged ≥65 years from 2016 to 2023. Demographics, injury mechanisms, fracture types, and hospitalization rates were analyzed. Annual incidence rates per 100,000 persons were calculated. Trends over time, as well as age- and sex-specific differences, were analyzed.

RESULTS: An estimated 241,449 ankle fractures occurred among adults aged ≥65 years between 2016 and 2023, with an overall incidence rate of 55.8 per 100,000 person-years. The incidence increased from 49.1 to 63.0 per 100,000 persons during the study period (P < 0.0001). Incidence rates increased significantly in both males (from 25.7 to 34.7 per 100,000 persons; P < 0.0001) and females (from 67.7 to 86.4 per 100,000 persons; P < 0.0001). Most fractures occurred in women (76.2 %), resulted from low-energy trauma (92.8 %), and were closed fractures (96.9 %). Open fracture incidence rose from 0.64 to 2.40 per 100,000 persons, representing a 275 % increase (P < 0.0001). Hospitalization rates increased from 20.3 to 29.7 per 100,000 persons (P < 0.0001). Women aged ≥80 years accounted for the highest fracture burden. Women were more likely to sustain low-energy injuries (P < 0.0001), while men had a higher proportion of open fractures (P = 0.011). Hospitalization rates increased with age, reaching 56.6 % among patients aged ≥80 years (P < 0.0001).
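The rate arithmetic behind these figures can be sketched as follows. The inputs are the values reported in the abstract; the function names are illustrative, and the person-years figure is a back-calculation from the reported case count and rate, not a denominator stated by the authors.

```python
def percent_change(start, end):
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100


def incidence_per_100k(cases, person_years):
    """Incidence rate per 100,000 person-years."""
    return cases / person_years * 100_000


# Open fracture incidence per 100,000 persons, as reported (0.64 to 2.40).
open_change = percent_change(0.64, 2.40)
print(f"open fractures: {open_change:.0f} % increase")

# The same formula applied to the overall and hospitalization rates.
print(f"overall incidence: {percent_change(49.1, 63.0):.1f} % increase")
print(f"hospitalization: {percent_change(20.3, 29.7):.1f} % increase")

# ASSUMPTION: person-years implied by 241,449 cases at 55.8 per 100,000
# person-years; the abstract does not report this denominator directly.
implied_person_years = 241_449 / 55.8 * 100_000
print(f"implied person-years: {implied_person_years:,.0f}")
```

The first line reproduces the 275 % increase stated for open fractures, since (2.40 − 0.64) / 0.64 = 2.75.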

CONCLUSIONS: Ankle fracture incidence among older adults in the U.S. increased significantly from 2016 to 2023, with rising rates in both males and females. Low-energy mechanisms remain the predominant cause in this population. Further studies are needed to identify optimal surgical treatments and rehabilitation strategies. Improving bone health and reducing morbidity and mortality remain key priorities in managing geriatric ankle fractures.

PMID:41616724 | DOI:10.1016/j.injury.2026.113066

Attempted definitive revision amputations in emergency department vs operating room for traumatic finger injuries are associated with a high rate of revision surgery

Injury. 2026 Jan 22;57(3):113067. doi: 10.1016/j.injury.2026.113067. Online ahead of print.

ABSTRACT

BACKGROUND: Revision amputation is a common treatment in the emergency department (ED) for traumatic finger injuries, yet there are limited data on outcomes for procedures completed in the ED versus the operating room (OR). This study aims to assess outcome differences between ED revision amputation and delayed OR management.

METHODS: A total of 103 consecutive patients with traumatic finger amputations were identified from a single tertiary care center. Patients were evaluated by the on-call hand team and staffed with a fellowship-trained hand attending. ED revision amputations were performed with the goal of definitive care. Data were collected on injury and patient demographics, follow-up, and further revision procedures. Odds ratios were calculated to identify factors predictive of ED management failure.

RESULTS: Fifty-five patients were treated with ED revision amputation, 18 of whom (32.7 %) required further surgical management. The presence of multiple digit amputations was associated with a higher likelihood of initial treatment in the operating room. The most common indication for further surgery was revision amputation and soft tissue coverage (88.9 %), followed by additional bony fixation for underlying fractures (44.4 %). Number of fingers amputated, fracture presence, and significant soft tissue injury were not associated with failure. Of the 48 patients with planned delayed management in the OR, 11 were treated with nonoperative wound care.
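As a rough illustration of the calculations described in the methods, the sketch below recomputes the ED failure rate from the reported counts and shows how an odds ratio is formed from a 2×2 table. The split of the 55 ED patients by fracture presence is hypothetical, because the abstract does not report it, and `odds_ratio` is an illustrative helper rather than the authors' analysis code.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = factor present, failure     b = factor present, no failure
    c = factor absent, failure      d = factor absent, no failure
    """
    return (a * d) / (b * c)


# Counts taken from the abstract.
total, ed_treated, ed_failed = 103, 55, 18
failure_rate = ed_failed / ed_treated
planned_or = total - ed_treated
print(f"ED failure rate: {failure_rate:.1%}")
print(f"planned OR management: {planned_or} patients")

# HYPOTHETICAL split of the 55 ED patients by fracture presence
# (18 failures and 37 successes in total, matching the abstract).
failed_with_fx, ok_with_fx = 10, 20
failed_no_fx, ok_no_fx = 8, 17
example_or = odds_ratio(failed_with_fx, ok_with_fx, failed_no_fx, ok_no_fx)
print(f"illustrative odds ratio: {example_or:.2f}")
```

An odds ratio near 1, as in this made-up table, would be consistent with a factor that is not associated with failure.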

CONCLUSIONS: Definitive ED revision amputation was associated with a high rate of failure, need for revision surgery, and loss to follow-up. Injuries with complex wound coverage or bony fixation may be better suited to OR management. Some patients may ultimately be appropriate for management without revision amputation and may be overtreated with this procedure in the ED.

PMID:41616723 | DOI:10.1016/j.injury.2026.113067

Feasibility and discriminatory properties of a simple fitness-to-drive assessment using a driving simulator placed in an orthopaedic outpatient department: a feasibility study

Injury. 2026 Jan 29;57(3):113032. doi: 10.1016/j.injury.2026.113032. Online ahead of print.

ABSTRACT

INTRODUCTION: Safe return to driving after orthopaedic injury or surgery is important, but standardised and feasible in-hospital assessments are lacking. We evaluated the feasibility of a simple simulator-based fitness-to-drive assessment in an orthopaedic outpatient department and its ability to discriminate between orthopaedic patients and professional drivers.

METHODS: In this prospective feasibility study (January 2024-January 2025), two identical driving simulators were installed in an orthopaedic outpatient department and a vocational training centre for professional drivers. Participants were ≥18 years, held a driving licence, and had no medical driving ban. All completed a 3-lap, 6-event scenario with predefined speed progression (50/60/70 km/h). Outcomes were completion, errors, speed progression, maximum reaction time and braking length (metres) at 50 km/h, simulator sickness, perceived realism, and subgroup test-retest reliability.

RESULTS: We included 57 patients and 92 drivers. Overall completion was 96.6% (144/149); 31.2% achieved speed progression. Patients were older, more often female, and more functionally impaired than drivers. Drivers had a shorter braking distance (23.3 m; 95% CI 22.1-24.5) and faster reaction time (0.5 s; 95% CI 0.5-0.6) than patients (39.4 m; 95% CI 36.7-42.1 and 1.2 s; 95% CI 1.0-1.4). Simulator sickness leading to discontinuation occurred in 3.4%. Most patients (98.2%) and 64.0% of drivers perceived simulator driving as comparable to real driving. Repeat testing showed a shorter braking distance, particularly in patients.

CONCLUSION: The simulated assessment was feasible, well tolerated, and discriminated between patients and professional drivers. Variation indicates a need for individualised assessment. Validation against on-road driving is required before clinical implementation.

PMID:41616722 | DOI:10.1016/j.injury.2026.113032

Association of area-level income with patient reported long-term disability outcomes post-traumatic brain injury

Injury. 2026 Jan 22:113064. doi: 10.1016/j.injury.2026.113064. Online ahead of print.

ABSTRACT

BACKGROUND: Traumatic brain injury (TBI) affects 64-74 million people annually, often causing long-term disability. The influence of social determinants of health (SDOH), particularly neighborhood and built environments, on functional outcomes post-TBI remains underexplored. This study examines the association between census tract-level median household income (a proxy for area income) and self-reported functional outcomes in patients with TBI seen in a Southern California TBI clinic.

METHODS: A retrospective cohort study of Neurology TBI & Concussion Clinic data (9/2022-1/2025) included patients ≥18 years with a known TBI mechanism and neurological symptoms who completed SDOH and functional assessments. SDOH factors included sex, race, ethnicity, insurance status, and median area income, determined by ZIP code using 2023 US census data. Disability was defined as a Glasgow Outcome Scale-Extended score ≤6 at the index clinic visit. Multivariable logistic regression was performed.

RESULTS: Among 148 patients (median age 46.5 years; 41% female, 75% mild TBI), the disabled cohort had a higher proportion of patients with poor insurance status (38% vs. 8%, p < 0.001), a higher injury severity score (ISS) (9.0 vs. 1.0, p = 0.002), and a lower median household income ($104,981 vs. $114,747, p = 0.020). Regression analysis showed that poor insurance status (OR 5.80, CI 2.01-21.24, p = 0.003) and ISS (OR 1.06, CI 1.01-1.12, p = 0.027) predicted disability, but area income did not (OR 0.93, CI 0.79-1.10, p = 0.387).

CONCLUSION: Lower area income was associated with disability in unadjusted analysis but was not an independent predictor after adjusting for insurance and ISS. Findings highlight the need to explore individual and community factors influencing long-term TBI outcomes for targeted screening.

PMID:41605747 | DOI:10.1016/j.injury.2026.113064

Construction and validation of a machine learning model based on clinical indicators: Risk of bloodstream infections in patients with deep second- and third-degree burns

Injury. 2026 Jan 11;57(3):113046. doi: 10.1016/j.injury.2026.113046. Online ahead of print.

ABSTRACT

OBJECTIVE: Patients with deep second- and third-degree burns are at high risk of bloodstream infections (BSIs) due to skin barrier disruption and immune suppression, with poor prognosis. Early risk identification is crucial for improving outcomes. This study aimed to construct and validate a machine learning model using multidimensional clinical indicators to accurately predict BSI risk in such patients.

METHODS: A retrospective cohort study enrolled 301 patients with deep second- and third-degree burns (75 with BSIs) from Yongchuan Hospital Affiliated to Chongqing Medical University between January 2020 and January 2025. Multidimensional data on burn characteristics, laboratory indicators, and therapeutic measures were collected within 72 h of admission. After data preprocessing and feature screening, four models were built: logistic regression (LR), support vector machine (SVM), naive Bayes (NB), and back propagation artificial neural network (BP-ANN). Model performance was evaluated via stratified sampling and 5-fold cross-validation.

RESULTS: Eight key predictors were identified: total body surface area, lymphocytes (LYM, most important), plateletcrit, total bilirubin, creatinine, C-reactive protein, procalcitonin, and 24-hour rehydration. The BP-ANN model performed best in the test set, with accuracy, recall, precision, F1 score, and AUC all reaching 0.857, good calibration (Hosmer-Lemeshow test, P = 0.142), and significant net benefit in the 0-0.3 risk threshold interval (decision curve analysis). The LR model had an AUC of 0.891 and high generalization stability (0.999) but less balanced indicators. SVM was overfitted (limited practical value), and NB had insufficient generalization (test set AUC = 0.775).
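The threshold-based metrics named above are all derived from a confusion matrix, and the short sketch below shows the standard definitions. The counts are hypothetical and are not taken from the study, whose test-set size is not given in the abstract; they were picked only to show that a symmetric confusion matrix yields identical accuracy, recall, precision, and F1 (here 6/7, about 0.857). AUC is not computed, since it requires the predicted probabilities rather than counts.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    recall = tp / (tp + fn)          # sensitivity: BSI cases correctly flagged
    precision = tp / (tp + fp)       # flagged patients who truly had a BSI
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# HYPOTHETICAL test-set counts (not from the study).
metrics = classification_metrics(tp=6, fn=1, fp=1, tn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With unbalanced outcome classes, as in a cohort where a minority of patients develop a BSI, these four metrics would usually diverge, which is why they are reported separately.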

CONCLUSION: The BP-ANN model based on multidimensional clinical indicators accurately predicts BSI risk in patients with deep second- and third-degree burns, with good differentiation, calibration, and clinical utility, providing a reliable tool for early intervention.

PMID:41604758 | DOI:10.1016/j.injury.2026.113046
