Original Research

Training on a virtual reality cricothyroidotomy simulator improves skills and transfers to a simulated procedure

Abstract

Objective The virtual airway skills trainer (VAST) is a virtual reality simulator for training in cricothyroidotomy (CCT). The goal of this study was to test the effectiveness of the VAST-CCT for training and for transfer of skills.

Methods Subjects were randomized into two groups: control (no training) and simulation (proficiency-based training on the VAST-CCT over 2 weeks). Two weeks post-training, both groups performed CCT on the TraumaMan manikin to assess transfer of skills.

Results A total of 20 subjects participated in the study. On the simulator, the simulation group performed better than the control group at both the post-test (p<0.001) and the retention test (p<0.001). Cumulative sum analysis showed that all subjects in the simulation group reached proficiency with an acceptable failure rate within the 2 weeks of training. On the transfer test, the simulation group outperformed the control group on skin cut (p<0.001), intubation (p<0.001) and total score (p<0.001).

Conclusions The VAST-CCT is effective in training and skills transfer for the CCT procedure.

Level of evidence Not applicable. Simulator validation study.

Introduction

Cricothyroidotomy (CCT) is an emergency life-saving procedure for patients who cannot be intubated by conventional means and would otherwise face impending death.1 Because CCT is performed infrequently, with a reported incidence among prehospital and in-hospital patients requiring a secure airway ranging from 0% to 18.5%,1 opportunities to train on live patients are limited. Currently, the skill is learned and maintained by practicing on cadavers, animal models, manikins or small benchtop models.2–12 Owing to ethical issues in using animals, 99% of the advanced trauma life support (ATLS) courses in the USA and all of the courses in Canada now use American College of Surgeons (ACS)-approved manikin-based simulators.13 The main limitation of manikin-based simulators is that they cannot provide a high-fidelity simulation of the variations routinely seen in clinical cases.

To overcome these challenges, we developed the virtual airway skills trainer (VAST), a virtual reality simulator for training in airway procedures. The VAST-CCT, developed for training in the CCT procedure, is a validated simulator that has previously been shown to differentiate the performance of surgeons who have performed more than five actual cricothyrotomies.14

The goal of this work is to assess the ability of the VAST-CCT simulator to train novice subjects in the CCT procedure and the transfer of the acquired skills to a simulated procedure. We hypothesized that subjects trained on the VAST-CCT would demonstrate improvement in skill and positive transfer of learning to the simulated clinical environment compared with a control group that received no prescribed training.

Materials and methods

Study design and procedure

The study used a between-subjects design, with medical student volunteers randomized into two groups: control and simulation.

Subjects

Twenty second-year and third-year Texas A&M College of Medicine students (13 men and 7 women) were recruited at the Baylor University Medical Center in Dallas for this study. The subjects were randomly assigned to either the control (n=10) or the simulation (n=10) group. The gender distribution in the randomly assigned groups was as follows: control (male=6, female=4) and simulation (male=7, female=3).

Pretest

To assess baseline performance (pretest), all subjects were first introduced to the VAST-CCT simulator and then performed the task on it once at the beginning of the study.

Training

Subjects assigned to the simulation group then practiced the task in one session a day, 5 days/week, for two consecutive weeks, for a total of 10 training sessions. At the beginning of the first training session, they were also asked to watch an online supplemental video of an expert performing the CCT on the simulator. During the training sessions, subjects were allowed unlimited attempts until they demonstrated proficiency-level performance on two consecutive repetitions. Subjects were monitored in person during each session by the experimenters, led by the study lead, who had been trained by our trauma and critical care surgeons in all aspects of the procedure so that they could provide feedback. After each trial, the subjects’ scores for landmark identification, skin and membrane incision, and intubation were disclosed, and feedback was provided (eg, showing the proper landmark position, showing the start and end positions and the direction of the incision, and indicating whether the endotracheal tube was placed too deep or too shallow). Further instruction was given if a subject appeared to be struggling with the task. Each session lasted no longer than 1 hour. Subjects in the control group did not participate in the training sessions.

Proficiency-based training

A proficiency-based model15 was used to train the subjects in the simulation group. Learning rates and plateau levels vary among subjects within a group, so a preset or arbitrary end criterion may not ensure that all subjects reach proficiency. Proficiency criteria derived from expert performance, by contrast, have been used successfully to train subjects in surgical skills.15 16 For surgical motor tasks, the mean of expert performance has been used as a proficiency metric.16 In our study, two board-certified expert trauma surgeons performed the CCT task five times on the VAST-CCT after familiarizing themselves with the simulator. The automatically computed scores from these five consecutive repetitions were then used to calculate the proficiency score for the simulation group. Because our subjects were medical students with no prior experience of this procedure, we were interested in raising their performance from the novice level, and we set the proficiency score at the mean − SD of the expert scores so that it would be an achievable target. Performance at this proficiency level for two or more consecutive repetitions marked the endpoint of each training day.
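The daily stopping rule described above (a threshold at the expert mean minus one SD, reached on two consecutive trials) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the simulator’s own code: the function names are hypothetical, the use of the sample SD and the reading of “at this proficiency level” as a score at or above the threshold are assumptions, and the example scores are invented for illustration rather than taken from the study.

```python
from statistics import mean, stdev
from typing import Optional, Sequence


def proficiency_score(expert_scores: Sequence[float]) -> float:
    """Proficiency criterion: mean of expert scores minus one standard deviation."""
    # Assumption: sample SD (statistics.stdev); the text does not say which SD was used.
    return mean(expert_scores) - stdev(expert_scores)


def daily_endpoint_trial(trial_scores: Sequence[float], threshold: float,
                         run_length: int = 2) -> Optional[int]:
    """1-based trial at which `run_length` consecutive trials first score at or
    above `threshold`; None if the endpoint is never reached."""
    streak = 0
    for trial, score in enumerate(trial_scores, start=1):
        # Assumption: "at the proficiency level" means score >= threshold.
        streak = streak + 1 if score >= threshold else 0
        if streak >= run_length:
            return trial
    return None


# Hypothetical scores for illustration only (not study data).
expert = [30, 38, 34, 36, 32]
threshold = proficiency_score(expert)  # 34 - sqrt(10), about 30.84
print(daily_endpoint_trial([25, 31, 29, 32, 33], threshold))  # 5
```

Resetting the streak on any sub-threshold trial is what makes the rule require consecutive repetitions rather than any two successes in a session.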

Post-test and retention test

At the end of the 10 training sessions, all subjects performed the task once on the VAST-CCT simulator (post-test). Two weeks after the post-test, all subjects performed the task again to determine skill retention (retention test). The 2-week washout period was chosen based on our previous studies of surgical skills training with medical students.17 18

Transfer test

Immediately after the retention test, subjects performed CCT on a TraumaMan (Simulab, Seattle, Washington) manikin to assess transfer of skills. We chose TraumaMan because it is an ACS-approved training platform for the ATLS course. Testing transfer of skills in the clinical environment is desirable but risky, as our subjects were medical students who could not yet operate on human patients. Cadavers are expensive and single use, and they do not have the same feel as live tissue. Each subject performed the task twice on the manikin, and each procedure was videotaped. The completion time, skin and membrane cut lengths, and intubation length were recorded, and a score was computed using the same procedure as in our simulator. Additionally, two expert trauma and critical care surgeons rated the subjects’ performance from the recorded videos, blinded to both group and repetition. We derived a subjective assessment tool from the Objective Structured Assessment of Technical Skills,19 with which the experts rated performance on a 5-point Likert scale for preparation, respect for tissue, time and motion, instrument handling, flow of procedure, knowledge of procedure, and overall performance. The experts also rated each subject’s performance using a custom checklist, because the available validated checklist was designed for the rapid four-step technique.20 The checklist had seven items, each scored 0 (does not perform the task), 1 (performs the task inadequately) or 2 (performs the task adequately).

Simulator

The VAST-CCT is a validated simulator developed for training in critical airway skills.14 It consists of a specialized palpation interface for landmark identification and a haptic device that provides force feedback while performing the task (figure 1). A computer monitor displays the virtual anatomic models and the tools used in the procedure. The simulator is designed to train four key tasks of the CCT procedure: (1) landmark identification, (2) skin incision, (3) cricothyroid membrane incision, and (4) intubation. During the landmark identification task, users manually palpate the interface to identify the cricoid and thyroid cartilages. An array of force sensors mounted underneath the palpation interface is used to triangulate the position of the finger for display on the monitor. Once the landmarks are identified, users hold the handle of the haptic device (Geomagic Touch, 3D Systems, Rock Hill, South Carolina) to choose the appropriate tools and complete the other three tasks. The simulator automatically records performance metrics for assessment. A more detailed description of how the procedure is simulated in the VAST-CCT is given in our construct validation article.14

Figure 1

The virtual airway skills trainer for cricothyroidotomy (VAST-CCT) simulator showing the visual and haptic interface.

Statistical analysis

Performance on the VAST-CCT simulator at pretest, post-test, and retention test was analyzed using separate two-way mixed factorial analyses of variance (ANOVAs) for completion time and total score, with ‘Test’ (pre, post, and retention) as the within-subjects factor and ‘Group’ (control and simulation) as the between-subjects factor. The interaction between the two factors (Test*Group) was also analyzed. Significant effects were analyzed further using simple effects tests with Bonferroni correction. The cumulative sum (CUSUM) method was used to analyze the learning of the simulation group during training.21
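The Bonferroni adjustment used for the simple effects tests, and the effect sizes reported with the ANOVA results, can be illustrated with two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, the statistics package actually used is not stated, and reading the reported η² as partial eta squared is an inference, supported by the fact that the completion-time values (eg, F(2,36)=38.63 giving 0.682, F(2,36)=12.36 giving 0.407) are reproduced by this formula.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni adjustment: multiply each p by the number of comparisons, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# Main effect of test on completion time, F(2,36)=38.63, as reported in the Results.
print(round(partial_eta_squared(38.63, 2, 36), 3))  # 0.682

# Three hypothetical unadjusted pairwise p-values within one group (pre/post/retention).
print(bonferroni([0.01, 0.2, 0.5]))
```

The cap at 1 explains why an adjusted pairwise comparison can be reported as exactly p=1.0.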

The CUSUM is a useful statistical tool for the quality control of sequential processes.21 The CUSUM chart is plotted by first setting the acceptable (p0) and unacceptable (p1) failure rates and the probabilities of type I (α) and type II (β) errors. From these values, the upper (h1) and lower (h0) decision limits and a constant s are calculated. At each trial, if the outcome is a success, the constant s is subtracted from the previous CUSUM; if the outcome is a failure, 1−s is added. Consecutive successes therefore make the CUSUM decrease (negative slope), whereas failures make it increase (positive slope). By plotting the CUSUM over trials, the point at which the process reaches the acceptable failure rate can be determined. The CUSUM has been widely used to analyze learning curves in many medical procedures.17 18 22–31
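The CUSUM procedure described above can be sketched as follows. The text does not give the formulas for h0, h1 and s, so this sketch assumes the standard sequential-probability-ratio form commonly used for learning-curve CUSUM; the function names are hypothetical, and each trial is assumed to have already been classified as a success or failure (eg, by comparing its score with the proficiency score). With p0=0.2, p1=0.4, α=0.05 and β=0.2, these formulas give h0≈−1.59, h1≈2.83 and s≈0.29, close to the limits reported for this study.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cusum_limits(p0: float, p1: float, alpha: float, beta: float) -> Tuple[float, float, float]:
    """Return (h0, h1, s) for a Bernoulli CUSUM, using SPRT-style formulas (assumed)."""
    P = math.log(p1 / p0)              # log ratio of unacceptable to acceptable failure rate
    Q = math.log((1 - p0) / (1 - p1))  # log ratio of acceptable to unacceptable success rate
    a = math.log((1 - beta) / alpha)
    b = math.log((1 - alpha) / beta)
    h0 = -b / (P + Q)                  # lower decision limit
    h1 = a / (P + Q)                   # upper decision limit
    s = Q / (P + Q)                    # amount subtracted per success
    return h0, h1, s


def cusum_path(successes: Sequence[bool], s: float) -> List[float]:
    """CUSUM after each trial: subtract s on a success, add 1 - s on a failure."""
    path: List[float] = []
    total = 0.0
    for success in successes:
        total += -s if success else 1 - s
        path.append(total)
    return path


def first_crossing_below(path: Sequence[float], h0: float) -> Optional[int]:
    """1-based index of the first trial whose CUSUM is at or below h0, else None."""
    for trial, value in enumerate(path, start=1):
        if value <= h0:
            return trial
    return None


h0, h1, s = cusum_limits(p0=0.2, p1=0.4, alpha=0.05, beta=0.2)
print(round(h0, 2), round(h1, 2), round(s, 2))  # -1.59 2.83 0.29

# Hypothetical outcomes: one failure followed by eight successes.
outcomes = [False] + [True] * 8
print(first_crossing_below(cusum_path(outcomes, s), h0))  # 9
```

In this hypothetical run, the early failure pushes the CUSUM up by 1−s, so eight successes are needed before it first reaches the lower limit h0.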

The experts’ ratings of the transfer test were tested for inter-rater reliability using Cronbach’s alpha and the intraclass correlation coefficient, and the ratings were then compared between the groups using the non-parametric Mann-Whitney U test.
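Both statistics named in this paragraph have short textbook definitions, sketched below in dependency-free Python. Assumptions: each expert rater is treated as an “item” in Cronbach’s alpha, the argument layout (one list of scores per rater) and the function names are hypothetical, and only the U statistic is computed. In practice the p-value would come from a statistics package such as SciPy’s `scipy.stats.mannwhitneyu`, and packages differ in which of the two U values they report.

```python
from statistics import variance
from typing import List, Sequence, Tuple


def cronbach_alpha(ratings_by_rater: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha with each rater treated as an item.

    ratings_by_rater[j][i] is rater j's score for subject i.
    """
    k = len(ratings_by_rater)
    n = len(ratings_by_rater[0])
    # Sample variance is used for both terms, so the ratio stays consistent.
    item_vars = sum(variance(r) for r in ratings_by_rater)
    totals = [sum(r[i] for r in ratings_by_rater) for i in range(n)]
    return k / (k - 1) * (1 - item_vars / variance(totals))


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..N, with tied values given the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) from the rank sum of x; U_x + U_y == len(x) * len(y)."""
    ranks = average_ranks(list(x) + list(y))
    r_x = sum(ranks[:len(x)])
    u_x = r_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # (0.0, 9.0)
```

Average ranks matter here because Likert and checklist scores take only a few distinct values, so ties are common.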

Results

Pretest, post-test and retention test results

Subjects’ performance on the VAST-CCT for both the control and simulation groups is shown in figure 2, including completion time (figure 2A,B) and total score (figure 2C,D). The results of the two-way mixed factorial ANOVAs are summarized in table 1.

Figure 2

Performance at pretest, post-test and retention test for (A) completion time—control group, (B) completion time—simulation group, (C) total score—control group, (D) total score—simulation group.

Table 1
Two-way mixed ANOVA results for completion time and total score

Analysis of completion time showed a significant main effect of test (F(2,36)=38.63, p<0.001, η²=0.682) and a significant Test*Group interaction (F(2,36)=12.36, p<0.001, η²=0.407), indicating that completion time changed significantly across the pretest, post-test, and retention test. There was also a significant main effect of group (F(1,18)=9.25, p=0.007, η²=0.34). To analyze these effects further, simple effects tests with Bonferroni correction for multiple comparisons were computed. At pretest, baseline completion time did not differ significantly between the control and simulation groups (162.7 vs. 193.2, p=0.32), but the simulation group completed the task faster than the control group at the post-test (128.8 vs. 42.1, p<0.001) and the retention test (112.1 vs. 47.3, p<0.001). The control group’s mean completion time did not differ significantly between pretest and post-test (162.7 vs. 128.8, p=0.42), pretest and retention test (162.7 vs. 112.1, p=0.064), or post-test and retention test (128.8 vs. 112.1, p=0.122), whereas the simulation group improved between pretest and post-test (193.2 vs. 42.1, p<0.001) and between pretest and retention test (193.2 vs. 47.3, p<0.001), and showed no significant change between post-test and retention test (42.1 vs. 47.3, p=1.0), indicating skill retention.

For the total score, the main effect of test was not significant (F(2,36)=0.89, p=0.418, η²=0.047), but there was a significant Test*Group interaction (F(2,36)=898.2, p<0.001, η²=0.588) and a significant main effect of group (F(1,18)=1269.6, p<0.001, η²=0.556). Tests of simple effects showed no significant difference in total score between the control and simulation groups at pretest (28.0 vs. 22.2, p=0.1). At the post-test, the simulation group performed significantly better than the control group, as expected (17.4 vs. 37.4, p<0.001), and at the retention test the simulation group continued to outperform the control group, indicating skill retention (20.4 vs. 33.8, p<0.001). As the main effect of test was not significant, no further simple effects tests were conducted.

Learning curve analysis

Proficiency score

The mean (±SD) score of the two experts on the VAST-CCT was 34±4. The proficiency score for our study was therefore set at 1 SD below the mean: 34−4=30.

CUSUM analysis

For the CUSUM analysis of the CCT training data, each trial was classified against the proficiency score of 30, with an acceptable failure rate p0 of 20% (0.2), an unacceptable failure rate p1 of 0.4, α=0.05 and β=0.2. The computed lower and upper decision limits and constant s were h0=−1.58, h1=2.82 and s=0.3, respectively.
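The reported decision limits can be checked by hand. Assuming the standard sequential-probability-ratio formulas for a Bernoulli CUSUM (the text does not state them), the arithmetic with p0=0.2, p1=0.4 and β=0.2 is shown below. The results agree closely with the reported h0=−1.58, h1=2.82 and s=0.3 for α=0.05, whereas α=0.5 would give h1≈0.48.

```latex
\begin{aligned}
P &= \ln\frac{p_1}{p_0} = \ln\frac{0.4}{0.2} \approx 0.693, &
Q &= \ln\frac{1-p_0}{1-p_1} = \ln\frac{0.8}{0.6} \approx 0.288, \\
a &= \ln\frac{1-\beta}{\alpha} = \ln\frac{0.8}{0.05} \approx 2.773, &
b &= \ln\frac{1-\alpha}{\beta} = \ln\frac{0.95}{0.2} \approx 1.558, \\
h_0 &= -\frac{b}{P+Q} \approx -\frac{1.558}{0.981} \approx -1.59, &
h_1 &= \frac{a}{P+Q} \approx \frac{2.773}{0.981} \approx 2.83, \\
s &= \frac{Q}{P+Q} \approx \frac{0.288}{0.981} \approx 0.29. & &
\end{aligned}
```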

Figure 3 shows the CUSUM plots of all subjects in the simulation group. Based on the proficiency score of 30, all subjects (MS24 at the 9th trial, MS14 at the 12th trial, MS18 at the 13th trial, MS12 at the 23rd trial, MS20 at the 27th trial, MS1 and MS19 at the 33rd trial and MS7, MS8 and MS9 at the 67th trial) reached the acceptable failure rate of 20%, as indicated by their individual CUSUM scores crossing the lower decision limit (h0).

Figure 3

Cumulative sum (CUSUM) chart of the individual subjects in the simulation group. (Note: subject identifiers are based on a randomization order generated for a total of 25 subjects to allow for attrition.)

Transfer test results

Figure 4 shows a subject performing the transfer test on the TraumaMan manikin. A video of a subject performing the transfer task is provided as online supplemental material 1. In all analyses of the transfer task, we excluded two subjects in the control group who had to perform the transfer task on a SimMan manikin because the TraumaMan manikin was unavailable at the time of the study.

Figure 4

Subject performing the cricothyroidotomy (CCT) on the TraumaMan manikin.

Analysis of the computed score

The computed metrics (landmark identification, skin cut score, membrane cut score, intubation score, total score, and completion time) were compared between the control and simulation groups using the Mann-Whitney U test. The simulation group performed better than the control group on skin cut score (figure 5A) (U=54, p<0.001), intubation (figure 5B) (U=6, p<0.001) and total score (figure 5C) (U=6.5, p<0.001). Landmark identification (U=144, p=0.78), membrane cut score (U=135, p=0.4) and completion time (U=135, p=0.4) did not differ significantly between the groups.

Figure 5

Computed metrics from the transfer test: (A) skin cut score, (B) intubation, (C) total score.

Expert rating assessment

The two experts’ ratings showed high inter-rater agreement on both the global rating scale (Cronbach’s alpha=0.840 and intraclass correlation coefficient (ICC)=0.724, p<0.0001) and the task-specific checklist (Cronbach’s alpha=0.988 and ICC=0.976, p<0.0001).

On the global rating scale, the Mann-Whitney U test showed no significant difference between the two groups for any item: (1) preparation for procedure (U=726, p=0.5), (2) respect for tissue (U=717, p=0.6), (3) time and motion (U=707, p=0.6), (4) instrument handling (U=683.5, p=0.4), (5) flow of procedure (U=747, p=0.8), (6) knowledge of procedure (U=738.5, p=0.8), (7) overall performance (U=636.5, p=0.1), and (8) total score (U=745, p=0.8).

The Mann-Whitney U test results for the checklist were: (1) identifies thyroid cartilage (U=720, p=0.14), (2) identifies cricoid cartilage (U=760, p=1.0), (3) makes a 4–6 cm skin incision between the cartilages (U=714, p=0.5), (4) dilates the skin incision (U=752, p=0.8), (5) makes an incision along the cricothyroid membrane (U=680, p=0.03), (6) dilates the incision in the cricothyroid membrane (U=554, p=0.002), (7) intubation (U=668, p=0.2), and (8) total (U=702, p=0.5). The simulation group performed better than the control group on both item 5, ‘Makes an incision along the cricothyroid membrane’, and item 6, ‘Dilates the incision in the cricothyroid membrane’.

Discussion

Cricothyrotomy is one of the most infrequently performed procedures, yet it has one of the highest potentials to resuscitate a patient when all other attempts have failed. Opportunities for real clinical training in this procedure are rare because a compromised airway is usually detected early. At one level 1 trauma center, only four cricothyrotomies were performed among 2741 trauma admissions between April 2010 and February 2012, an incidence of 0.15%.32 While this is excellent for patient care, it makes it difficult to train physicians adequately for situations in which they must perform a cricothyrotomy. Simulation-based training is therefore critical for learning and maintaining emergency airway management skills, including CCT.33–35

Manikin-based trainers (eg, SimMan (Laerdal) and TraumaMan (Simulab)) require replacement of the skin after one or two attempts and cannot reproduce difficult airway situations. Cadavers can be useful for training in CCT because, unlike manikins, they naturally present anatomic variation; however, they can be very expensive. Cadaver-based training has been shown to be superior in fidelity to manikin-based training.12 Moreover, all of these simulators require a proctor to record data and score performance.

The VAST-CCT simulator needs neither replenishment of materials nor a proctor to time and score performance, as scoring is done automatically by the computer program. In our study, we established that the VAST-CCT can be used effectively to train the skills needed to perform the CCT procedure successfully. Among the 20 medical students, those randomized to the simulation group showed a significant decrease in completion time between pretest and post-test compared with the control group. Their scores also increased considerably between pretest and post-test, indicating a gain in proficiency. The control group’s pretest scores were higher than those of the simulation group because the baseline assessment was conducted after randomization and, coincidentally, the control group’s median performance score was higher. At both the post-test and the retention test, the control group’s performance decreased, indicating no significant learning of CCT skills. These results are in line with the gains in skill and proficiency expected from longitudinal training of novices on a virtual reality simulator, as observed with simulators developed to teach basic laparoscopic surgical tasks.17 18 By training the simulation group to a proficiency benchmark rather than for a fixed number of trials (eg, 10 trials per day for a total of 100 trials), we reduced the total number of trials needed to achieve proficiency in this task.

The CUSUM analysis showed that individual learning rates and the number of trials needed to reach proficiency varied considerably: one subject (MS24) performed consistently above the proficiency criterion by the ninth trial, whereas two subjects (MS8 and MS9) took 67 trials to perform consistently at the proficiency level. Because we trained the simulation group for only 10 sessions over 2 weeks, we cannot say whether these two subjects would have continued to perform consistently; nevertheless, within 2 weeks, all subjects performed at the set proficiency level.

To test the transfer of skills, we used the TraumaMan (Simulab) as a standardized platform because medical students are not yet qualified to operate on real patients and because, unlike cadavers or benchtop models, it conveniently provides similar anatomy and conditions for all subjects. We replaced the neck, skin and cricothyroid membrane for each subject, which enabled us to measure the performance metrics for each subject. The simulation group’s computed metrics were significantly higher than the control group’s for skin cut, intubation and total score, demonstrating their proficiency in performing the CCT procedure. We speculate that landmark identification and membrane cut scores did not differ because the TraumaMan has prominent thyroid and cricoid cartilages that are not difficult to identify, and because both groups had difficulty cutting the membrane, which was attached to the manikin with two sticky pads that did not provide sufficient tension for cutting. Completion time also did not differ significantly between groups, which we attribute to the difficulty both groups had in cutting the skin and to their unfamiliarity with the manikin.

We also assessed performance using subjective ratings by two experts from the recorded videos. Our assessment tool showed excellent consistency and inter-rater reliability; however, the Likert scale items did not differentiate performance between the two groups. Because the transfer test was performed after the retention test, even the control group subjects had performed the task three times on the simulator and had the opportunity to learn its steps. Additionally, from standard medical school training, they had baseline knowledge of how to use a scalpel and dilators. This underscores the need to develop assessment tools specific to CCT. Our task-specific checklist did show a difference in performance between the groups for making the incision along the cricothyroid membrane and dilating the incision. The simulation group had practiced these important steps several times, so it is not surprising that their performance was superior. Because of the limitations of the manikin and the need to score from recorded videos, the other checklist items did not show significant differences. In future work, we will develop appropriate assessment tools for scoring performance on the manikin that can then also be used for simulation with cadavers or benchtop models.

In conclusion, our results demonstrate that subjects in the simulation group improved their performance significantly compared with the control group. We attribute these results both to the efficacy of the VAST-CCT simulator and to the longitudinal training the subjects received. Others have successfully developed models comparable to the current training standard and have shown similar performance, with no clear statistical difference between groups.10 We think that the VAST-CCT will be a useful tool both for teaching CCT to novice trainees and as a platform for experienced practitioners to maintain their skills. Further study of the performance of simulator-trained subjects on real patients is needed to establish the predictive validity of all aspects of the simulator.