Would R2-D2 Successfully Pass the American Board of Anesthesiology Oral Board Examination?
Alan Jay Schwartz, MD, MSEd, Jonathan M Tan, MD, MPH, MBI, Justin L Lockman, MD, MSEd
For much of my career, one of my favorite things to do was to teach an oral board review course to my fellows. Over a 10-12 week period, these 2-3 hour voluntary weekend morning classes took place over breakfast in the dining room of my home. I knew that as physicians we were terrific at answering multiple-choice questions but, with the exception of critical care fellows, not so good at answering open-ended, verbal questions in a logical, erudite way. My goal was to teach our fellows how to think and speak as consultant anesthesiologists.
One of my fondest memories of these classes involved my then 9-year-old daughter, Suzy, who came storming into the dining room during one of the classes and threw a doughnut at one of the students who, quite frankly, was fumbling at answering a question. Suzy shouted: “What’s the matter with you people? Declare an emergency, call for help, and work through your ABCs!” She had no idea what this meant, but over the years, having heard it so many times while I worked through problems with the fellows, she had learned the correct problem-solving approach. Thus, today’s PAAD has special meaning to me. Myron Yaster MD
Original Article
Blacker SN, Chen F, Winecoff D, Antonio BL, Arora H, Hierlmeier BJ, Kacmar RM, Passannante AN, Plunkett AR, Zvara D, Cobb B, Doyal A, Rosenkrans D, Brown KB Jr, Gonzalez MA, Hood C, Pham TT, Lele AV, Hall L, Ali A, Isaak RS. An Exploratory Analysis of ChatGPT Compared to Human Performance With the Anesthesiology Oral Board Examination: Initial Insights and Implications. Anesth Analg. 2024 Sep 13. doi: 10.1213/ANE.0000000000006875. Epub ahead of print. PMID: 39269908.
“A long time ago in a galaxy far, far away.” – or more precisely, before the digital revolution – anesthesiology education relied heavily on soporific lectures and textbooks, with limited access to advanced technology. Fast forward to 2025, and we find ourselves propelled into an era shaped by computers and artificial intelligence (AI). According to Merriam-Webster, AI is defined as “…software designed to imitate aspects of intelligent human behavior [bold highlighting added by the authors of this PAAD]…”1 The Merriam-Webster Dictionary also reminds us that to imitate is, “…to have or assume the appearance of; simulate; resemble.”1 This raises important questions for anesthesiology education and assessment: How well can AI replicate the thinking and communication style of a diplomate of the American Board of Anesthesiology (ABA)? And could AI be leveraged as an educational tool to help learners prepare for the ABA’s Standardized Oral Examination (SOE)?
The rapid advancement and widespread presence of AI today call for its thoughtful integration across all areas of anesthesiology,2 including medical education. Blacker and colleagues responded to this call with their “…Analysis of ChatGPT Compared to Human Performance With the Anesthesiology Oral Board Examination…”3 Nathan’s Infographic4 visually displays Blacker and colleagues’ analysis.
The study evaluated the performance of the Chat Generative Pre-trained Transformer (ChatGPT) AI chatbot [OpenAI, San Francisco, CA] by comparing its responses to ABA SOE questions with those of human examinees. It is essential to recognize the purpose of the SOE, which “…is designed to assess a candidate’s higher-level competencies, such as judgment, organization, adaptability to unexpected clinical changes, and presentation of information.”3,5
In other words, ABA SOE questions do not have “right or wrong” answers like other ABA (multiple choice) examinations. Instead, they measure examinees’ processing, judgment, and “…effective presentation of information”5 based on their knowledge. The present study aimed to determine whether ChatGPT can match or exceed the performance of actual humans in this area – knowing already that in purely fact-based questions ChatGPT is likely to be more accurate.
ABA SOE questions originating from a publicly available ABA website were used to assess the performance of four anesthesiology fellows and of ChatGPT. Of note, the questions were slightly modified in wording and format. Additionally, unlike the dynamic, adaptive approach of the actual SOE, all participants in the study – human and AI – were presented with a standardized sequence of questions.
Eight ABA applied examiners assessed the human- and ChatGPT-generated SOE answers. The answers from neither group were probed for depth, explored when misguided, or gently redirected with prompts – all techniques commonly used in real ABA SOE examinations. Because the questions were scripted and presented in a fixed sequence, such flexibility and adaptability to the answers was not possible, and the interpretation of results must account for this difference.
There was no significant difference in median overall global module scores between groups. Interestingly, the examiners incorrectly suspected that 42% of the human participant responses were AI-generated. Examiners commented that ChatGPT answers, while containing relevant content, were often subjectively inferior to human answers due to their excessive length and lack of focus on the specific scenario priorities.
Why might the AI-generated SOE answers have been inferior to human responses? Recall that the ABA grants diplomate status to candidates who:
“…possess knowledge, judgment, adaptability, clinical skills, technical facility, and personal characteristics sufficient to carry out the entire scope of anesthesiology practice without supervision and without accommodation or with reasonable accommodation. An ABA diplomate must logically organize and effectively present rational diagnoses and appropriate treatment protocols to peers, patients and their families, and others involved in the medical community [bold highlighting added by the authors of this PAAD].”5
Given the current capabilities of ChatGPT, it is understandable why it may fall short of meeting the standards of the ABA SOE. Take, for example, R2-D2 from Star Wars – the iconic and endearing droid capable of delivering factual information with speed and precision, yet lacking human nuance, flexibility, and adaptability. Similarly, while today’s AI excels in factual recall, it still struggles to demonstrate the dynamic judgment required of an expert anesthesiologist. This limitation should not, however, overshadow AI’s enormous potential. With continued refinement, domain-specific training, and integration of multimodal inputs, future iterations of AI could meaningfully contribute to both the preparation for and assessment of complex competencies in anesthesiology. It is also important to remember that ChatGPT was only released in November 2022 and that the version used in the study had not been trained by the ABA or by medical educators for any specific purpose.
It appears that Blacker and colleagues are striving for an application of AI that would be useful to both future examinees and examiners. Indeed, what a marvelous idea for examination practice! But until AI can devise answers with reasoned judgment, adaptability, and the personal characteristics sufficient to carry out the entire scope of anesthesiology practice, all communicated in a logical, effectively organized manner, we suspect humans will continue to outperform it. Furthermore, these traits will also be prerequisites if ChatGPT is ever to assess SOE responses, whether in practice or in examination.
In other words, ChatGPT is incredible at producing intelligent-sounding responses. But there remains a gap between these responses and actual intelligence – at least as assessed by the ABA SOE. The timeline for closing this gap remains uncertain. Much will depend on the level of prioritization, sustained investment, and the formation of multidisciplinary teams committed to advancing AI for professional use in medical education and anesthesiology.
Do you utilize ChatGPT or another AI modality in your resident and fellow teaching? What about for patient care? What factors affected your choice? Is there institutional support for the use of AI? Do we need to develop and invest in systematic training in AI for our specialty? Send your experiences to Myron at myasterster@gmail.com who will include them in a Friday Reader Response.
References
1. https://www.merriam-webster.com/dictionary/ai?src=search-dict-hed (accessed 06/17/2025)
2. Tan JM, Cannesson M, Feldman JM, Simpao AF, McGrath SP, Khanna AK, Beard JW, McGaffigan P, Cole DJ. Emerging Technology and the Future of Perioperative Care: Perspectives and Recommendations From the 2023 Stoelting Conference of the Anesthesia Patient Safety Foundation. Anesth Analg. 2025 Jul 1;141(1):139-151. doi: 10.1213/ANE.0000000000007540. Epub 2025 May 7. PMID: 40333433.
3. Blacker, S, Chen, F, Winecoff, D, Antonio, B, Arora, H, Hierlmeier, B, Isaak, R. An Exploratory Analysis of ChatGPT Compared to Human Performance With the Anesthesiology Oral Board Examination: Initial Insights and Implications. Anesthesia & Analgesia, 2025; 140 (6), 1253-1262. https://doi.org/10.1213/ANE.0000000000006875
4. Nathan N. Oral Boards: Humans vs. AI (Infographic). Anesth Analg, 2025; 140 (6): 1252. https://doi.org/10.1213/ANE.0000000000007554
5. Policy Book-2025, American Board of Anesthesiology. Page 5. https://www.theaba.org/wp-content/uploads/2025/02/2025-Policy-Book.pdf (accessed 06/17/2025)