OpenAI's large language model (LLM) ChatGPT-4.0 achieved 85% accuracy on a clinical neurology exam in a proof-of-concept study. The study was conducted by researchers from University Hospital Heidelberg and the German Cancer Research Center in Heidelberg and published on December 7. It compared two versions of the model: ChatGPT-3.5 and the newer ChatGPT-4.0.
The exam questions came from the American Board of Psychiatry and Neurology's neurology question bank, plus a smaller set from the European Board of Neurology. ChatGPT-4.0 clearly outperformed its predecessor, correctly answering 1,662 of 1,956 questions (85%). ChatGPT-3.5 answered 1,306 correctly (66.8%). A score of 70% is commonly treated as the passing threshold, so ChatGPT-4.0 passed comfortably. It also exceeded the average human score of 73.8%.
ChatGPT-4.0 performed particularly well on questions about behavioral, cognitive, and psychological topics. However, both versions did relatively worse on questions requiring "higher-order thinking" than on those involving "lower-order thinking."
Despite this result, the researchers caution against relying entirely on large language models in clinical neurology. They see potential uses in documentation and clinical decision support systems, but stress that the technology needs further refinement. Dr. Varun Venkataramani, one of the study's authors, said the LLM still requires development, and possibly specific fine-tuning, before it can be applied appropriately in clinical neurology.
As a proof of concept, the study shows what LLMs can already do in neurology. AI is already contributing to healthcare tasks such as cancer treatment research and reducing antibiotic overprescription. Even so, the results underline the need for further refinement before AI can be relied on for higher-order reasoning in medicine.