Researchers at Stanford University and the University of California, Berkeley studied OpenAI's chatbot, ChatGPT, and found that the accuracy of its latest model declined on several tasks over a period of a few months. The study compared the March 2023 and June 2023 versions of the GPT-3.5 and GPT-4 models on solving mathematical problems, answering sensitive questions, writing code, and performing spatial reasoning tasks. Surprisingly, GPT-4's accuracy in identifying prime numbers dropped from 97.6% in March to a mere 2.4% in June.
In contrast, GPT-3.5 improved at prime number identification over the same period. However, both models became substantially worse between March and June at generating code that could be executed directly. The researchers also observed changes in the chatbot's responses to sensitive questions: the June versions were more terse when declining to answer, and some of the example questions touched on race and gender.
The researchers highlighted the need for continuous monitoring of AI model quality, because the behavior of the "same" large language model service can change dramatically in a relatively short period. To catch such changes, the study recommends that users and companies relying on AI services in their workflows implement their own monitoring analysis. Separately, OpenAI recently announced a team dedicated to managing the risks of superintelligent AI systems, which the company believes could arrive within the next decade.