Study Claims GPT-4’s Coding Skills Have Declined, OpenAI Denies Claims

Researchers Find Changes in GPT-4’s Performance

A recent research paper from Stanford University and the University of California, Berkeley suggests that OpenAI’s GPT-4, the popular AI language model, may have declined in its coding and compositional capabilities. The study, titled “How Is ChatGPT’s Behavior Changing over Time?”, compares the March 2023 and June 2023 versions of GPT-3.5 and GPT-4. Notably, GPT-4’s accuracy in identifying prime numbers reportedly dropped from 97.6 percent in March to just 2.4 percent in June, while GPT-3.5 improved on the same task.

Unproven Beliefs and Larger Issues with OpenAI

Experts are divided on the study’s findings, but several argue that it highlights a broader concern with how OpenAI manages its model releases. Proposed explanations for GPT-4’s apparent decline include OpenAI distilling its models to streamline outputs and fine-tuning them to reduce harmful outputs. There are also unsupported conspiracy theories that OpenAI reduced the model’s coding capabilities to boost GitHub Copilot subscriptions.

OpenAI’s Denial and Counterarguments

OpenAI, on the other hand, has consistently denied any decline in GPT-4’s capabilities. Peter Welinder, OpenAI’s Vice President of Product, recently tweeted that each new version of GPT is smarter than the previous one, and suggested that heavier usage leads people to notice issues they had not seen before.

Experts Question the Study’s Conclusions

While the study appears to support the claims made by GPT-4 critics, some experts argue that the findings are not conclusive. Arvind Narayanan, a computer science professor at Princeton University, believes that the study’s evaluation criteria may not accurately measure GPT-4’s performance. He criticized the study for checking whether the generated code could be executed immediately, as output, rather than evaluating whether the code was correct. Under that criterion, non-code text included in GPT-4’s output may have affected the results.
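To illustrate why that distinction matters, the following is a minimal sketch, not the study’s actual evaluation harness. It assumes a hypothetical model response in which correct Python code is wrapped in a markdown code fence, one kind of non-code text. The helper names `is_directly_executable` and `strip_markdown_fence` are invented for this example. A check that only asks whether the raw output runs as-is rejects the response, even though the code inside it is valid.

```python
import re

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable


def is_directly_executable(output: str) -> bool:
    """Return True if the raw text parses as Python exactly as given."""
    try:
        compile(output, "<model-output>", "exec")  # syntax check only, nothing is run
        return True
    except SyntaxError:
        return False


def strip_markdown_fence(output: str) -> str:
    """Return the body of the first fenced block, or the text unchanged if none is found."""
    pattern = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(output)
    return match.group(1) if match else output


# Hypothetical model response: correct code, wrapped in a markdown fence.
raw_output = FENCE + "python\ndef add(a, b):\n    return a + b\n" + FENCE

print(is_directly_executable(raw_output))                        # False
print(is_directly_executable(strip_markdown_fence(raw_output)))  # True
```

The same response fails or passes depending only on whether the wrapper is removed first, which says nothing about whether the underlying code is correct.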

Despite the ongoing debate, it is clear that further research and scrutiny are necessary to determine the true impact of GPT-4’s alleged decline in coding skills.


Lingjiao Chen, Matei Zaharia, James Zou. “How Is ChatGPT’s Behavior Changing over Time?” 2023.

About Author

Teacher, programmer, AI advocate, fan of One Piece, and pretends to know how to cook. Michael graduated in Computer Science, and in 2019 and 2020 he was involved in several projects coordinated by the municipal education department that focused on introducing public school students to the world of programming and robotics. Today he is a writer at Wicked Sciences, but says that his heart will always belong to Python.