Researchers Find Changes in GPT-4’s Performance
A recent research paper from Stanford University and the University of California, Berkeley suggests that OpenAI’s GPT-4, the popular AI language model, may have experienced a decline in its coding and compositional capabilities. The study, titled “How Is ChatGPT’s Behavior Changing over Time?”, compares the March 2023 and June 2023 versions of GPT-3.5 and GPT-4. Notably, the study reports that GPT-4’s accuracy in identifying prime numbers dropped from 97.6 percent to just 2.4 percent during this period, while GPT-3.5 showed improved performance.
Unproven Beliefs and Larger Issues with OpenAI
Experts are divided on the study’s findings, but they argue that the debate highlights a broader concern with how OpenAI manages its model releases. Proposed explanations for GPT-4’s apparent decline in performance include OpenAI distilling its models to streamline outputs and fine-tuning them to reduce harmful outputs. There are also unsupported conspiracy theories that OpenAI reduced GPT-4’s coding capabilities to boost GitHub Copilot subscriptions.
OpenAI’s Denial and Counterarguments
OpenAI, on the other hand, has consistently denied any decline in GPT-4’s capabilities. Peter Welinder, OpenAI’s Vice President of Product, recently tweeted that each new version of GPT is smarter than the previous one, and suggested that heavier usage may lead users to notice issues they had previously overlooked.
Experts Question the Study’s Conclusions
While the study appears to support the claims made by GPT-4 critics, some experts argue that the findings are not conclusive. Arvind Narayanan, a computer science professor at Princeton University, believes that the study’s evaluation criteria may not accurately measure GPT-4’s performance. He criticized the study for checking whether the generated code could be executed immediately rather than whether it was correct, and suggested that non-code text included in GPT-4’s output may have affected the results.
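The distinction Narayanan draws can be illustrated with a minimal sketch. It is not the study's actual evaluation harness. It assumes, for illustration only, that the non-code text is a markdown code fence wrapped around otherwise valid Python, and it uses a simple syntax check as a stand-in for "immediate execution". The helper names `is_directly_executable` and `strip_markdown_fence` are hypothetical.

```python
import re

# Build the fence marker programmatically so this example can itself
# be shown inside a fenced block.
FENCE = "`" * 3


def is_directly_executable(text: str) -> bool:
    """Return True if the raw model output compiles as Python as-is.

    Assumption: a syntax check stands in for "immediate execution";
    it says nothing about whether the code is correct.
    """
    try:
        compile(text, "<model_output>", "exec")
        return True
    except SyntaxError:
        return False


def strip_markdown_fence(text: str) -> str:
    """Return the body of the first fenced block, or the text unchanged."""
    pattern = re.compile(FENCE + r"[a-zA-Z]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(text)
    return match.group(1) if match else text


# Hypothetical model outputs: the same code, with and without a fence.
bare_output = "def add(a, b):\n    return a + b\n"
fenced_output = FENCE + "python\n" + bare_output + FENCE + "\n"

print(is_directly_executable(bare_output))                          # True
print(is_directly_executable(fenced_output))                        # False
print(is_directly_executable(strip_markdown_fence(fenced_output)))  # True
```

Under these assumptions, the fenced answer fails a raw execution check even though the code inside it is identical to the bare answer, which is why a formatting change alone could register as a drop in a metric of this kind.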
Despite the ongoing debate, it is clear that further research and scrutiny are necessary to determine whether GPT-4’s coding skills have actually declined and, if so, by how much.
Lingjiao Chen, Matei Zaharia, James Zou. How Is ChatGPT’s Behavior Changing over Time? 2023. Available from: https://arxiv.org/pdf/2307.09009.pdf