ChatGPT Actually Gets Half of Programming Questions Wrong

Failing Grade

Not long after it was released to the public, programmers started to take notice of a striking feature of OpenAI's ChatGPT: it could quickly spit out code in response to simple prompts. But should software engineers really trust its output?

In a yet-to-be-peer-reviewed study, researchers at Purdue University found that the uber-popular AI tool got just over half of 517 software engineering questions from the popular question-and-answer platform Stack Overflow wrong, a sobering reality check that should make programmers think twice before deploying ChatGPT's answers in anything important.

Pathological Liar

The research goes further, though, uncovering intriguing nuance in how well humans can spot those mistakes. The researchers asked a group of 12 participants with varying levels of programming expertise to analyze ChatGPT's answers. While the participants tended to rate Stack Overflow's answers higher across categories including correctness, comprehensiveness, conciseness, and usefulness, they weren't great at catching the answers ChatGPT got wrong, failing to flag incorrect answers 39.34 percent of the time.

In other words, ChatGPT is a very convincing liar, a reality we've become all too familiar with.

"Users overlook incorrect information in ChatGPT answers (39.34 percent of the time) due to the comprehensive, well-articulated, and humanoid insights in ChatGPT answers," the paper reads.

So how worried should we really be? For one, there are many ways to arrive at the same "correct" answer in software. A lot of human programmers also say they verify ChatGPT's output, suggesting they understand the tool's limitations. But whether that'll continue to…
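To illustrate that first point, here is a minimal sketch of our own (it is not drawn from the Purdue study): two different Python implementations of the same function that produce identical results, which is part of why grading generated code against a single reference answer is harder than it sounds.

    # Illustrative sketch, not from the study: two different implementations
    # that arrive at the same correct answer.
    def factorial_iterative(n: int) -> int:
        """Compute n! with a simple loop."""
        result = 1
        for i in range(2, n + 1):
            result *= i
        return result

    def factorial_recursive(n: int) -> int:
        """Compute n! by recursion."""
        return 1 if n <= 1 else n * factorial_recursive(n - 1)

    # Both produce identical outputs, so either would count as a "correct" answer.
    assert all(factorial_iterative(n) == factorial_recursive(n) for n in range(10))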
