The “First AI Software Engineer” Is Bungling the Vast Majority of Tasks It’s Asked to Do

Researchers have found that AI tech company Cognition’s Devin, which it claims to be the “first AI software engineer,” is astonishingly bad at its job.

In a recent analysis, first spotted by The Register, a team of machine learning data scientists behind the independent AI research and development lab Answer.AI spent a month with the AI assistant, concluding that despite almost a year of hype, it “rarely worked.”

“Out of 20 tasks we attempted, we saw 14 failures, three inconclusive results, and just three successes,” the researchers found, a meager success rate of just 15 percent.

Super, we’ve all had coworkers like that. But for tech that’s supposed to represent the future, it’s not inspiring confidence.

“More concerning was our inability to predict which tasks would succeed,” the team wrote. “Even tasks similar to our early wins would fail in complex, time-consuming ways. The autonomous nature that seemed promising became a liability — Devin would spend days pursuing impossible solutions rather than recognizing fundamental blockers.”

For instance, Devin was asked to deploy multiple applications to a deployment platform called Railway, but instead of realizing it was “not actually possible to do this,” Devin “marched forward and tried to do this and hallucinated some things about how to interact with Railway.”

The results highlight that despite Cognition AI’s boisterous marketing about Devin being able to “build and deploy apps end to end” when the tool was first introduced in March 2024, the tech is still struggling with some fundamental problems.

It’s…