Government Test Finds That AI Wildly Underperforms Compared to Human Employees

Sums It Up

Generative AI is absolutely terrible at summarizing information compared to humans, according to the findings of a trial for the Australian Securities and Investments Commission (ASIC) spotted by the Australian outlet Crikey. The trial, conducted by Amazon Web Services, was commissioned by the government regulator as a proof of concept for generative AI's capabilities, and in particular its potential to be used in business settings. That potential, the trial found, is not looking promising. In a series of blind assessments, the generative AI summaries of real government documents scored a dire 47 percent on aggregate under the trial's rubric, and were decisively outdone by the human-written summaries, which scored 81 percent. The findings echo a common theme in reckonings with the current spate of generative AI technology: not only are AI models a poor replacement for human workers, but their unreliability means it's unclear whether they'll have any practical use in the workplace for most organizations.

Signature Shoddiness

The assessment used Meta's open source Llama2-70B, which isn't the newest model out there, but at 70 billion parameters it's certainly a capable one. The model was instructed to summarize documents submitted to a parliamentary inquiry, and specifically to focus on material related to ASIC, such as where the organization was mentioned, and to include references and page numbers (a sketch of what such a request might look like appears below). Alongside the AI, human employees at ASIC were asked to write summaries of their own. Then five evaluators were asked to assess the…
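For readers curious what a task like this looks like in practice, here is a minimal sketch of how a comparable summarization request could be sent to Llama2-70B through Amazon Bedrock, the AWS service that hosts the model. The model ID, region, prompt wording, and generation parameters below are illustrative assumptions; ASIC's actual prompts and pipeline have not been published.

    import json
    import boto3

    # Hypothetical sketch only: ASIC's real setup is not public.
    # Assumes AWS credentials are configured and Bedrock model access is granted.
    client = boto3.client("bedrock-runtime", region_name="ap-southeast-2")

    document_text = "..."  # text of one submission to the parliamentary inquiry

    prompt = (
        "Summarize the following document for the Australian Securities and "
        "Investments Commission (ASIC). Focus on passages that mention or "
        "relate to ASIC, and cite references and page numbers for each point.\n\n"
        f"Document:\n{document_text}\n\nSummary:"
    )

    response = client.invoke_model(
        modelId="meta.llama2-70b-chat-v1",  # Llama2-70B as hosted on Bedrock
        body=json.dumps({
            "prompt": prompt,
            "max_gen_len": 512,   # cap the summary length
            "temperature": 0.2,   # keep output conservative for a factual summary
        }),
    )

    summary = json.loads(response["body"].read())["generation"]
    print(summary)

The low temperature reflects one plausible design choice for this kind of work: summarization for a regulator rewards faithful extraction over creative paraphrase, and even then, the trial's evaluators found the machine output fell well short of the human baseline.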
