AI Companies Are Running Out of Training Data

Data plays a central role, if not the central role, in the AI economy. Data is a model's vital force, both in basic function and in quality; the more natural (that is, human-made) data an AI system has to train on, the better that system becomes. Unfortunately for AI companies, natural data turns out to be a finite resource, and researchers warn that if the tap runs dry, the industry could be in for a serious reckoning.

As Rita Matulionyte, an information technology law professor at Australia's Macquarie University, notes in an essay for The Conversation, AI researchers have been sounding the alarm about the dwindling data supply for nearly a year. One study last year by researchers at the AI forecasting organization Epoch AI estimated that AI companies could run out of high-quality textual training data as soon as 2026, while the supply of low-quality text and image data could run dry anytime between 2030 and 2060.

It's a precarious situation for AI firms, given how much data AI systems need to operate and improve. AI models have advanced drastically as developers have poured in more and more data. If the data supply stagnates, so may the models, and thus, perhaps, the industry.

Matulionyte offers synthetic data, meaning data generated by AI models, as a possible way for data-hungry AI companies to train new models, but that might not be a viable solution either. Indeed, using synthetic content might actually wreck a given model entirely;…
