{"id":4180,"date":"2023-07-21T05:08:37","date_gmt":"2023-07-21T05:08:37","guid":{"rendered":"https:\/\/www.godefy.com\/ai-developers-are-already-quietly-training-ai-models-using-ai-generated-data"},"modified":"2023-07-21T05:08:37","modified_gmt":"2023-07-21T05:08:37","slug":"ai-developers-are-already-quietly-training-ai-models-using-ai-generated-data","status":"publish","type":"post","link":"https:\/\/www.godefy.com\/ai-developers-are-already-quietly-training-ai-models-using-ai-generated-data\/","title":{"rendered":"AI Developers Are Already Quietly Training AI Models Using AI-Generated Data"},"content":{"rendered":"

Self-Fulfilling

While most AI models are built on data made by humans, some companies are starting to use, or are trying to figure out how to use, data that was itself generated by AI. If they can pull it off, it could be a huge boon, albeit one that makes the entire AI ecosystem feel even more like a sort of algorithmic ouroboros.

As the Financial Times reports, companies including OpenAI, Microsoft, and the two-billion-dollar startup Cohere are increasingly investigating what’s known as “synthetic data” to train their large language models (LLMs) for a number of reasons, not least of which is that it’s apparently more cost-effective. “Human-created data,” Cohere CEO Aidan Gomez told the FT, “is extremely expensive.”

Beyond the relative cheapness of synthetic data, however, there is the issue of scale. Training cutting-edge LLMs is starting to use essentially all the human-created data that’s actually available, meaning that to build even stronger ones, developers are almost certainly going to need more.

“If you could get all the data that you needed off the web, that would be fantastic,” Gomez said. “In reality, the web is so noisy and messy that it’s not really representative of the data that you want. The web just doesn’t do everything we need.”

It’s All Happening

As the CEO noted, Cohere and other companies are already quietly using synthetic data to train their LLMs “even if it’s not broadcast widely,” and others like OpenAI seem to expect to use it in the future. During an event in…AI Developers Are Already Quietly Training AI Models Using AI-Generated Data<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"

Self-Fulfilling: While most AI models are built on data made by humans, some companies are starting to use, or are trying to figure out how to use, data… <\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[65,586,112,300,788,3686,214,83,12,3685,574,295],"_links":{"self":[{"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/posts\/4180"}],"collection":[{"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/comments?post=4180"}],"version-history":[{"count":0,"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/posts\/4180\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/media?parent=4180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/categories?post=4180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.godefy.com\/wp-json\/wp\/v2\/tags?post=4180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}