OpenAI Sued for Using Everybody's Writing to Train AI

My Data, Your Data

A new lawsuit against ChatGPT creator OpenAI alleges that the buzzy Silicon Valley firm’s AI training practices violated the privacy and copyright of pretty much everyone who has ever posted anything online.

To train its powerful AI language models, OpenAI used an enormous amount of data scraped from various corners of the web. Although OpenAI doesn’t even know exactly what its systems are trained on, those datasets include everything from Wikipedia articles and famous novels to social media posts and incredibly niche erotica, and OpenAI didn’t ask permission for any of it.

The class action suit, filed in California, alleges that failing to follow proper procurement guidelines, including seeking the consent of those who produced that content in the first place, amounts to straight-up data theft.

“Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,” reads the filing. “They systematically scraped 300 billion words from the internet, ‘books, articles, websites and posts — including personal information obtained without consent.’”

Not So Free Web

It’s a fair criticism. If you’ve been online at all in the past few decades, your digital output is likely embedded in OpenAI’s datasets, meaning that anything OpenAI’s generative models churn out for profit might contain bits and pieces of your silently scraped digital labor.

“All of that information is being taken at scale,” Ryan Clarkson, the managing partner at the firm suing OpenAI, told The…
