In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

Wondering what data OpenAI used to train its buzzy new text-to-video AI? The company’s CTO is similarly unsure.

Mira Murati, OpenAI’s longtime chief technology officer, sat down with The Wall Street Journal’s Joanna Stern this week to discuss Sora, the company’s forthcoming video-generating AI. About halfway through the 10-minute interview, Stern asked Murati point-blank where the new model’s training data came from. But Murati, in the most cringe-inducing way possible, couldn’t find an answer beyond vague corporate language.

“We used publicly available data and licensed data,” Murati responded to the resoundingly simple question.

Stern pushed back with more specific examples of sources: “So, videos on YouTube?”

“I’m actually not sure about that,” said Murati, before rebuffing further queries about whether videos shared to Instagram or Facebook were fed into the model.

“You know, if they were publicly available, publicly available to use,” the CTO answered, “but I’m not sure. I’m not confident about it.”

Stern then inquired about OpenAI’s data training partnership with the stock image company Shutterstock, asking if videos on the partnered platform were sucked into Sora’s training material. And this time? Murati decided to shut down the line of questioning altogether.

“I’m just not going to go into detail about the data that was used,” Murati continued. “But it was publicly available or licensed data.”

So, in sum, Murati can’t tell you exactly where the videos gobbled up by Sora first came from. But rest assured, the sourceless data was definitely, one hundred percent publicly available or…
