Database Powering Google's AI Pulled Down After It's Found to Contain Child Sexual Abuse

Google has been training its AI image generator on child sexual abuse material. As 404 Media reports, AI nonprofit LAION has taken down its 5B machine learning dataset, which is very widely used and which even Google relies on to train its AI models, out of "an abundance of caution" after a recent and disturbing Stanford study found that it contained 1,008 instances of externally validated child sexual abuse material (CSAM) and 3,226 suspected instances in total.

It's a damning finding that highlights the very real risks of indiscriminately training AI models on huge swathes of data. And it's not just Google using LAION's datasets: the popular image generator Stable Diffusion was also trained on LAION-5B, one of the largest datasets of its kind, made up of billions of images scraped from the open web, including user-generated content.

The findings come just months after attorneys general from all 50 US states signed a letter urging Congress to take action against the proliferation of AI-generated CSAM and to expand existing laws to account for the distribution of synthetic child abuse content.

But as it turns out, the issue is even more deeply rooted than bad actors using image generators to create new CSAM: even the datasets these image generators are trained on appear to be tainted.

As detailed in the study, conducted by the Stanford Internet Observatory, researchers found the offending instances using a hash-based detection system. "We find that having possession of a…
