Scientists Train New AI Exclusively on the Dark Web

DarkBERT Rises

OpenAI’s large language models (LLMs) are trained on a vast array of datasets, pulling information from even the internet’s dustiest, cobweb-covered corners. But what if such a model were to crawl the dark web instead? That’s the internet’s seedy underbelly, where you can host a site without your identity being public or even available to law enforcement.

A team of South Korean researchers did just that, creating an AI model dubbed DarkBERT to index some of the sketchiest domains on the internet. It’s a fascinating glimpse into some of the murkiest corners of the World Wide Web, which have become synonymous with illegal and malicious activities ranging from the sharing of leaked data to the sale of hard drugs.

It sounds like a nightmare, but the researchers say DarkBERT has noble intentions: shedding light on new ways of fighting cybercrime, a field that has made increasing use of natural language processing.

Cybercrime Fighter

Perhaps unsurprisingly, making sense of the parts of the web that aren’t indexed by search engines like Google, and that can often only be accessed via specific software, wasn’t an easy task.

As detailed in a yet-to-be-peer-reviewed paper titled “DarkBERT: A language model for the dark side of the internet,” the team hooked their model up to the Tor network, a system for accessing parts of the dark web. It then got to work, creating a database of the raw data it found.

The team says their new LLM was far better at making sense…
