DarkBERT: The AI trained on Dark Web data to fight cybercriminals is here
Researchers are training artificial intelligence to index data on the Dark Web to fight cybercrime and track malicious activities.
Highlights
- The AI has been trained on Dark Web data
- Has been created to boost cybersecurity
- The tool is based on Facebook’s RoBERTa
Large language models like ChatGPT and Bard are extremely popular right now. They are trained on various types of text available on the internet, including websites, articles, and books, which makes their responses diverse and impressive. However, some researchers have experimented with training models like ‘DarkBERT’ on data from the Dark Web instead. This has led to interesting and unexpected outcomes.
“DarkBERT: A Language Model for the Dark Side of the Internet” [1].
A new academic paper on a LLM trained from the dark web.
Yes it is exactly what you fear, but no license or law will stop it.
So we study it to understand humanity and our failings.
[1]… pic.twitter.com/wwAxPCqMtK
— Brian Roemmele (@BrianRoemmele) May 21, 2023
But what is the Dark Web?
The dark web refers to the part of the internet that is not easily accessible and not indexed by search engines like Google. It is called the “Dark Web” because it is known for being a hidden and anonymous space where people can engage in various questionable activities privately. Unlike the regular internet, which is easily accessible through search engines and requires no special tools, the dark web requires special software, such as the Tor browser, to access it.
Tor helps to protect the identity and location of users by masking their IP addresses, making it difficult for others to track them. The dark web is often associated with illegal activities, such as buying and selling drugs, weapons, or stolen data. It is also known for hosting various types of marketplaces where illegal goods and services can be exchanged.
However, it is important to note that not everything on the dark web is illegal. It also serves as a platform for whistleblowers, journalists, and activists who need to communicate anonymously and securely.
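For readers curious about the mechanics, the sketch below shows one common way to route an HTTP request through a locally running Tor daemon from Python. The port, the .onion address, and the library setup are illustrative assumptions here, not something described in the DarkBERT paper.

```python
# Minimal sketch: routing an HTTP request through a local Tor SOCKS proxy.
# Assumes the Tor daemon is running on its default SOCKS port (9050) and
# that requests is installed with SOCKS support: pip install requests[socks]
import requests

# socks5h (rather than socks5) resolves hostnames through Tor itself,
# which is required for .onion addresses
TOR_PROXY = "socks5h://127.0.0.1:9050"
proxies = {"http": TOR_PROXY, "https": TOR_PROXY}

# Hypothetical hidden-service URL, used purely for illustration
url = "http://exampleonionaddress.onion"

response = requests.get(url, proxies=proxies, timeout=60)
print(response.status_code)
```

Crawlers like the one used to build DarkBERT's corpus work on the same principle, just at a much larger scale and with page discovery and filtering layered on top.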
What is DarkBERT?
A group of researchers from South Korea has published a paper outlining the development of a large language model (LLM) using a vast collection of data sourced from the Dark Web, specifically obtained by crawling the Tor network.
The dataset encompassed a range of questionable websites spanning categories such as cryptocurrency, pornography, hacking, weaponry, and more. However, to address ethical concerns and prevent malicious individuals from extracting sensitive information, the team put guardrails in place by filtering the pre-training corpus before training DarkBERT.
The name DarkBERT was chosen because the language model is built upon the RoBERTa architecture, originally introduced by Facebook researchers in 2019. RoBERTa is a transformer-based model that serves as the foundation for DarkBERT, providing the framework for its development and functioning.
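To make that relationship concrete, here is a minimal sketch of the RoBERTa foundation in action via the Hugging Face transformers library. It uses the public roberta-base checkpoint as a stand-in; DarkBERT is the same architecture further pre-trained on a filtered Dark Web corpus, and this snippet is not the authors' code.

```python
# Minimal sketch of the RoBERTa architecture DarkBERT builds on.
# Requires: pip install transformers torch
from transformers import pipeline

# Masked language modelling is the pre-training objective RoBERTa uses;
# DarkBERT continues this pre-training on Dark Web text instead
fill_mask = pipeline("fill-mask", model="roberta-base")

# RoBERTa's mask token is <mask>
for prediction in fill_mask("The dark web is accessed through the <mask> browser."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

The intuition behind DarkBERT is that continuing this pre-training on dark web text gives the model vocabulary and context (jargon, leak-site phrasing, forum conventions) that a model trained on the surface web never sees.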
What is the purpose of DarkBERT?
DarkBERT, despite its sinister-sounding name, is designed for security and law enforcement purposes rather than for engaging in malicious activities. Since it was trained using data from the dark web, which contains numerous illicit websites where large sets of stolen passwords are often found, DarkBERT outperforms existing language models in cybersecurity and cyber threat intelligence applications.
Researchers who developed DarkBERT have showcased its effectiveness in detecting ransomware leak sites. Hackers and ransomware groups frequently upload compromised sensitive information, such as passwords and financial data, to the dark web with the intention of selling it.
The research paper proposes that DarkBERT can assist security researchers in automatically identifying these websites. Furthermore, it can crawl through various dark web forums to monitor them for any illicit exchanges of information. However, the researchers acknowledge that because there is limited publicly available data specific to dark web tasks, certain tasks may still require further customisation and fine-tuning of DarkBERT.
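As a rough illustration of how such page classification could look in practice, the sketch below attaches a binary classification head to a RoBERTa-style encoder. The checkpoint name, labels, and example text are assumptions; the head here is freshly initialised and untrained, so a real system would first fine-tune it on labelled leak-site pages, as the paper describes doing with DarkBERT itself.

```python
# Hedged sketch of DarkBERT-style ransomware leak-site detection.
# Requires: pip install transformers torch
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # stand-in; the paper fine-tunes DarkBERT itself

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2  # assumed labels: 0 = benign page, 1 = leak site
)

# Illustrative page text; real input would be crawled dark web pages
page_text = "LEAKED DATA - full database dump available, contact for price"
inputs = tokenizer(page_text, return_tensors="pt", truncation=True)

# The classification head is untrained here, so this output is meaningless
# until the model is fine-tuned on labelled examples
with torch.no_grad():
    logits = model(**inputs).logits

label = logits.argmax(dim=-1).item()
print("suspected leak site" if label == 1 else "benign page")
```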
What’s next?
There has been a lot of progress in the development of DarkBERT. The researchers are working on adding support for multiple languages to the pre-trained model. With broader language coverage, DarkBERT is expected to perform even better and gather data from a wider range of sources.