More than 200,000 credentials for AI language models are currently being sold on the dark web as part of stealer logs: files containing thousands of credentials harvested by infostealer malware. This raises the risk that sensitive data employees enter into these models will be exposed once their account credentials are stolen. However, we see even more concerning malicious uses of AI language models.
Mathieu Lavoie (CTO and Co-Founder of Flare), Serge-Olivier Paquette (Director of Innovation), and Eric Clay (VP of Marketing) discussed AI language models, their capabilities, and how attackers are using them.
Just as organizations increasingly incorporate AI language models, like GPT-4, into their business operations, cybercriminals have found ways to monetize them.
With open-source models like LLaMA, Llama 2, and Vicuna, the model weights are public. This enables malicious actors to bypass safety measures, such as reinforcement learning from human feedback (RLHF), that are meant to prevent the technology from being used to harm others.
AI language models are being introduced to a cybercrime ecosystem that is increasingly commoditized, streamlined, and easy to access. For cybercriminals, these AI language models offer an easy-to-use technology that enables them to automate additional criminal activities and uplevel their current “as-a-Service” offerings.
Check out our full webinar recording, How the Dark Web is Reacting to the AI Revolution, and/or keep reading for the highlights.
The research focused on how cybercriminals are leveraging large language models. AI language models are here to stay, and this means they will increasingly be incorporated into the complex cybercrime ecosystem. Understanding not just current, but emerging threats is critical for organizations to effectively manage cyber risk.
Language Models and Capabilities
To understand the sophistication level of current cybercriminal offerings, identifying baseline metrics for the models themselves is critical. Some typical capabilities to look for include:
- Problem-solving: Reasoning to an acceptable and correct outcome
- Theory of mind: Understanding and comprehending the mental states of individuals to explain behaviors
- Zero-shot learning: Handling questions or tasks the model was never explicitly trained on or shown examples of, and still reasoning to a satisfactory conclusion
- Completeness of answers: Answering questions as completely as possible
Taking these capabilities into account, the current models, ordered by capability from least to most, are:
- Claude 2
Fortunately, the most capable models are not currently open source. This forces cybercriminals to work with less capable models that pose less risk for the moment. However, the technology is developing rapidly, and it is highly likely that within two years threat actors will have access to open-source models comparable to today's state of the art.
Training a Model
Large language models typically go through the following four-step training process:
- Data collection and preprocessing: gathering and cleaning vast amounts of unstructured text on which the initial model weights are trained
- Model training (pretraining): producing the initial model weights
- Model fine-tuning: further training those weights on much smaller, curated datasets for specific tasks
- Reinforcement learning from human feedback (RLHF): human evaluators rate model outputs, and the weights are then adjusted to favor the outputs rated more highly
RLHF essentially “rewards” the model for distinguishing between benign requests and harmful ones, such as asking how to make a bioweapon or how to write a phishing email, and for declining the harmful ones. This is why ChatGPT and GPT-4 commonly refuse to answer questions that could be harmful.
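The “reward” idea can be illustrated with a toy sketch of the reward-model stage of RLHF. It assumes hypothetical hand-made features for each response (whether it refuses a harmful request, and whether it contains harmful instructions) and fits a linear reward model to human preference pairs using a Bradley-Terry style loss, -log(sigmoid(r_chosen - r_rejected)). Real systems score full text with a neural network and then optimize the language model itself against that reward (for example, with PPO), which this sketch leaves out.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights: list[float], features: list[float]) -> float:
    # Toy reward model: a linear score over hypothetical response features
    return sum(w * f for w, f in zip(weights, features))


def pair_loss(weights, chosen, rejected) -> float:
    # Bradley-Terry loss -log(sigmoid(margin)), computed in a stable form;
    # it is small when the human-preferred response scores higher
    margin = reward(weights, chosen) - reward(weights, rejected)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def train_reward_model(pairs, num_features, lr=0.1, epochs=200):
    """Fit weights so human-preferred responses score above rejected ones.

    pairs: list of (chosen_features, rejected_features) from human raters.
    """
    weights = [0.0] * num_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient descent: d(loss)/d(margin) = -(1 - sigmoid(margin))
            step = lr * (1.0 - sigmoid(margin))
            for i in range(num_features):
                weights[i] += step * (chosen[i] - rejected[i])
    return weights


# Hypothetical features: [refuses_harmful_request, contains_harmful_instructions]
refusal = [1.0, 0.0]
harmful_answer = [0.0, 1.0]
pairs = [(refusal, harmful_answer)]  # raters preferred the refusal

weights = train_reward_model(pairs, num_features=2)
print(weights)
print(reward(weights, refusal) > reward(weights, harmful_answer))  # True
```

After training, the weight on “refuses a harmful request” is positive and the weight on “contains harmful instructions” is negative, which is the signal later used to steer the model toward refusals. Anyone holding the raw weights of an open-source model can, in principle, fine-tune that learned behavior back out.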
Typically, companies, including cybercriminals, fine-tune open-source models to their needs rather than build their own, because a best-in-class model, often called a “frontier model,” can cost more than $10 million and take months of compute resources. Having the weights means that cybercriminals can bypass or undo the safety behavior instilled by RLHF, allowing them to use the model for harmful purposes like designing phishing emails, creating malware, and improving existing malicious code.
The Current State of Malicious AI
Although malicious actors use the term “GPT,” they are most likely using an open-source model rather than OpenAI's technology. They brand these tools as “GPT” because the term resonates with a broader audience. During the summer of 2023, researchers started identifying open-source models that had their restrictions removed and had been fine-tuned for cybercrime, beginning with FraudGPT and WormGPT, both available for purchase on the dark web.
Fundamentally, these AI models exist within the broader cybercrime ecosystem because they support other offerings like Malware-as-a-Service (MaaS), Ransomware-as-a-Service (RaaS), and Phishing-as-a-Service (PaaS).
Cybersecurity researcher John Hammond looked into several dark web generative AI chatbots, such as DarkBard and DarkGPT, while cross-referencing with Flare.
Below are some other dark web AI chatbots:
WormGPT’s creator was a 23-year-old programmer from Portugal who appears to have since taken it down. The subscription-based model, which sold for $500/month, was advertised as tuned for creating malware. Such a model is more likely to help cybercriminals iterate on their malicious code more efficiently than to automate the creation of a full-blown malware architecture or piece of software.
More recently, threat actors replicated the ChatGPT interface and fine-tuned a model to help create spear-phishing emails for business email compromise (BEC) and other fraudulent activities.
This model poses a different risk: coupled with PaaS infrastructure, it can generate personalized emails at scale, lowering the barrier to entry for cybercrime. For example, smaller fraudsters may have an idea of what they want to do, but they often hire people through seemingly “legitimate” channels to bring these plans to fruition. With models like FraudGPT, cybercriminals will no longer need these freelance workers.
The Future of Malicious AI
Short Term Risks
Employees using ChatGPT can accidentally leak sensitive data into models and then have their account credentials stolen. For example, cybercriminals could take over a user’s account, look through their chat history with the model, and then sell the information they find on the dark web.
Meanwhile, adversaries can attack the models themselves and cause them to expose sensitive training data, such as personally identifiable information (PII), or confidential information that users provided in their prompts. For example, one person discovered an adversarial attack in which entering 1,000 zeros, separated by single spaces, caused the model to generate snippets of random text, some of which appeared to come from other people's conversations with the model.
Medium Term Risks
Over the next few years, agentic models could change the threat landscape. By chaining more capable models together into AI “agents,” attackers could use language models to automate typically manual processes like:
- Actively searching for vulnerabilities, stealer logs with corporate access, and GitHub secrets faster and more broadly than people
- Collecting information about victims to create more effective spear phishing emails
- Expanding deepfake and vishing campaigns
Mitigations, Remediations, and Recommendations
As cybercriminals increasingly leverage language models, organizations should start proactively working to mitigate the risks. At the same time, they should consider how they can use these technologies to stay ahead of attackers.
At a minimum, organizations can begin with:
- Detecting AI-generated content: Applying research on distinguishing AI-generated from human-written content to reduce phishing risk, if such detection proves feasible and becomes widely available
- Policies and processes: Implementing controls around how employees can use models and share data with them to mitigate data leaks
- Tokenization: Replacing potentially sensitive data with placeholder tokens before it reaches a model, reducing the risk of using models in corporate applications while preserving the usefulness of the output
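Tokenization can be sketched as a simple pseudonymization step that runs before a prompt leaves the organization and is reversed on the model's response. This is a minimal sketch under stated assumptions: the two regex patterns, the `<LABEL_n>` token format, and the `tokenize`/`detokenize` helpers are hypothetical, and a production deployment would rely on a vetted data loss prevention (DLP) or PII-detection tool rather than a couple of regular expressions.

```python
import re

# Hypothetical detection patterns for illustration only
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # AWS access key IDs for IAM users start with "AKIA" plus 16 characters
    "AWS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def tokenize(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive values with placeholder tokens.

    Returns the sanitized text and a token -> original value mapping that
    stays inside the organization.
    """
    mapping: dict[str, str] = {}
    reverse: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in reverse:
                # Same value always maps to the same token
                token = f"<{label}_{len(reverse) + 1}>"
                reverse[value] = token
                mapping[token] = value
            return reverse[value]

        text = pattern.sub(substitute, text)
    return text, mapping


def detokenize(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values in text returned by the model."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


prompt = "Draft a reply to jane.doe@example.com about key AKIAABCDEFGHIJKLMNOP."
safe_prompt, mapping = tokenize(prompt)
print(safe_prompt)  # Draft a reply to <EMAIL_1> about key <AWS_KEY_2>.

# Pretend this came back from the model; placeholders are restored locally
model_response = "Hi <EMAIL_1>, please rotate <AWS_KEY_2> today."
print(detokenize(model_response, mapping))
```

Keeping the mapping local means the model provider, and anyone who later steals the account credentials, only ever sees placeholders. The trade-off is that the model cannot reason about the actual values, which matters less for drafting or summarization tasks than for tasks that depend on the data itself.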
How Flare Can Help
Flare provides proactive cyber threat exposure management with AI-driven technology to constantly scan the clear and dark web, as well as Telegram channels. Our platform provides data from 14 million stealer logs and 2 million threat actor profiles.
Because our platform automatically collects, analyzes, structures, and contextualizes dark web data, you gain high-value intelligence specific to your organization, enabling 10x faster dark web investigations and a major reduction in data leak incident response costs.
Start your free trial today to learn more.