
By Andréanne Bergeron, Security Researcher
Data breaches are so common that they have almost become background noise. A notification email arrives, a password reset is suggested, and most people move on. This normalization is dangerous, not because individual breaches are trivial, but because it obscures how threat actors actually use stolen data. The real threat is rarely a single leaked record. It is what happens when multiple sources get combined.
While monitoring cybercriminal Telegram channels, we discovered an automated bot that accepts a victim’s email address as input and instantly returns a structured profile aggregated from multiple breach databases: passwords, linked phone numbers, names, and other personally identifiable information (PII), all compiled in seconds. This finding captures something that has been quietly building for years: data aggregation has become so efficient that it no longer requires meaningful technical skill.
This article explains how this capability evolved, who uses it, and what it means for victims.
Key Takeaways About Data Aggregation
- We identified a Telegram bot that automates the full aggregation process from a single email address query. The bot returns passwords, associated usernames, linked phone numbers, names, and other PII compiled from multiple breach databases in seconds, with no technical knowledge required.
- Nation-state threat actors have orchestrated deliberate multi-breach campaigns specifically to correlate datasets and build espionage-grade profiles on individuals.
- Financially motivated actors on dark web forums have replicated this logic at scale using combolists and profile enrichment pipelines.
- The breach risk to a victim is not linear: each additional piece of leaked data activates and amplifies the data already exposed, dramatically increasing the potential for harm. An email address alone enables a generic phishing attempt. Add a name, employer, job title, historical password, and date of birth from separate sources, and the profile can defeat knowledge-based authentication and enable highly personalized impersonation.
- Breach monitoring focused solely on direct exposure is insufficient. The relevant question is not whether an employee’s credentials appeared in a single breach, but what combination of data about that individual is now available across the full ecosystem of leaks, paste sites, stealer logs, and automated aggregation services.
See the Full Picture of What Attackers Know About Your People
Attackers no longer look at single breaches in isolation — they aggregate across leaks, paste sites, stealer logs, and Telegram channels in seconds. Flare monitors the same ecosystem to show you the full constellation of exposed data about your employees before it’s weaponized.
When Data Aggregation is a Strategic Objective: Nation-State Campaigns
The most documented and deliberate cases of cross-source data aggregation come not from criminal forums but from nation-state intelligence operations. The difference between a threat actor who stumbles on a useful dataset and one who systematically targets multiple organizations to combine their data is the difference between opportunistic crime and strategic intelligence collection.
The 2014–2015 campaign attributed to Chinese state-sponsored hackers targeting the US Office of Personnel Management (OPM), health insurer Anthem, and United Airlines is the clearest example. On their own, each breach was significant. The OPM compromise exposed security clearance records for 21.5 million current and former government employees, including sensitive background investigation files. The Anthem breach yielded 78.8 million health insurance records, with diagnoses, medications, and mental health data. The United Airlines breach provided something more operationally specific: passenger manifests detailing the travel origins and destinations of millions of Americans.
The strategic intent only becomes clear when you consider the datasets together. Security analysts at the time described the campaign not as three separate attacks but as a coordinated effort to correlate disparate records into comprehensive profiles of US intelligence personnel: who they were, what health vulnerabilities they might have, and where they traveled. The RAND Corporation, in formal congressional testimony, confirmed that combining these records allowed the actor to locate individuals at frequently traveled destinations, exposing potential vectors for blackmail or recruitment. The same cluster of activity was subsequently linked to a breach of travel booking firm Sabre, extending the travel intelligence picture even further.
The 2014 Marriott–Starwood breach, also widely attributed by security researchers to a Chinese state actor, fits the same pattern. Attackers remained undetected inside Starwood’s reservation system for four years, ultimately exfiltrating 339 million guest records including passport numbers, loyalty program data, and granular arrival and departure information. When this dataset is considered alongside OPM, the strategic logic is clear: travel records cross-referenced with security clearance files allow an intelligence service to map the movements of government and military officials with precision.
These campaigns established the template: identify the datasets that, in combination, produce a complete intelligence picture, and target them deliberately, even if the individual breaches appear unrelated.
The Same Logic, Democratized: How Criminal Actors Replicate It
Nation-state campaigns require significant infrastructure and patience. What has changed over the last decade is that the underlying logic of combining multiple sources to build a more actionable profile has become accessible to virtually any financially motivated actor.
Dark web forums and marketplaces like BreachForums function as persistent, searchable archives of breach data dating back nearly two decades. The raw material for aggregation is abundant, cheap, and increasingly well-organized.
Professional Network Scrapes Enable Profile Enrichment
The LinkedIn scraping incidents of 2021 illustrate how this plays out in practice. When data from an estimated 500 million LinkedIn profiles appeared for sale (including names, employers, job titles, email addresses, and phone numbers), LinkedIn characterized the event as scraping of publicly accessible data rather than a direct breach. Cybernews researchers described precisely how such a dataset gets exploited: it serves as a foundational base for profile enrichment, allowing an actor to take a bare email address from an infostealer log and link it to a full professional profile. The result is a target package: who the person is, where they work, who their manager is, and what kind of communication would appear credible coming from inside their organization.
The Spear-Phishing Payoff
This is the mechanism behind the most convincing spear-phishing campaigns. An attacker who knows a target is a finance director at a specific company does not send a generic phishing email. Instead, they send a message impersonating the CFO, referencing a plausible internal process, in language calibrated to the target’s professional context, all of it assembled from information that was technically “public” or sourced from separate, individually minor leaks. APT groups including Lazarus and Nobelium have used this approach against targets in the cryptocurrency and technology sectors.
The Named Leak Database at Flare, which spans 2,908 incidents collected since 2016, consistently shows that the most sensitive exposures occur at the intersection of multiple data types. The Mate1 dating site breach, for instance, exposed 25 distinct PII types, including drinking and drug habits, political views, income levels, and sexual preferences, for approximately 27 million users. That kind of behavioral and lifestyle data, combined with contact information from a separate breach and employment context from a professional network scrape, creates conditions for blackmail, targeted manipulation, and highly personalized fraud.
The Automation of Aggregation on Telegram
The cases above still assume some degree of manual effort: browsing forums, purchasing datasets, cross-referencing records, or even hacking the infrastructure themselves to get the data. What we observed in monitored Telegram channels represents a further step: the full aggregation process reduced to a single automated query.
How the Bot Works
The bot’s workflow is simple: the user submits an email address, and the bot returns a structured profile aggregated across multiple breach sources, including historical passwords, associated usernames, linked phone numbers, names, and any other PII tied to that identifier across known leaked datasets. The entire process takes seconds. No technical knowledge is required. No manual cross-referencing is needed. The infrastructure does the correlation automatically.
This is important because it eliminates the last remaining friction in the data aggregation threat. Historically, combining breach data required access to multiple sources, the ability to normalize inconsistent data formats, and the analytical skill to reconcile conflicting records. These barriers are gone. Any actor with a Telegram account and a target’s email address can now obtain, on demand, a reconnaissance report that would have taken a skilled analyst hours to compile a few years ago.
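Defenders can run the same correlation step to inventory what is exposed about their own people. The following is a minimal Python sketch, not a description of how the bot or any commercial product works: the source names, the `FIELD_ALIASES` column mapping, and the sample records are all hypothetical. It normalizes inconsistent column names and email formatting, then groups records by email address. By design it keeps only which field types were exposed and where, not the leaked values themselves, so it sidesteps value-level conflict reconciliation and never stores passwords.

```python
"""Minimal exposure-inventory sketch (hypothetical sources and field names)."""
from collections import defaultdict

# Hypothetical mapping from source-specific column names to canonical field types.
FIELD_ALIASES = {
    "email": "email", "e-mail": "email", "mail": "email",
    "name": "name", "full_name": "name",
    "phone": "phone", "tel": "phone", "mobile": "phone",
    "company": "employer", "employer": "employer",
    "title": "job_title", "job_title": "job_title",
    "pass": "password", "password": "password",
    "dob": "date_of_birth", "birthdate": "date_of_birth",
}


def normalize_email(raw: str) -> str:
    """Trim and lowercase so formatting differences do not split one identity."""
    return raw.strip().lower()


def normalize_record(record: dict) -> dict:
    """Map known column aliases to canonical field types; drop unknown or empty columns."""
    out = {}
    for key, value in record.items():
        canonical = FIELD_ALIASES.get(key.strip().lower())
        if canonical and value not in (None, ""):
            out[canonical] = str(value).strip()
    if "email" in out:
        out["email"] = normalize_email(out["email"])
    return out


def correlate(sources: dict[str, list[dict]]) -> dict[str, dict[str, set[str]]]:
    """Group exposures by email: email -> field type -> names of sources exposing it.

    Only field types and provenance are recorded, never the leaked values.
    """
    profiles = defaultdict(lambda: defaultdict(set))
    for source_name, records in sources.items():
        for raw in records:
            rec = normalize_record(raw)
            email = rec.get("email")
            if not email:
                continue  # no identifier to anchor this record to a person
            for field_type in rec:
                profiles[email][field_type].add(source_name)
    # Convert nested defaultdicts to plain dicts so lookups do not create entries.
    return {email: dict(fields) for email, fields in profiles.items()}


if __name__ == "__main__":
    demo = {
        "forum_dump": [{"E-Mail": "Jane.Doe@Example.com ", "pass": "********"}],
        "network_scrape": [{"mail": "jane.doe@example.com", "full_name": "Jane Doe",
                            "company": "ExampleCorp", "title": "Finance Director"}],
        "stealer_log": [{"email": "JANE.DOE@example.com", "mobile": "+1-555-0100"}],
    }
    for email, fields in correlate(demo).items():
        print(email, {ftype: sorted(srcs) for ftype, srcs in sorted(fields.items())})
```

Keeping provenance per field type lets an analyst see, for example, that an email address appears in three sources while the phone number comes from only one, which is the kind of cross-source picture the bot assembles automatically.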
The practical consequence is a compression of the gap between low-skill opportunistic attackers and sophisticated targeted ones. Personalization, the defining feature of the most dangerous phishing, social engineering, and impersonation attacks, is now available at scale to virtually anyone operating in the cybercriminal underground.
Why Aggregation Changes the Risk Calculus for Victims
A persistent misconception about breach exposure is that risk scales roughly linearly with the amount of data leaked. In reality, the relationship is closer to exponential, because each additional data point does not simply add to a profile; it activates and validates what is already there. Most breach victims remain unaware of this dynamic, partly because breach notifications are often poorly written, and partly because the harm from any single breach is genuinely difficult to perceive in isolation.
The Aggregation Sequence
As more data points accumulate, a threat actor’s attack can become sharper:
- An email address alone enables a generic phishing attempt, which most recipients will discard.
- Add a name and employer, and the message can be personalized.
- Add a job title and a manager’s name from a LinkedIn scrape, and the attacker can plausibly impersonate internal communications.
- Add a historical password from a forum dump, and there is now a credential to test against corporate systems, and a psychological lever to use in a social engineering call.
- Add a date of birth and passport fragment from a travel booking breach, and the profile becomes sufficient to defeat knowledge-based authentication on financial and government platforms.
At no point in this sequence did any single breach create the full risk. The risk emerged from the combination. This is what we refer to as the aggregation gap: the qualitative transformation in what an attacker can do once data from multiple sources has been correlated around a single identity.
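The aggregation sequence can be expressed as set-containment rules: a capability unlocks only when every field type it requires is present in the combined profile. The Python sketch below is illustrative; the capability names and their required fields are assumptions loosely mapped to the sequence above (for instance, treating a previous employer as a typical knowledge-based authentication answer), not a validated risk model. `aggregation_gap` returns the capabilities that no single source enables on its own but the correlated profile does.

```python
"""Illustrative set-based model of the aggregation sequence (assumed rules)."""

# Each rule: (capability, field types that must ALL be exposed). Hypothetical.
CAPABILITY_RULES = [
    ("generic_phishing", {"email"}),
    ("personalized_phishing", {"email", "name", "employer"}),
    ("internal_impersonation", {"email", "name", "employer", "job_title", "manager"}),
    ("credential_testing", {"email", "password"}),
    ("pretext_call", {"name", "employer", "password"}),
    ("kba_bypass", {"name", "employer", "date_of_birth", "passport_fragment"}),
]


def unlocked_capabilities(exposed: set[str]) -> list[str]:
    """Capabilities whose required field types are a subset of `exposed`."""
    return [name for name, required in CAPABILITY_RULES if required <= exposed]


def aggregation_gap(per_source: list[set[str]]) -> set[str]:
    """Capabilities enabled by the combined profile but by no single source."""
    from_single_sources = set()
    for fields in per_source:
        from_single_sources.update(unlocked_capabilities(fields))
    combined = set().union(*per_source)
    return set(unlocked_capabilities(combined)) - from_single_sources


if __name__ == "__main__":
    forum_dump = {"email", "password"}
    network_scrape = {"email", "name", "employer", "job_title", "manager"}
    travel_breach = {"email", "name", "date_of_birth", "passport_fragment"}
    sources = [forum_dump, network_scrape, travel_breach]
    print(unlocked_capabilities(set().union(*sources)))
    print(sorted(aggregation_gap(sources)))  # ['kba_bypass', 'pretext_call']
```

With these three hypothetical sources, no single source yields a pretext call or a KBA bypass, yet the merged profile yields both. That difference is the aggregation gap in miniature.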
Recommendations for Security Teams
An executive whose work email appeared in a minor forum breach three years ago may today be a high-value target for a highly personalized attack, because that email has since been enriched with professional context, linked to a personal phone number, and indexed in an automated lookup service.
For organizations, this means that breach monitoring programs focused solely on direct data exposure are insufficient. The question is not only whether an employee’s credentials have appeared in a known breach, but also what combination of data about that employee is now available across the full ecosystem of leaks, paste sites, stealer logs, and Telegram channels.
Effective defense requires monitoring not just for direct credential leaks, but for the full constellation of data circulating about your people across criminal marketplaces, paste sites, stealer logs, and automated aggregation services. The attacker is no longer looking at any single breach in isolation. They are looking at all of them, combined, and they can do it in seconds.
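To make this recommendation concrete, the sketch below contrasts a credential-only check with a combination-aware check for a single employee. It is an illustration under assumptions: the `Exposure` record, the field-type labels, and the two entries in `HIGH_RISK_COMBINATIONS` are hypothetical and would need tuning to an organization’s own threat model.

```python
"""Sketch: credential-only monitoring vs. combination-aware monitoring (hypothetical)."""
from dataclasses import dataclass

# Hypothetical field-type combinations treated as high risk when they co-occur.
HIGH_RISK_COMBINATIONS = [
    frozenset({"email", "name", "employer", "job_title"}),  # internal impersonation
    frozenset({"name", "date_of_birth", "phone"}),          # account-recovery pretexting
]


@dataclass
class Exposure:
    source: str            # where the data was observed (leak, paste site, stealer log, ...)
    field_types: set[str]  # which kinds of data were exposed, not the values


def credential_only_alert(exposures: list[Exposure]) -> bool:
    """Traditional check: fires only if a password was exposed somewhere."""
    return any("password" in e.field_types for e in exposures)


def combination_alert(exposures: list[Exposure]) -> list[list[str]]:
    """Combination check: high-risk combinations covered by the merged exposure."""
    combined = set().union(*(e.field_types for e in exposures))
    return [sorted(combo) for combo in HIGH_RISK_COMBINATIONS if combo <= combined]


if __name__ == "__main__":
    executive = [
        Exposure("old_forum_breach", {"email"}),
        Exposure("network_scrape", {"email", "name", "employer", "job_title"}),
        Exposure("lookup_service", {"name", "phone", "date_of_birth"}),
    ]
    print("credential-only alert:", credential_only_alert(executive))  # prints "credential-only alert: False"
    print("combination alert:", combination_alert(executive))
```

In this scenario the credential-only check stays silent because no password was ever leaked, while the combination check flags the same executive on both rules. That divergence is the monitoring blind spot described above.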





