GitHub Data Leaks: Detection & Prevention Guide

February 14, 2023


In a modern digital world, almost every company is a software development company. Your company may develop apps that provide digital customer experiences or build software that enables employee productivity. Developers use GitHub to collaborate efficiently and manage version control, recording and controlling software changes. Security teams know they need to monitor GitHub because the repositories often store sensitive information in plaintext, making them attractive targets for cybercriminals. However, monitoring repositories manually is time-consuming, and the errors it produces can lead to data breaches.

Understanding how threat actors find sensitive information stored in GitHub and implementing best practices for monitoring leaked GitHub data can help mitigate data breach risks and enhance your overall security program.

What Are Secrets?

In GitHub, secrets are digital authentication credentials that allow services, infrastructure, and applications to interoperate, including:

OAuth tokens
API keys
Usernames
Passwords
Encryption keys
Security certificates

Storing this information as plaintext within the source code is referred to as hardcoding.
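To make the definition concrete, here is a minimal Python sketch contrasting a hardcoded credential with one read at runtime. The variable name PAYMENTS_API_KEY and the function load_api_key are hypothetical, chosen only for illustration, and the commented-out key is a placeholder, not a real credential format.

```python
import os

# Hardcoded (the anti-pattern): the value is committed with the source,
# so every clone or fork carries a plaintext copy.
# API_KEY = "EXAMPLE-NOT-A-REAL-KEY"


def load_api_key(env_var: str = "PAYMENTS_API_KEY") -> str:
    """Read the credential from the environment instead of the source file.

    The environment variable name is an assumption for this example.
    """
    value = os.environ.get(env_var)
    if not value:
        # Fail loudly rather than falling back to a value baked into the code.
        raise RuntimeError(f"{env_var} is not set")
    return value
```

With this approach the repository holds only the name of the variable, so copying the code does not copy the secret.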

Why Developers Store Secrets in GitHub

Every day, developers need to write new code and test it in their production environments. By hardcoding secrets, they can run tests without needing to remember and manually input every credential, every time.

Meanwhile, modern digital infrastructures create complex interconnections across applications and services, all requiring secrets so that they can function as a complete unit. Further, various internal teams need to share credentials during the code development processes, including:

DevOps
Site Reliability Engineers (SRE)
Security

Hardcoding secrets makes it easier for developers to update services, scale applications, and outsource development work. However, since these secrets provide access to critical resources, hardcoding creates a security risk. For example, every time developers clone, check out, or fork the source code, the process replicates the plaintext secrets.

How Threat Actors Find and Leak GitHub Secrets

Just as adding new applications expands an external attack surface, hardcoding secrets expands the GitHub attack surface. As developers work with the hardcoded secrets, they create secret sprawl, meaning that they lose control over who can access secrets and where they store secrets.

Accidentally Using Personal Accounts

GitHub Enterprise requires developers to use a “personal” account to join the organization’s repository. Even if the account is linked to the developer’s corporate email, any repository created under that account is considered a personal repository, and personal repositories are set to public by default.

When software developers publish their code, they need to proactively select their company as the owner. Based on how GitHub works, they may accidentally select their username, creating the repository as a personal, default-public location. If the code includes credentials, threat actors can easily scan for them.

Phishing Attacks

Phishing attacks work for GitHub access the same way they work for enterprise IT access. Attackers use social engineering tactics to discover people on your development team, then they send fake emails that seem legitimate, coercing people into giving up their passwords.

Since employees need to use a personal GitHub account to access their private repository, one stolen developer credential can compromise your organization’s repository.

Reused or Shared Passwords

Developers need to access resources, like GitHub, so they can complete their job functions. Additionally, they need these digital authorizations so that they can update code and push it out on time. When companies do not have rigorous security measures, they can be more prone to accidental human error. For example, developers may reuse passwords or share secrets through email or communication tools like Slack.

Forgotten Secrets in Application Code Files

Developers might hardcode a credential during development, intending to delete it later but forgetting to do so. When this happens, the hardcoded secret remains stored within the version control system (VCS). Attackers can discover these secrets with scanners that look for patterns matching known credential formats.
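A pattern-based scan of this kind can be sketched in a few lines of Python. This is an illustrative sketch, not a production scanner: the pattern set is deliberately tiny, the names PATTERNS, Finding, and scan_text are hypothetical, and the generic assignment rule is a rough heuristic that will produce false positives and miss many real secrets.

```python
import re
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    line_number: int
    kind: str


# Illustrative patterns only. The first two follow widely documented token
# prefixes; the third is a loose heuristic for quoted values assigned to
# credential-like names.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "generic_assignment": re.compile(
        r"(?i)(?:password|passwd|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> Iterator[Finding]:
    """Yield one Finding per (line, pattern) match in the given text."""
    for number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                yield Finding(number, kind)
```

Running scan_text over each file in a repository gives a first-pass list of lines to review. Attackers can apply the same technique to any code they can read, which is why a forgotten credential in a public repository is easy to find.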

GitHub Actions Pipelines

GitHub Actions is a CI/CD workflow automation tool. If your GitHub Actions workflow pulls in container images to run application components or testing environments, then its configuration might include hardcoded credentials. With these secrets written in plaintext, attackers can scan your configuration files to locate exposed secrets.
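As a rough illustration, the sketch below flags workflow lines where a credential-like key holds a literal value instead of a reference to GitHub's encrypted secrets (the ${{ secrets.NAME }} expression syntax). It is a line-based heuristic under stated assumptions: it does not parse YAML, the key-name list is arbitrary, and the function name literal_credentials is hypothetical.

```python
import re

# A key whose name contains a credential-like word, followed by a non-empty value.
SENSITIVE_KEY = re.compile(
    r"(?i)^\s*-?\s*([\w-]*(?:password|token|secret|api_key)[\w-]*)\s*:\s*(\S.*)$"
)
# A value that defers to GitHub's encrypted secrets store.
SECRET_REFERENCE = re.compile(r"\$\{\{\s*secrets\.[A-Za-z_][A-Za-z0-9_]*\s*\}\}")


def literal_credentials(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, key) pairs whose value is not a secrets reference."""
    findings = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = SENSITIVE_KEY.match(line)
        if match and not SECRET_REFERENCE.search(match.group(2)):
            findings.append((number, match.group(1)))
    return findings
```

For example, a container credentials block with "password: hunter2hunter2" would be flagged, while "API_TOKEN: ${{ secrets.API_TOKEN }}" would not.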

Importance of GitHub Data Leak Monitoring

For developers, hardcoding credentials is about speed and productivity. Companies want software shipped on time, and hardcoding credentials enables software engineers to test applications easily.

Meanwhile, overwhelmed security teams already suffer from alert fatigue just monitoring the enterprise technology stack. Adding another location creates yet another burden. Further, most GitHub monitoring solutions review the code for embedded secrets, not the repository for abnormal access.

This gap between development tools and security monitoring technologies means that companies often lack real-time detection capabilities. They only find out about compromised secrets after the attackers have sold them or used them in a follow-up attack.

Companies often assume that keeping repositories private mitigates this risk. However, GitGuardian’s research report found that private repositories were four times more likely to experience an incident. The statistics in the research also showed the breadth of the problem:

1,050 unique secrets, on average, were leaked for a typical company with 400 developers and 4 AppSec engineers
1 AppSec engineer needs to handle an average of 3,413 secret occurrences
6+ million secrets were detected in public repositories
3 public repository commits out of 1,000 exposed at least one secret
84 AWS IAM credentials were leaked for every 10,000 public repository commits scanned
500+ public commit messages contained GitHub personal access tokens

Organizations need to take proactive steps to mitigate risk and institute monitoring to detect compromised secrets.

One of our customers is a Fortune 100 company that uses Flare’s GitHub/source code leak monitoring to automate tracking asset relations between GitHub repositories, users, domains, and emails across multinational developer teams. Learn more about how our customer tracks and addresses GitHub leaked secrets without time-intensive and inefficient manual searches.

GitHub Data Leak Prevention Best Practices

While preventing GitHub secrets leaks can be challenging, you can take several steps to mitigate risk.

Limit Repository Access

When providing access, employ the principle of least privilege by limiting who can read or change code. Additionally, you should document all activities so that you can track code changes.

Implement and Enforce Strong Passwords

Passwords that grant access to your repository should follow password best practices. At a minimum, you should set appropriate requirements for:

Length
Complexity
Expiry date
Rotation

Protect Access Credentials

Developers use login IDs combined with passwords or private keys to access your repository. Some ways that you can protect these credentials include:

Password-protecting private keys
Giving developers hardware tokens
Rotating access keys
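Rotation is easier to enforce when something checks credential age automatically. The sketch below is a minimal example under stated assumptions: it presumes you already have an inventory mapping each key name to its creation time, and both the function name keys_due_for_rotation and the 90-day window used in the usage note are illustrative choices, not recommendations from this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def keys_due_for_rotation(
    created_at: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of keys older than max_age, sorted alphabetically.

    All datetimes are expected to be timezone-aware.
    """
    current = now or datetime.now(timezone.utc)
    return sorted(
        name for name, created in created_at.items() if current - created > max_age
    )
```

Calling keys_due_for_rotation(inventory, timedelta(days=90)) on a schedule would give the security team a list of keys to rotate.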

Monitor Privileged Sessions

Privileged access management (PAM) is as critical in development environments as in enterprise IT environments. You should monitor, document, and record privileged sessions across:

Accounts
Users
Scripts
Automation tools

Scan for Hardcoded Credentials

By scanning your repository for hardcoded credentials, you can detect a risk before a compromise occurs. When you detect a potential leak, you need to:

Revoke the exposed secret
Clean the repository history
Inspect logs

When cleaning up a repository after identifying a potentially exposed secret, you need to remember that the secret remains visible in all previous commits and in repository forks, so any mirrored copies still pose a risk.
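To see why deleting a secret in a new commit is not enough, you can ask Git which commits in the history still contain it. The sketch below wraps git log with the -S option, which lists commits that changed the number of occurrences of a string. The helper names are hypothetical, and it assumes Git is installed and the secret has already been revoked, since passing it on the command line exposes it to the local process list.

```python
import subprocess
from typing import List


def pickaxe_command(repo_path: str, secret: str) -> List[str]:
    """Build a git command listing commits that added or removed the string."""
    # --all searches every ref in the local clone, not just the current branch.
    return ["git", "-C", repo_path, "log", "--all", "--oneline", f"-S{secret}"]


def commits_touching_secret(repo_path: str, secret: str) -> List[str]:
    """Run the search and return one line per matching commit."""
    result = subprocess.run(
        pickaxe_command(repo_path, secret),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Any commit this returns still carries the secret in the local clone's history. The search does not cover forks or mirrors, which have to be checked separately.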

Monitor Members’ Personal Public Repositories

You can use workflows to help monitor your developers’ personal repositories for potentially leaked code.

Scan GitHub for Leaked Code

By scanning GitHub for leaked code, you search beyond your members’ repositories for potentially compromised secrets.

How Flare Can Help

With Flare, security teams can integrate GitHub monitoring while reducing the traditional burdens associated with the process. We combine state-of-the-art data collection systems with our noise reduction and prioritization engine so that security teams have the necessary context to classify a data leak’s criticality.

With Flare, your security team can build efficient processes that enable them to proactively respond to technical data leaks, reducing the time and costs associated with a secrets incident.
