Breaking news: Microsoft AI researchers expose sensitive data in a GitHub mishap
The exposed data included personal backups of two Microsoft employees' computers, passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees.


Highlights
- Microsoft AI researchers accidentally publish private data on GitHub
- The data includes Microsoft employees' personal data, passwords, & internal messages
- The mishap occurred due to an overly permissive SAS token in the URL
In a startling development, Microsoft's AI research division inadvertently exposed terabytes of sensitive data on GitHub. The incident occurred when researchers published a link to an Azure storage bucket of open-source training data. Cloud security startup Wiz uncovered the issue while investigating accidental data exposures in cloud-hosted repositories. The exposed data included private keys, passwords, personal backups, and more. Here's what happened:
The GitHub repository mishap
The problematic GitHub repository offered open-source code and AI models for image recognition, and users were directed to download the models from an Azure Storage URL.
However, a crucial error in the URL's configuration granted access to the entire storage account rather than only the intended data. This misconfiguration had been in place since 2020, leaving sensitive information at risk for years.
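To illustrate why the URL's configuration matters, here is a minimal sketch using the azure-storage-blob Python SDK; the account URL and token below are placeholders, not the actual values involved. A shared link may point at a single blob path, but the SAS token appended to it is what determines the real scope of access:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values; the real repository pointed at a Microsoft-owned account.
ACCOUNT_URL = "https://example-account.blob.core.windows.net"
LEAKED_SAS = "sv=2020-08-04&ss=b&srt=sco&sp=rwdl&sig=..."  # account-scoped token

# Anyone holding the token can authenticate against the storage account
# itself, not just the blob path that was shared.
service = BlobServiceClient(account_url=ACCOUNT_URL, credential=LEAKED_SAS)

# With an account-scoped token, every container becomes enumerable.
for container in service.list_containers():
    print(container.name)
```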
Contents of the exposed data
The 38 terabytes of exposed data contained a range of sensitive material: personal backups of two Microsoft employees' computers, passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees. Worse, the misconfiguration granted "full control" access, meaning potential attackers could have deleted, replaced, or injected malicious content.
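As a hedged illustration of what "full control" means in practice (again with a placeholder URL and token), anyone who found the link could have rewritten or deleted the very files users were downloading:

```python
from azure.storage.blob import BlobClient

# Placeholder blob URL carrying a token with write and delete rights.
MODEL_URL = (
    "https://example-account.blob.core.windows.net/models/model.ckpt"
    "?sv=2020-08-04&sp=racwdl&sig=..."
)

blob = BlobClient.from_blob_url(MODEL_URL)

# Write permission lets an attacker silently replace the published model...
with open("tampered_model.ckpt", "rb") as payload:
    blob.upload_blob(payload, overwrite=True)

# ...and delete permission lets them destroy it outright.
blob.delete_blob()
```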
The SAS token oversight
The exposure didn't result from the storage account itself being directly compromised. Instead, it happened because an overly permissive Shared Access Signature (SAS) token was included in the URL. SAS tokens are used to create shareable links to Azure Storage resources, but in this case the token granted far more access than intended.
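The sketch below, assuming the azure-storage-blob Python SDK with placeholder account names and keys, contrasts the two ends of the spectrum: an account-level SAS of the kind behind this exposure, and a least-privilege token scoped to a single blob:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import (
    AccountSasPermissions,
    BlobSasPermissions,
    ResourceTypes,
    generate_account_sas,
    generate_blob_sas,
)

ACCOUNT = "example-account"
KEY = "account-key-placeholder"

# Overly permissive: read/write/delete/list over every container in the
# account, valid for a decade.
risky_sas = generate_account_sas(
    account_name=ACCOUNT,
    account_key=KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=365 * 10),
)

# Least privilege: read-only access to one blob, expiring in an hour.
scoped_sas = generate_blob_sas(
    account_name=ACCOUNT,
    container_name="models",
    blob_name="model.ckpt",
    account_key=KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
```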
Response & investigation
Wiz reported its findings to Microsoft on 22 June 2023, and Microsoft revoked the SAS token two days later, on 24 June. On 16 August, the company completed its investigation into the incident's impact on the organisation. Fortunately, according to Microsoft, no customer data was compromised and no other internal services were exposed.
Preventing future incidents
Microsoft has expanded GitHub's secret scanning service to prevent similar incidents. The service now monitors all changes to public open-source code for plaintext exposure of passwords and other secrets, including SAS tokens with overly permissive settings.
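GitHub's scanning internals are not public, but a simple stand-in conveys the idea: flag anything that looks like a SAS query string (a storage-service version `sv=` followed by a signature `sig=`) before it lands in a public repository. The pattern and script below are illustrative assumptions, not GitHub's actual rules:

```python
import re
import sys

# Heuristic: SAS query strings carry a service version (sv=YYYY-MM-DD)
# and a signature (sig=...); real scanners use far stricter rules.
SAS_PATTERN = re.compile(r"sv=\d{4}-\d{2}-\d{2}\S*?sig=[\w%+/=-]+")

def scan_file(path: str) -> list[str]:
    """Return SAS-like tokens found in the file at `path`."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return SAS_PATTERN.findall(fh.read())

if __name__ == "__main__":
    found = False
    for path in sys.argv[1:]:
        for token in scan_file(path):
            found = True
            print(f"{path}: possible SAS token: {token[:40]}...")
    # A non-zero exit can block the commit when run as a pre-commit hook.
    sys.exit(1 if found else 0)
```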
In conclusion, while the unintentional disclosure of private information by Microsoft's AI researchers is alarming, corrective action was taken promptly. The incident highlights the necessity of robust security procedures when working with enormous volumes of data in the era of AI research.