Breaking news: Microsoft AI researchers expose sensitive data in a GitHub mishap
The exposed data included personal backups of two Microsoft employees' computers, passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees.


Highlights
- Microsoft AI researchers accidentally publish private data on GitHub
- The data includes Microsoft employees' personal data, passwords, & internal messages
- The mishap occurred due to an overly permissive SAS token in the URL
In a startling development, Microsoft's AI research division inadvertently exposed terabytes of sensitive data on GitHub. The incident occurred when researchers published a link to an Azure storage bucket of open-source training data. Cloud security startup Wiz uncovered the issue while investigating accidental data exposures in cloud-hosted repositories. The exposed data included private keys, passwords, personal backups, and more. Here's what happened:
The GitHub repository mishap
The problematic GitHub repository offered open-source code and AI models for image recognition, and users were directed to download the models from an Azure Storage URL.
However, a crucial error in the URL's configuration granted access to the entire storage account rather than only the intended data. This misconfiguration had been in place since 2020, leaving sensitive information at risk for years.
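To illustrate why the URL's configuration matters, here is a minimal sketch using the azure-storage-blob Python SDK; the account URL and token below are placeholders, not the actual values involved. A shared link may point at a single blob path, but the SAS token appended to it is what determines the real scope of access:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values; the real repository pointed at a Microsoft-owned account.
ACCOUNT_URL = "https://example-account.blob.core.windows.net"
LEAKED_SAS = "sv=2020-08-04&ss=b&srt=sco&sp=rwdl&sig=..."  # account-scoped token

# Anyone holding the token can authenticate against the storage account
# itself, not just the blob path that was shared.
service = BlobServiceClient(account_url=ACCOUNT_URL, credential=LEAKED_SAS)

# With an account-scoped token, every container becomes enumerable.
for container in service.list_containers():
    print(container.name)
```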
Contents of the exposed data
The 38 terabytes of exposed data contained a range of sensitive material: personal backups of two Microsoft employees' computers, passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees. Worse, the misconfiguration granted "full control" access, meaning potential attackers could have deleted, replaced, or injected malicious content.
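As a hedged illustration of what "full control" means in practice (again with a placeholder URL and token), anyone who found the link could have rewritten or deleted the very files users were downloading:

```python
from azure.storage.blob import BlobClient

# Placeholder blob URL carrying a token with write and delete rights.
MODEL_URL = (
    "https://example-account.blob.core.windows.net/models/model.ckpt"
    "?sv=2020-08-04&sp=racwdl&sig=..."
)

blob = BlobClient.from_blob_url(MODEL_URL)

# Write permission lets an attacker silently replace the published model...
with open("tampered_model.ckpt", "rb") as payload:
    blob.upload_blob(payload, overwrite=True)

# ...and delete permission lets them destroy it outright.
blob.delete_blob()
```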
The SAS token oversight
The exposure didn't result from the storage account itself being directly compromised. Instead, it happened because an overly permissive Shared Access Signature (SAS) token was included in the URL. SAS tokens are used to create shareable links to Azure Storage resources, but in this case the token granted far more access than intended.
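The sketch below, assuming the azure-storage-blob Python SDK with placeholder account names and keys, contrasts the two ends of the spectrum: an account-level SAS of the kind behind this exposure, and a least-privilege token scoped to a single blob:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import (
    AccountSasPermissions,
    BlobSasPermissions,
    ResourceTypes,
    generate_account_sas,
    generate_blob_sas,
)

ACCOUNT = "example-account"
KEY = "account-key-placeholder"

# Overly permissive: read/write/delete/list over every container in the
# account, valid for a decade.
risky_sas = generate_account_sas(
    account_name=ACCOUNT,
    account_key=KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=365 * 10),
)

# Least privilege: read-only access to one blob, expiring in an hour.
scoped_sas = generate_blob_sas(
    account_name=ACCOUNT,
    container_name="models",
    blob_name="model.ckpt",
    account_key=KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
```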
Response & investigation
Wiz reported its findings to Microsoft on 22 June 2023, and Microsoft revoked the SAS token two days later, on 24 June. On 16 August, the company completed its investigation into the incident's impact on the organisation. Fortunately, according to Microsoft, no customer data was compromised and no other internal services were exposed.
Preventing future incidents
Microsoft has expanded GitHub's secret scanning service to prevent similar incidents. The service now monitors all changes to public open-source code for plaintext exposure of passwords and other secrets, including SAS tokens with overly permissive settings.
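GitHub's scanning internals are not public, but a simple stand-in conveys the idea: flag anything that looks like a SAS query string (a storage-service version `sv=` followed by a signature `sig=`) before it lands in a public repository. The pattern and script below are illustrative assumptions, not GitHub's actual rules:

```python
import re
import sys

# Heuristic: SAS query strings carry a service version (sv=YYYY-MM-DD)
# and a signature (sig=...); real scanners use far stricter rules.
SAS_PATTERN = re.compile(r"sv=\d{4}-\d{2}-\d{2}\S*?sig=[\w%+/=-]+")

def scan_file(path: str) -> list[str]:
    """Return SAS-like tokens found in the file at `path`."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return SAS_PATTERN.findall(fh.read())

if __name__ == "__main__":
    found = False
    for path in sys.argv[1:]:
        for token in scan_file(path):
            found = True
            print(f"{path}: possible SAS token: {token[:40]}...")
    # A non-zero exit can block the commit when run as a pre-commit hook.
    sys.exit(1 if found else 0)
```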
In conclusion, while the unintentional disclosure of private information by Microsoft's AI researchers is alarming, corrective action was taken promptly. The incident highlights the necessity of robust security procedures when working with enormous volumes of data in the era of AI research.