
Microsoft AI researchers accidentally expose 38 terabytes of sensitive data in GitHub mishap

The exposed data includes personal backups of two Microsoft employees' personal computers, passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees.

New Delhi, UPDATED: Sep 19, 2023 11:47 IST

Highlights

  • Microsoft AI researchers accidentally publish private data on GitHub
  • The data includes Microsoft employees' personal data, passwords, and internal messages
  • The mishap occurred due to an overly permissive SAS token in the URL

In a startling development, Microsoft's AI research division inadvertently exposed terabytes of sensitive data on GitHub. The incident occurred when the team published a link to an Azure storage bucket of open-source training data. Cloud security startup Wiz uncovered the issue while investigating accidental data exposures in cloud-hosted repositories. The exposed data included private keys, passwords, personal backups, and more. Here's what happened:


The GitHub repository mishap

The problematic GitHub repository offered open-source code and AI models for image recognition, and users were instructed to download these models from an Azure Storage URL.

However, a crucial error in the URL's configuration allowed access to the whole storage account rather than only the data that was intended. This configuration error had persisted since 2020, putting sensitive information at risk for years.
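To illustrate the scale of such a mistake, here is a minimal sketch, in Python with the azure-storage-blob library, of how a link carrying an account-wide SAS token lets anyone enumerate everything in the storage account, not just the file they were meant to download. The account name and token below are placeholders, not the actual values from the incident.

    from azure.storage.blob import BlobServiceClient

    # Hypothetical download URL of the kind shared in a README; the query
    # string after '?' is the SAS token that authorises each request.
    ACCOUNT_URL = "https://exampleaccount.blob.core.windows.net"
    SAS_TOKEN = "sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&sig=..."  # placeholder

    # Because the token is scoped to the whole account (srt=sco), the same
    # credential that downloads one model file can also list and read
    # every other container the account holds.
    service = BlobServiceClient(account_url=ACCOUNT_URL, credential=SAS_TOKEN)
    for container in service.list_containers():
        print(container.name)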

Contents of the exposed data

The 38 terabytes of exposed data contained various sensitive elements. This included personal backups of two Microsoft employees' personal computers, passwords for Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from hundreds of employees. The misconfiguration even allowed "full control" access, meaning potential attackers could have deleted, replaced, or injected malicious content into the exposed files.
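Because the token reportedly carried write and delete permissions, the same credential could also modify data. A brief sketch of what that full-control risk looks like in practice, again with placeholder names (the container and blob below are hypothetical):

    from azure.storage.blob import BlobClient

    ACCOUNT_URL = "https://exampleaccount.blob.core.windows.net"
    SAS_TOKEN = "sv=2020-08-04&ss=b&srt=sco&sp=rwdlac&sig=..."  # placeholder

    # With write permission ('w' in the token's sp= field), the holder of
    # the link can silently replace an existing blob, e.g. swapping a
    # published AI model for a tampered one.
    blob = BlobClient(
        account_url=ACCOUNT_URL,
        container_name="models",
        blob_name="image_model.ckpt",
        credential=SAS_TOKEN,
    )
    with open("tampered_model.ckpt", "rb") as data:
        blob.upload_blob(data, overwrite=True)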

The SAS token oversight

The exposure didn't result from the storage account itself being directly compromised. Instead, it happened due to the inclusion of an overly permissive Shared Access Signature (SAS) token in the URL. SAS tokens are used to create shareable links to Azure Storage data, but in this case the token granted far more access than intended.
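For contrast, here is a minimal sketch of how a narrowly scoped SAS token can be generated with the azure-storage-blob Python library: read-only, limited to a single blob, and expiring after an hour. The account name, key, and blob names are placeholders.

    from datetime import datetime, timedelta
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    # A read-only token for one blob, valid for one hour. If this token
    # leaks, it exposes a single file briefly, not the whole account.
    sas = generate_blob_sas(
        account_name="exampleaccount",
        container_name="models",
        blob_name="image_model.ckpt",
        account_key="<storage-account-key>",
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=1),
    )
    url = ("https://exampleaccount.blob.core.windows.net"
           f"/models/image_model.ckpt?{sas}")
    print(url)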

Response & investigation

Wiz informed Microsoft of its findings on June 22, 2023, and Microsoft quickly took action, revoking the SAS token on June 24. On August 16, the company completed its investigation into the incident's effects on the organisation. Fortunately, according to Microsoft, no customer data was compromised, and no additional internal services were exposed.

Preventing future incidents

To prevent such incidents, Microsoft has expanded GitHub's secret scanning service. This service now monitors all modifications to publicly available open-source code for the plaintext disclosure of passwords and other secrets, including SAS tokens with overly permissive settings.
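As an illustration of the idea only (GitHub's actual scanning rules are more sophisticated and not public in this form), here is a minimal Python sketch of the kind of pattern matching a secret scanner performs on committed text:

    import re
    import sys

    # Heuristic patterns for Azure SAS tokens appearing in plaintext.
    SAS_SIGNATURE = re.compile(r"\bsig=[A-Za-z0-9%+/]{20,}={0,2}")
    SAS_VERSION = re.compile(r"\bsv=\d{4}-\d{2}-\d{2}\b")
    WRITE_PERMISSION = re.compile(r"\bsp=[a-z]*w")  # 'w' grants write access

    def scan_file(path):
        """Return (line number, severity) pairs for suspected SAS tokens."""
        findings = []
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for lineno, line in enumerate(fh, start=1):
                if SAS_SIGNATURE.search(line) and SAS_VERSION.search(line):
                    severity = "HIGH" if WRITE_PERMISSION.search(line) else "MEDIUM"
                    findings.append((lineno, severity))
        return findings

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            for lineno, severity in scan_file(path):
                print(f"{path}:{lineno}: possible SAS token [{severity}]")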

In conclusion, while the unintentional disclosure of private information by Microsoft's AI researchers is alarming, immediate corrective action was taken. The event highlights the necessity of robust security procedures when working with enormous volumes of data in the era of AI research.

Published on: Sep 19, 2023 11:33 IST
Posted by: Samira Siddiqui, Sep 19, 2023 11:33 IST

