Data leaks in AI and policies to minimise the risks

  • 31 Mar 2023
  • 2 Mins Read
  • 〜 by Annette Muindi

A data leak occurs when information is exposed to unauthorised people because of an internal error rather than an external attack. Common causes are poor data security and sanitisation, outdated systems, and a lack of employee training. Data leaks can lead to identity theft, data breaches, or ransomware installation.

Several data leaks have involved artificial intelligence (AI) companies. In 2020, a security researcher discovered two folders of medical records that anyone could access on the internet. The data was labelled as “staging data” and hosted by artificial intelligence company Cense AI. Investigators believed the data was made public because Cense AI was temporarily hosting it online before loading it into the company’s management system.

The leak was significant because the medical records were detailed and included names, insurance records, medical diagnosis notes, and payment records. The data was sourced from insurance companies and related to car accident claims and referrals for neck and spine injuries. Most of the personal information is thought to belong to individuals located in New York, with a total of 2,594,261 records exposed.

ChatGPT has also experienced a data leak recently. The leak came during an outage on March 20, 2023, and exposed payment-related and other personal information of 1.2% of the ChatGPT Plus subscribers who were active during a specific nine-hour window. OpenAI said it found a bug in an open-source library, which also allowed some users to see titles from another active user’s chat history.

Cutout.pro, an AI-based visual design platform headquartered in Hong Kong, leaked user-generated content via an open Elasticsearch instance. According to the team that reported the leak, Cutout.pro exposed customer usernames and the images they created using the company’s tools. The instance also held information on the number of user credits and on where generated images were stored.

Policies that help prevent data leaks

  1. Have a data protection policy in place: This is an internal policy that outlines an organisation’s approach to safeguarding personal data. It tells staff how they are expected to collect, use, disclose, or otherwise process personal data. It also lets an employer communicate to staff the consequences of internal non-compliance.
  2. Have a data retention policy: A data retention policy is a set of guidelines that defines how long an organisation retains information and how to dispose of it when it is no longer needed. Information here means both electronic and hard-copy formats. In many cases, a retention policy covers all types of information processed within an organisation and is not confined to personal data.
  3. Include an information security policy: Information security policies set out an organisation’s guidelines for detecting, preventing, and managing risks to business information. These risks include the loss, theft, copying, or any other compromise of information integrity. All the information you hold may be at risk, including soft copy, hard copy, and even oral information. Information security risks can originate internally or externally, and can be either malicious or accidental. Whatever the case, your organisation needs to anticipate and defend itself against these risks through a detailed policy framework.
  4. Have an incident response plan: An incident response plan is an information security policy that details the procedures for reporting and responding to suspected, attempted, or actual data breaches. Data protection laws have stringent requirements regarding personal data breaches and security incidents, including tight reporting timelines.
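
Part of a data retention policy can be checked automatically. The Python sketch below is a minimal illustration only, not a complete implementation: the category names, the retention periods, and the helpers `Record`, `is_due_for_disposal`, and `records_to_dispose` are all hypothetical, and real periods depend on an organisation's legal and business requirements. It flags records that have been kept longer than the period set for their category.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List

# Hypothetical retention periods, in days, per category of information.
# These values are assumptions for illustration, not recommendations.
RETENTION_DAYS = {
    "payment_record": 365 * 7,
    "support_ticket": 365 * 2,
    "staging_upload": 30,
}


@dataclass
class Record:
    record_id: str
    category: str
    created_on: date


def is_due_for_disposal(record: Record, today: date) -> bool:
    """Return True when the record is older than its category's retention period."""
    days = RETENTION_DAYS.get(record.category)
    if days is None:
        # Categories without a defined period are not flagged automatically;
        # they need a manual decision.
        return False
    return today - record.created_on > timedelta(days=days)


def records_to_dispose(records: Iterable[Record], today: date) -> List[Record]:
    """Return the records that have exceeded their retention period."""
    return [r for r in records if is_due_for_disposal(r, today)]
```

A scheduled job could call `records_to_dispose` on an inventory of stored information and pass the result to whatever disposal procedure the policy defines. Categories with no defined period are left out of the result on purpose, so that nothing is deleted without an explicit rule.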