The EU General Data Protection Regulation (GDPR) went into effect recently, and it will have a huge impact on all businesses, particularly data-intensive companies, because of Article 17, the ‘Right to be Forgotten’. As things stand, companies can add or remove data in their live, active systems, but what about their backups?
Just a little background: GDPR was approved by the European Parliament in April 2016 and became enforceable on 25 May 2018. Unfortunately, many organizations waited until crunch time to finalize their policies. We have already seen large companies, including the likes of Google and Facebook, face heavy penalties from regulators over security lapses and inappropriate data collection. Under the new rules, the most serious violations can draw a fine of up to 20 million Euros or 4% of total worldwide (yes, worldwide!) annual revenue, whichever is higher.
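To make the "whichever is higher" rule concrete, here is a minimal sketch of the upper bound on a top-tier fine. It assumes "revenue" stands in for the regulation's worldwide annual turnover of the preceding financial year; the function name is hypothetical, and the result is a ceiling, not a prediction of what a regulator would actually impose.

```python
def max_gdpr_fine_eur(worldwide_annual_revenue_eur: float) -> float:
    """Upper bound of a top-tier GDPR fine (hypothetical helper):
    the greater of EUR 20 million or 4% of worldwide annual revenue."""
    flat_cap_eur = 20_000_000
    revenue_based_cap_eur = worldwide_annual_revenue_eur * 4 / 100
    return float(max(flat_cap_eur, revenue_based_cap_eur))


# EUR 100 million in revenue: 4% is only EUR 4 million, so the flat cap applies.
print(max_gdpr_fine_eur(100_000_000))     # 20000000.0
# EUR 10 billion in revenue: 4% is EUR 400 million, far above the flat cap.
print(max_gdpr_fine_eur(10_000_000_000))  # 400000000.0
```

The crossover sits at EUR 500 million in revenue; above that, the percentage-based cap is the one that matters, which is why the rule hits large, data-heavy companies hardest.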
The Right to be Forgotten is a step in the right direction from a consumer standpoint. It requires a company to erase a person's data when that person asks, provided one of the grounds in Article 17 applies. The obligation is not absolute: Article 17(3) exempts processing that is needed, for example, for freedom of expression and information, to comply with a legal obligation, for public-health reasons, for archiving or research in the public interest, or to establish or defend legal claims. In theory, this gives consumers “complete control” over how their personal and public information is used. In practice, it is a nightmare for the teams responsible for backup and recovery. Tech companies like Facebook do not keep data in small spreadsheets where a row can simply be deleted on request; we are talking about billions of data points, each with multiple attributes, multiplied by the many backup copies a company stores. Keeping multiple backups is standard practice, especially at tech companies.
GDPR vs. Backup Challenges:
Backups are usually not structured like an MS Office file that you can open and edit, or a relational database (RDBMS) that you can query. At a high level, it is important to understand that a backup product is handed an object with minimal metadata about it. The backup product does not control the content or the format of the object and has no knowledge of the data inside it. So finding a data point and deleting all of its associated attributes inside a backup is generally not feasible.
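Checksums are a basic way to verify data integrity when a backup is restored. Deleting data from a backup after the fact changes its contents, so the stored checksum no longer matches and the backup can look corrupted. It also risks breaking applications that expect the data to be present and violating referential integrity (for example, leaving orders that point to a customer who no longer exists).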
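A minimal sketch of why an in-place edit trips integrity checks, assuming the backup system records a SHA-256 digest of each object when it is written and re-checks it on restore. The record format and function names are hypothetical; real backup products use their own formats and verification schemes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Digest recorded at backup time and recomputed at restore time."""
    return hashlib.sha256(data).hexdigest()


def verify_on_restore(data: bytes, expected_digest: str) -> bool:
    """Restore-time integrity check: do the bytes still match the recorded digest?"""
    return sha256_hex(data) == expected_digest


# Hypothetical backup object; to the backup product these are just opaque bytes.
backup = b"id=1,name=Jane Roe;id=2,name=John Doe;id=3,name=Max Mustermann;"
recorded_digest = sha256_hex(backup)

# "Forget" John Doe by editing the stored bytes in place.
edited = backup.replace(b"id=2,name=John Doe;", b"")

print(verify_on_restore(backup, recorded_digest))  # True
print(verify_on_restore(edited, recorded_digest))  # False
```

Recomputing and re-recording the digest after the edit would make verification pass again, but it would also mask genuine corruption and does nothing about applications or foreign keys that still expect record 2 to exist.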
Keeping track of deleted data:
When a company is asked to delete a record, it needs to make sure that the data stays deleted: if an older backup is restored later, the deleted record comes back with it. The company therefore needs a way to remember that John Doe of 001 Hollywood Lane must be removed, without keeping John Doe's details on file, because retaining them would itself violate the deletion request.
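One commonly discussed approach, not necessarily what any particular product does, is a suppression list that stores only a keyed hash (HMAC-SHA-256) of the identifying fields and re-applies deletions whenever a backup is restored. This is a minimal sketch under stated assumptions: the secret key is kept separately from the backups, name and address are the matching fields, and all names are hypothetical. Note that GDPR still treats pseudonymized data as personal data, so this reduces what is retained rather than eliminating it.

```python
import hashlib
import hmac

# Assumption: a secret key stored apart from the backups (e.g. in a key vault).
SUPPRESSION_KEY = b"replace-with-a-real-secret-key"


def fingerprint(name: str, address: str) -> str:
    """Keyed hash of normalized identifying fields; plain values are never stored."""
    normalized = f"{name.strip().lower()}|{address.strip().lower()}".encode("utf-8")
    return hmac.new(SUPPRESSION_KEY, normalized, hashlib.sha256).hexdigest()


# Suppression list: fingerprints of people who asked to be forgotten.
suppression_list = {fingerprint("John Doe", "001 Hollywood Lane")}


def reapply_deletions(restored_records):
    """Drop any restored record whose fingerprint is on the suppression list."""
    return [
        record
        for record in restored_records
        if fingerprint(record["name"], record["address"]) not in suppression_list
    ]


restored = [
    {"name": "John Doe", "address": "001 Hollywood Lane"},
    {"name": "Jane Roe", "address": "002 Sunset Blvd"},
]
print(reapply_deletions(restored))  # [{'name': 'Jane Roe', 'address': '002 Sunset Blvd'}]
```

The key matters: a plain unkeyed hash of a name and address could be reversed by hashing guesses, while the HMAC cannot be recomputed without the key.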
The right to be forgotten applies only in certain situations, and exemptions may apply, but exactly where those lines fall is still unclear. On top of the technical issues, companies may face conflicting compliance obligations, such as retaining historical data for legal cases and audits.
Is there an obvious solution? No, and in the meantime historical backups can fall out of GDPR compliance. According to AMI’s Global Model, the market for the Backup and Recovery segment is expected to grow at roughly 9% a year, from $9.4 billion USD in 2017 to $14.5 billion USD by 2022. How GDPR will affect this segment remains to be seen, but backup vendors should be part of the process going forward, because every industry is generating and gathering more data than ever. As technology progresses, finding and deleting a record inside a backup may become easier, and the vendors who solve it first could turn it into a competitive advantage.
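As a quick arithmetic check on the market figures quoted above, the compound annual growth rate (CAGR) implied by going from $9.4 billion in 2017 to $14.5 billion in 2022 can be computed directly. The helper name is hypothetical; the inputs are the AMI figures from the text.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# AMI figures from the text: $9.4B (2017) to $14.5B (2022), five years of growth.
rate = cagr(9.4, 14.5, 2022 - 2017)
print(f"{rate:.1%}")  # 9.1%
```

So the quoted endpoints correspond to growth of just over 9% per year.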
~Ankit Mehta, Associate