How to Manage Data Retention
As organizations accumulate data, they eventually need to dispose of the parts that are no longer useful, largely because storing terabytes or petabytes of useless data over the long term is expensive and unsustainable. However, it’s not always possible to simply delete data: many organizations must meet regulatory requirements, some of which require data to remain available for years after it was generated. Companies resolve this tension through data retention policies, which document the rules for how data is stored and, eventually, deleted across the organization.
In this article, you’ll learn more about what data retention is and why a data retention policy is valuable to your organization. You’ll also learn some of the core ideas behind data retention policies and some best practices for creating your own.
What Is Data Retention?
Data retention is the practice of storing and managing information for a specified period of time. A data retention policy is an organization’s system of rules for managing the information it generates. This includes how the information is stored, the period for which it is stored, and how it is deleted afterwards. Generally speaking, this policy aims to define limits for the retention of data for compliance or regulatory reasons. The data retention policy of an organization is typically based on the rules of the regulatory body governing its industry.
The objectives of a data retention policy customarily include:
- Improving the speed and efficiency of managing and accessing data
- Reducing costs by cutting down on storage hardware and software needs
- Eliminating potential failure points and vulnerabilities inherent in huge data systems
- Limiting liability and ensuring compliance with industry guidelines and regulations
What Makes a Good Data Retention Policy?
For any organization, the data retention policy is the primary guideline for dealing with all its data. No matter your industry (telecommunications, financial services, consultancy, retail, healthcare, hospitality, or government), a good data retention policy helps ensure that your valuable data assets are managed properly and provide a net positive value to your business.
A robust retention policy in any industry must first be sustainable. Creating a sustainable foundation for meeting all the legal, regulatory, and business needs is paramount if your data retention policy is to cope with a rapidly growing data inventory over time. This brings us to the importance of flexibility in a good retention policy. The more comprehensive and flexible your data management policy is, the better it can adapt to swiftly changing regulatory landscapes and business dynamics.
In addition to flexibility and sustainability, the following practices should be considered important and necessary in any robust data retention policy.
Periodic Audits of Data to Evaluate Usefulness
Periodic audits are a crucial part of the best data retention policies. Many organizations hold on to data longer than required because keeping as much as possible in storage feels safer than deleting something and needing it later. This is a misconception, and prescheduled periodic data audits can help you avoid it.
Keeping data that no longer has any usefulness to your business needs, especially when it is no longer required by law, can have serious ramifications beyond the obvious. For instance, consider the following implications as your data inventory expands:
- Your risk for security breaches increases
- Your data management tools and hardware become increasingly cluttered
- Your financial and management resources are tied up in order to keep maintaining the data
- Your burden of keeping up with regulatory statutes related to the useless data continues to expand
Thus, it is important to carry out periodic audits to determine the usefulness of collected, archived, warehoused, or backed-up data. During these audits, organizations must identify the exact requirements of their business, the regulations specific to their industry, and the retention laws of every country in which they operate. Together, these dictate which data to collect and retain and how long to keep it.
Furthermore, periodic internal audits provide opportunities to check your organization’s adherence to current data retention and compliance policies, and to keep the policy itself up to date. Depending on how quickly your data and regulatory obligations change, an appropriate audit frequency might be yearly or even monthly.
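As a rough illustration of what one audit pass could look like, the sketch below flags datasets that are both past their retention date and unused for a year. The `Dataset` fields, the one-year staleness threshold, and the sample inventory are all hypothetical; a real audit would read this metadata from your own catalog.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    name: str
    last_accessed: date  # last time anyone read the data
    retain_until: date   # earliest date the data may be deleted


def audit(datasets, today, stale_after_days=365):
    """Return names of datasets that are past retention AND no longer used."""
    stale_cutoff = today - timedelta(days=stale_after_days)
    return [
        d.name
        for d in datasets
        if d.retain_until <= today and d.last_accessed < stale_cutoff
    ]


# Hypothetical inventory, for illustration only.
inventory = [
    Dataset("web_logs_2019", date(2020, 1, 15), date(2021, 1, 1)),
    Dataset("payroll_2022", date(2023, 2, 1), date(2026, 1, 1)),
    Dataset("active_customers", date(2024, 5, 30), date(2023, 1, 1)),
]

candidates = audit(inventory, today=date(2024, 6, 1))
print(candidates)
```

Only `web_logs_2019` is flagged here: `payroll_2022` is still inside its retention window, and `active_customers` is still in use. The output is a list of candidates for human review, not an instruction to delete.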
Scheduled Retention Periods
Scheduled data retention periods are equally essential. Your organization must answer two questions:
- For how long will you retain different types of data?
- How frequently will your data bank be updated (daily, weekly, monthly, or annually)?
A good data retention system will provide answers to these two questions. Start by identifying what kind of data your organization collects and classify the data based on how critical it is, whether the data is proprietary, and whether the data currently serves or will serve any future business needs.
Most of your organization’s proprietary data is internally generated intellectual data, ranging from technical to financial information. This could be data that gives you a competitive advantage in your industry and/or data that is protected by copyright laws, patents, etc. A typical example is the source code of software applications released for commercial purposes. Preventing the loss of this data or its exposure to the general public is directly tied to the survival of your company. Thus, it is reasonable to assume that the data should be retained for the lifetime of the business or until it’s determined safe to discard.
Like proprietary data, a company’s critical operational data in a business segment is tied to its success in that business segment. In essence, this data helps the company carry out its functions and obligations to customers. If critical business data is compromised, it could present an existential threat to the business as the financial, legal, and reputational costs compound with time. Your retention period and backup frequency for this type of data are highly dependent on your business needs and legal provisions.
Retention periods should be scheduled for each of these classes of data based on your organization’s internal requirements and external factors. A preset frequency for updating (i.e., backing up or deleting) the contents of your data bank should be built into your data management system so that it runs automatically, helping you avoid the costly mistakes that can arise when people back up and delete data by hand.
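One way to make such a schedule concrete is to store it as data that an automated job can read. The sketch below is a minimal example under assumed values: the class names follow the categories described above, but the retention periods and backup frequencies are placeholders, not recommendations.

```python
from datetime import date, timedelta
from enum import Enum


class DataClass(Enum):
    PROPRIETARY = "proprietary"
    CRITICAL_OPERATIONAL = "critical_operational"
    GENERAL = "general"


# Hypothetical schedule. retain_days=None means "keep for the lifetime of the
# business"; backup_every_days is how often the data bank is refreshed.
SCHEDULE = {
    DataClass.PROPRIETARY: {"retain_days": None, "backup_every_days": 1},
    DataClass.CRITICAL_OPERATIONAL: {"retain_days": 7 * 365, "backup_every_days": 1},
    DataClass.GENERAL: {"retain_days": 365, "backup_every_days": 7},
}


def expiry_date(data_class, created):
    """Date after which the data may be deleted, or None if kept indefinitely."""
    days = SCHEDULE[data_class]["retain_days"]
    return None if days is None else created + timedelta(days=days)


def next_backup(data_class, last_backup):
    """Date on which the next backup for this class is due."""
    return last_backup + timedelta(days=SCHEDULE[data_class]["backup_every_days"])


print(expiry_date(DataClass.GENERAL, date(2023, 1, 1)))
print(next_backup(DataClass.GENERAL, date(2024, 6, 1)))
```

Keeping the schedule in one table means that a change in business or legal requirements becomes a change to a single entry rather than to every job that touches the data.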
Awareness of Legal Compliance
Failure to comply with regulations and laws governing data records management in your industry, state, or country could leave your organization open to criminal and/or financial sanctions.
In the last few years, global attention has zoomed in on data privacy in organizations, especially for business-to-consumer (B2C) companies. Since these companies directly interact with customers and are more likely to collect, handle, and use individual data, more stringent data privacy regulations are enforced on them. Note that this does not mean that there are no strict regulations applying to business-to-business (B2B) companies; it simply means B2C companies receive a lot more attention from the general population. As Joe McKendrick explains in Forbes, eventually every company will become a data company. Thus, it’s important for every business to understand and build policies that meet or exceed all applicable data retention laws.
Let’s take a closer look at some common legal requirements and laws for data retention across different industries.
GDPR
The General Data Protection Regulation (GDPR) is widely regarded as the world’s strictest privacy and security law, applying to any organization in any location that targets or collects data relating to people in the European Union. Article 5(1)(e) of the GDPR stipulates that:
“Personal data shall be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed; personal data may be stored for longer periods insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes, or statistical purposes.”
Businesses that violate the GDPR’s privacy and security standards face harsh fines: up to €20 million or 4% of worldwide annual turnover, whichever is higher.
HIPAA
According to the US CDC, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) is “a federal law that required the creation of national standards to protect sensitive patient health information from being disclosed without patients' consent or knowledge.”
The HIPAA Privacy Rule set by the US government provides guidelines to covered entities (health plans, healthcare clearinghouses, and healthcare providers) and their business associates, specifying that they must retain HIPAA-related documentation, such as policies and procedures, for at least six years from the date it was created or was last in effect, whichever is later. HIPAA itself does not set a retention period for individual medical records; that is generally governed by state laws across the US.
PCI DSS
The Payment Card Industry Data Security Standard (PCI DSS) is a set of operational and technical requirements administered globally for organizations that handle credit cards. The goal is to protect cardholder data in storage and across public networks against attacks.
Organizations are required to restrict access to cardholder data (CHD) and sensitive authentication data (SAD). Merchants and payment processors may not store SAD after authorization, and should store CHD only when there is a business need for it. Where storing CHD is unavoidable, PCI DSS requires that it be protected with techniques such as encryption, truncation, masking, or hashing.
SOX
The Sarbanes-Oxley Act (SOX) applies to financial data reliability and retention for public corporations. If you operate in a publicly listed enterprise, SOX requires your company to carry out an annual audit that provides proof of accurate and secured data. It also sets different retention periods depending on the document type, as defined by the SEC.
FERPA
The Family Educational Rights and Privacy Act (FERPA) in the US protects the privacy of student educational records within schools and associated institutions. If you work with data in the US academic sector, this is a law you have to be familiar with. FERPA itself does not prescribe a general retention period for student records; retention schedules are typically set by state law or institutional policy. However, an institution may not destroy records while a request to inspect them is outstanding.
BSA
Finally, the Bank Secrecy Act (BSA), also known as the Currency and Foreign Transactions Reporting Act, is an anti-money laundering (AML) act in the US that requires financial institutions to keep records of cash purchases, file reports, and report suspicious activities that may signify money laundering, tax evasion, and other criminal activities.
Businesses operating under this act are generally required to retain these records for five years.
Other Retention Policy Requirements
Other examples of record retention laws include the Gramm-Leach-Bliley Act (GLBA) for financial institutions, which requires retention of privacy notices indefinitely; Equal Employment Opportunity (EEO) laws, which require employers to keep employee records for one year after termination of employment; the Fair Labor Standards Act (FLSA), which requires employers to retain payroll records for a minimum of three years; and many more.
This is by no means an exhaustive list of the regulatory and legal requirements out there. As a data professional or executive in an organization, it is your responsibility to investigate and integrate the specific legal requirements that may affect your business into your data retention policy.
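When several of these requirements apply to the same data, the effective retention period is the longest applicable minimum, extended further if the business needs the data for longer. The sketch below shows that rule using three of the figures quoted in the summaries above. The table and function names are hypothetical, and the values should be confirmed against the regulations themselves before they drive any real system.

```python
# Minimum retention periods in years, copied from the summaries above.
# Illustrative only: verify each figure against the regulation itself.
MIN_RETENTION_YEARS = {
    "HIPAA": 6,
    "BSA": 5,
    "FLSA": 3,
}


def required_retention_years(applicable, business_need_years=0):
    """Longest minimum across applicable regulations and the business need."""
    legal_minimums = [MIN_RETENTION_YEARS[name] for name in applicable]
    return max(legal_minimums + [business_need_years])


# A dataset covered by both BSA and FLSA that the business wants for 2 years.
print(required_retention_years(["BSA", "FLSA"], business_need_years=2))

# A dataset with no legal minimum that the business wants for 1 year.
print(required_retention_years([], business_need_years=1))
```

An unknown regulation name raises a `KeyError` rather than silently defaulting to zero, which is the safer failure mode when the result decides how soon data can be deleted.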
Automated Policy Compliance
After establishing a data retention policy, the actual implementation should be automated. Manual implementation is largely unacceptable for organizations that deal with huge volumes of data: it exposes your records to human error and significantly raises the risk of a security breach.
You can achieve automated implementation through the collaboration of all stakeholders in the company, especially the IT and legal teams. Since your policy has already created timelines for handling different types of data in line with your business and legal obligations, very little administrative oversight is needed when automated systems carry out data backups and archiving or deleting operations.
Situations will still arise in which the rules set for automating your data retention policy do not apply. For example, a change in regulation or a new business need may require your company to retain and track a new type of data. In such situations, your storage system administrators should review the rules to accommodate the new requirement and prevent the loss of crucial data.
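A minimal sketch of how an automated job might apply such rules is shown below. It assumes a hypothetical `RULES` table keyed by record type, with placeholder thresholds. Record types without a rule are routed to an administrator for review instead of being deleted, which mirrors the exception case described above.

```python
from datetime import date
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    ARCHIVE = "archive"
    DELETE = "delete"
    REVIEW = "review"  # no rule applies: escalate to an administrator


# Hypothetical rules; thresholds are in days since the record was created.
RULES = {
    "clickstream": {"archive_after_days": 90, "delete_after_days": 365},
    "invoices": {"archive_after_days": 365, "delete_after_days": 7 * 365},
}


def decide(record_type, created, today):
    """Pick the action the policy prescribes for one record."""
    rule = RULES.get(record_type)
    if rule is None:
        return Action.REVIEW
    age_days = (today - created).days
    if age_days >= rule["delete_after_days"]:
        return Action.DELETE
    if age_days >= rule["archive_after_days"]:
        return Action.ARCHIVE
    return Action.RETAIN


today = date(2024, 6, 1)
print(decide("clickstream", date(2024, 5, 20), today))
print(decide("clickstream", date(2024, 1, 1), today))
print(decide("clickstream", date(2022, 1, 1), today))
print(decide("sensor_feed", date(2024, 1, 1), today))
```

Separating the decision from the code that actually archives or deletes makes the policy easy to test and lets the IT and legal teams review the rules table without reading the storage code.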
Testing Backup Policies
The cost of a data breach or the loss of backup data could run to hundreds of thousands of dollars in just a few hours. Experts have cited a figure of $5,600 per minute, which works out to $336,000 per hour. Backups are intended to protect your business in these situations, so it’s important to test your backup policies as part of your wider data retention policy.
The aim of frequently testing backups is to ensure business data can be retrieved quickly and completely enough for business continuity. Simply put, you have to answer the following questions for your situation:
- What needs to be tested?
- How often do you want to test?
- Is it possible to restore your data?
- Is the recovered data accurate?
- Do your recoveries work consistently?
Results from these tests should be shared by the IT team and acted upon accordingly to further improve the reliability of your backups.
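The questions about whether data can be restored and whether the recovered data is accurate can be checked mechanically by comparing checksums. The sketch below uses only the Python standard library. The file copies stand in for your real backup and restore steps, and the file names are made up for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(source, restored):
    """True if the restored file is byte-for-byte identical to the source."""
    return sha256_of(source) == sha256_of(restored)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    original = tmp / "orders.csv"
    original.write_text("id,total\n1,9.99\n")

    backup = tmp / "orders.csv.bak"
    shutil.copyfile(original, backup)    # stand-in for the backup step
    restored = tmp / "orders_restored.csv"
    shutil.copyfile(backup, restored)    # stand-in for the restore step
    ok = restore_matches(original, restored)

    restored.write_text("id,total\n1,0.00\n")  # simulate a corrupted restore
    corrupted_ok = restore_matches(original, restored)

print(ok, corrupted_ok)
```

Running a check like this on a schedule, and recording the results, also helps answer whether recoveries work consistently over time.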
Data Retention and Data Warehouses
Now that you’ve learned about best practices for data retention and some key questions to answer when building your data retention policy, let’s take a look at how data retention affects your data warehouse or data lake.
A data warehouse is usually the largest database owned by a company, and as such, it can be extremely costly to set up, operate, and maintain. Generally, it is expected that a data warehouse should be nonvolatile throughout its lifecycle—that is, all loaded data is expected to remain there permanently, or until decided otherwise.
Because setting up a data warehouse is costly and complex, many organizations do not implement strategies for maintaining and optimizing it afterwards. As a result, they are likely storing data that no longer serves their business or legal obligations. That data occupies resources that could otherwise be made available for new data, which drives up costs.
It’s for this reason, as well as the others discussed above, that organizations are strongly advised to implement requirements for archiving or deleting data once a given retention period has elapsed.
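In a warehouse, enforcing a retention period often comes down to a scheduled statement that removes rows older than a cutoff date. The sketch below demonstrates the idea with Python's built-in `sqlite3` module and an in-memory database. The `events` table and `loaded_on` column are hypothetical, and a production warehouse would use its own SQL dialect and might copy rows to an archive before deleting them.

```python
import sqlite3
from datetime import date, timedelta


def purge_expired(conn, table, date_column, retention_days, today):
    """Delete rows older than the retention period; return how many were removed."""
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    # Table and column names must come from trusted configuration, never from
    # user input, because identifiers cannot be passed as bound parameters.
    cur = conn.execute(
        f"DELETE FROM {table} WHERE {date_column} < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, loaded_on TEXT)")
conn.executemany(
    "INSERT INTO events (loaded_on) VALUES (?)",
    [("2021-03-01",), ("2023-11-15",), ("2024-05-01",)],
)

deleted = purge_expired(conn, "events", "loaded_on", 365, date(2024, 6, 1))
remaining = [
    row[0] for row in conn.execute("SELECT loaded_on FROM events ORDER BY loaded_on")
]
print(deleted, remaining)
```

Storing dates as ISO 8601 strings keeps the comparison correct in SQLite, since those strings sort in chronological order.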
Ultimately, you have to adopt the very best industry standards when creating a data retention policy for your organization. Most importantly, it is crucial that all stakeholders are involved in the creation process and that the business makes unique considerations based on its business sector and location.
Some Examples of Data Retention Policies
For your review, check out the data retention and privacy policies of a few of the most popular companies in the world:
Conclusion
In this article, you learned that data retention for an organization means storing and managing generated data for a specified period, usually through a system of rules called a data retention policy.
You also learned that data retention best practices involve creating a flexible, explicit, and sustainable policy that gives organizations wiggle room to adjust in a dynamic business and regulatory environment. For an entity’s data retention policy to be described as robust, it has to show awareness of legal ordinances stipulated for the entity’s industry and operational borders. For instance, HIPAA guides organizations in healthcare on data privacy for patients, and PCI DSS establishes similar rules for payment data privacy with organizations that process credit cards.
Lastly, you learned that best practices for implementing a data retention policy include automating the process for data collection, retention, archiving, and deletion, as stipulated by the unique specifications of your policy. A large part of this process takes place in your organization’s data warehouse if you use one.