Machine learning, artificial intelligence and other big data analytics tools are delivering business value by producing valuable insights and augmenting human skills in judgment-based functions. This trend is fueled by the exponential growth in data collection and by the improving price-performance of data storage and analytics. Technology is driving this growth in ways that were previously contemplated only in movies and our imagination. Meanwhile, the legal constructs that have governed relationships between contracting parties need to be evaluated and updated to account for the changing landscape brought about by data analytics. One key fact is that big data analytic systems “learn” instead of being programmed, and it is often difficult or even impossible to understand or limit how they use inputs or to know why they arrive at the insights they deliver. Another key fact is that the data and the insights produced may not be protected by intellectual property laws and must therefore be protected by other means, such as contract, rather than in the ways traditional outputs are protected.
Data analytics is the process of inspecting and analyzing data to discover useful information and draw conclusions from it.1 Data analytics is often grouped into four key categories:2
(1) Descriptive: What is happening? Descriptive analytics focuses on describing metrics and measures within a collection of historical data. It is useful for showing patterns that may offer insights into a business. As basic examples, a health care provider may review how many patients were hospitalized in a prior month and/or year; a retailer may produce a regular report of its average weekly sales volume; and an insurer may identify the number of in-force policies and/or claims during a prior month and/or year.
(2) Diagnostic: Why is it happening? Diagnostic analytics examines historical data to uncover dependencies and identify the root causes of certain results. For example, a health care provider may learn that an increase in patient volume for the prior month was attributable to flu cases, which coincided with an increase in flu cases nationwide; a retailer may learn that an increase in average weekly sales volume coincided with a specific promotion it had implemented; and an insurer may learn that an increase in the number of auto claims during a prior month coincided with a period of severe snow and ice on the region’s roads.
(3) Predictive: What is likely to happen? Predictive analytics uses the findings of descriptive and diagnostic analytics to help identify trends and forecast future results. For example, a health care provider may predict the severity of flu cases in its region based on results at the national level, as well as on the number of flu vaccinations administered relative to historical trends; a retailer may be able to evaluate and predict the success of a particular promotion based on historical sales during a previous similar promotion; and an insurer may be able to predict the types and volumes of auto claims that may occur within a region during specific seasons.
(4) Prescriptive: What do I need to do? Prescriptive analytics focuses on what steps should be taken in order to eliminate a potential problem or take advantage of a particular trend. Carrying forward the examples above, a health care provider may order extra flu vaccines based on predictions for a severe flu season at the national level; a retailer may adjust its staffing in order to accommodate an expected increase in sales during a particular promotion; and an insurer may factor in additional environmental risks and costs for certain snow-prone regions as part of its underwriting process.
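To make the four categories concrete, the following Python sketch runs all four on one small, made-up set of weekly retail sales, loosely following the retailer examples above. The data, the function names (`describe`, `diagnose`, `predict`, `prescribe`), the two-week baseline and the ratio of 25 units per staff shift are all illustrative assumptions, and the forecast is deliberately naive rather than a recommended method.

```python
import math
from statistics import mean

# Hypothetical weekly sales (units) for one store; "promo" flags weeks in
# which a promotion ran. Illustrative numbers only, not real data.
WEEKS = [
    {"week": 1, "sales": 100, "promo": False},
    {"week": 2, "sales": 104, "promo": False},
    {"week": 3, "sales": 150, "promo": True},
    {"week": 4, "sales": 96, "promo": False},
    {"week": 5, "sales": 140, "promo": True},
    {"week": 6, "sales": 100, "promo": False},
]


def describe(weeks):
    """(1) Descriptive: what is happening? Average weekly sales volume."""
    return mean(w["sales"] for w in weeks)


def diagnose(weeks):
    """(2) Diagnostic: why is it happening? Ratio of average promo-week sales
    to average non-promo-week sales. A ratio above 1 shows that promotions
    coincided with higher sales; on its own it does not prove causation."""
    promo = mean(w["sales"] for w in weeks if w["promo"])
    baseline = mean(w["sales"] for w in weeks if not w["promo"])
    return promo / baseline


def predict(weeks, recent=2):
    """(3) Predictive: what is likely to happen? Naive forecast for the next
    promo week: the recent non-promo baseline scaled by the historical lift."""
    non_promo = [w["sales"] for w in weeks if not w["promo"]]
    recent_baseline = mean(non_promo[-recent:])
    return recent_baseline * diagnose(weeks)


def prescribe(forecast_sales, sales_per_staff=25):
    """(4) Prescriptive: what do I need to do? Staff shifts to schedule,
    assuming (hypothetically) one staff shift per 25 units sold."""
    return math.ceil(forecast_sales / sales_per_staff)


if __name__ == "__main__":
    forecast = predict(WEEKS)
    print(f"average weekly sales: {describe(WEEKS)}")
    print(f"promotion lift: {diagnose(WEEKS):.2f}")
    print(f"forecast for next promotion week: {forecast:.1f}")
    print(f"staff shifts to schedule: {prescribe(forecast)}")
```

With these made-up numbers, the script reports average weekly sales of 115 units, a promotion lift of 1.45, a forecast of about 142 units for the next promotion week and 6 staff shifts. As in the examples above, the diagnostic step only establishes that promotions coincided with higher sales.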
Companies today are leveraging the power of data analytics to help them translate data into insights that are clear and meaningful and that help them achieve a competitive edge. However, in doing so, companies need to consider the underlying rights and risks associated with this growing technology and information. This article provides recommendations on what to do, and what not to do, to reduce legal risks in big data analytics. The risks include inadvertent loss of rights in data, violation of the rights of data providers, legal risks associated with using “black box” results from analytic engines where the law requires an explicable rationale for a decision, overdependence on third-party data analytics providers, and failure to adequately monitor and protect data that has been shared with other parties.
To assist clients with understanding the rights and risks associated with any big data analytics efforts, we have compiled the following list of nine do’s and don’ts to consider:
- Do review data license clauses carefully and understand their potential impacts. For this purpose, treat any agreement under which one company accesses another company’s data as a data license, whether or not it is styled as one. For example, consider an insurance company that has contracted with a third-party administrator (TPA) to process and manage its claims. In such an arrangement, the TPA will require access to, and rights to use, certain policy and claims data from the insurance company in order to process and manage the claims. However, the insurance company should limit the scope and breadth of the TPA’s license to the services the TPA is to deliver. Because data may not be subject to any intellectual property protection, a contract under which you provide data to a third party without restrictions may be construed as the equivalent of an unlimited license. Outsourcers, cloud providers and other third-party contractors often push for broad express rights to use customer data, as well as any data or insights derived from it. It is important to understand and limit those rights, especially where your own rights to use the data are limited. In addition, if there is value to be derived from your data (even at an aggregated level), the business deal should reflect a sharing of that value.
- Don’t expect your digital business team or your data scientists to spot the legal issues in big data analytics. Your digital business team is focused on business opportunities, and your data scientists are focused on new ways to derive insights; neither group is typically looking for limits on how the data may be used. Following the insurance and TPA example above, a TPA (and its data scientists) with access to claims data from multiple insurance customers (including your insurance company) is in a position to extract valuable insights that can then be marketed and sold to the insurance industry. If the agreement between the TPA and the insurance company (specifically, its data license provisions) does not restrict such use, the TPA may be able to exploit that access for its own benefit, even if the insurance company never intended its data to be used in that manner.
- Do consider the purpose of the data collection, including uses that may not be imminent when the data is gathered, and obtain appropriate consents and licenses. The best chance to obtain an adequately broad consent and license is when you first obtain the data. Following the TPA and insurance company example above, the TPA would likely advocate for a broad data license so that it can use aggregated claims data to develop market analyses and products that it can sell for a profit. Such purposes may not come up during the initial contract negotiations, since the parties are likely focused on the in-scope claims processing services; however, because the TPA has access to a larger pool of data from its insurance customers, it may be better positioned than any single insurance company to aggregate data and conduct data analytics. If the insurance company were to permit the TPA to use its data for this purpose, it should make sure that (i) it is able to grant the TPA the right to use its data in that manner (remembering that the insurance company may itself be subject to restrictions in the licenses under which it obtained the data) and (ii) the business deal adequately compensates the insurance company for the data access it is providing to the TPA.
- Do know where your data is coming from and what rights, licenses and consents you have. A company’s data often comes from multiple sources and is stored in multiple databases across the enterprise. Given the volume of these data feeds and data stores, tracking and understanding your rights to the underlying data can become quite complicated. A best practice is to implement a process that tracks data as it is shared within and outside the organization and categorizes it by sensitivity (e.g., personal information, data subject to HIPAA, sensitive pricing information).
- Don’t exceed those rights, licenses and consents. While this principle is easily stated, it may be challenging to implement across a large organization, where many different personnel have access to the various data stores. It is important for a company (and its personnel) to understand where its data comes from, what rights it has to that data and where the data may ultimately flow. Following the insurance and TPA example above, consider a situation where the insurance company itself has only a limited right to use certain data from its policyholders but inadvertently grants the TPA a broad license to use and process all of its data, thereby exceeding its own rights.
- Do monitor evolving data laws and regulations, including those relating to privacy, cybersecurity, import/export, eDiscovery and records retention, in your industry and geographies (e.g., state-specific insurance regulations) and for the types of data that you gather, store or use. Data privacy is an evolving bundle of issues that affects businesses in every industry. A company cannot assume that taking generally “reasonable” steps will put it in complete compliance. Depending on the business and industry, federal, state and international laws, treaties and regulations may apply and must be reviewed and complied with. For example, insurance companies need to be aware of HIPAA with respect to protected health information, as well as the additional cybersecurity requirements that the New York Department of Financial Services (NYDFS) imposes on insurance companies doing business in New York.
- Don’t assume that having a consent or license, or the absence of regulation, means that you can ignore reasonable expectations and potential ethical obligations. Regulations are evolving quickly, and the market may punish perceived abuses. Consider where the law might go as political sensitivities develop (e.g., as big data analytics enables insurance companies to better understand and identify risk groups for underwriting purposes, consider whether anti-discrimination laws may expand to prohibit denial of coverage based on data points that have a disparate impact on certain protected categories).
- Do ensure that any applicable data restrictions flow down to your contractors and other licensees, and from them to their subcontractors and sublicensees. Just as the fourth and fifth points above highlight the importance of knowing your rights and obligations with respect to data, it is also important to ensure that anyone obtaining data directly or indirectly through you is subject to terms consistent with those rights and obligations. In the insurance company and TPA example, the TPA’s rights with respect to the insurance company’s claims data may be expressly stated in their contract. However, the insurance company should also require that any data restrictions be flowed down to any subcontractors the TPA uses to perform its obligations.
- Do document and implement rules, processes, procedures and a strong governance mechanism to govern and secure your data. It is in both the sharing party’s and the receiving party’s interests to implement a strong governance authority that understands the rights to use shared data and helps regulate the use of such data. The sharing party should consider requiring the receiving party to notify and train its employees on the contractual restrictions regarding the use of shared data.
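The practices above of tracking data and its rights, not exceeding those rights, and flowing restrictions down can be modeled as a chain of grants in which each recipient’s permitted uses are a subset of its grantor’s. The Python sketch below is a toy illustration under stated assumptions, not a contract-management system: the class and function names, the free-text purpose labels and the party names are all hypothetical, and in practice flow-down is achieved through contract terms rather than code.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    # Example categories taken from the text; a real taxonomy would be broader.
    PERSONAL = "personal information"
    HIPAA = "data subject to HIPAA"
    PRICING = "sensitive pricing information"


@dataclass(frozen=True)
class DataGrant:
    """One link in a hypothetical data chain: the rights `grantee` received
    from `grantor` in `dataset`. Uses are free-text purpose labels, which is
    a simplifying assumption."""
    dataset: str
    grantor: str
    grantee: str
    sensitivity: Sensitivity
    permitted_uses: frozenset


def is_permitted(grant: DataGrant, use: str) -> bool:
    """Don't exceed your rights: a use is allowed only if it was granted."""
    return use in grant.permitted_uses


def flow_down(grant: DataGrant, recipient: str, requested_uses):
    """License data onward to a contractor or sublicensee. The recipient can
    receive no more than the current grantee holds, so requested uses outside
    the grantee's own rights are refused and returned for review."""
    requested = frozenset(requested_uses)
    allowed = requested & grant.permitted_uses
    refused = requested - grant.permitted_uses
    onward = DataGrant(grant.dataset, grant.grantee, recipient,
                       grant.sensitivity, allowed)
    return onward, refused


def holders(registry, dataset: str):
    """Tracking: every party that has received rights in `dataset`."""
    return sorted({g.grantee for g in registry if g.dataset == dataset})


if __name__ == "__main__":
    # Hypothetical chain mirroring the insurance company / TPA example.
    insurer = DataGrant("claims", "policyholders", "insurer",
                        Sensitivity.PERSONAL,
                        frozenset({"claims processing", "underwriting"}))
    tpa, refused = flow_down(insurer, "TPA",
                             {"claims processing", "market analytics"})
    sub, _ = flow_down(tpa, "TPA subcontractor", {"claims processing"})
    registry = [insurer, tpa, sub]

    print("refused for TPA:", sorted(refused))
    print("TPA may sell analytics:", is_permitted(tpa, "market analytics"))
    print("holders of claims data:", holders(registry, "claims"))
```

Run as a script, the example refuses the TPA’s request for “market analytics” rights, because the insurer never held them, and lists every party holding the claims data. The design choice worth noting is that `flow_down` intersects the requested uses with the grantor’s own rights, so a restriction at the top of the chain automatically binds every later recipient.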
2 https://www.kdnuggets.com/2017/07/4-types-data-analytics.html; http://www.ingrammicroadvisor.com/data-center/four-types-of-big-data-analytics-and-examples-of-their-use; https://www.dezyre.com/article/types-of-analytics-descriptive-predictive-prescriptive-analytics/209; and https://www.scnsoft.com/blog/4-types-of-data-analytics.