In 2015, big data grew larger as the world grew smaller, bringing with it exponentially increased risks and challenges—especially during the discovery phase of litigation. New technologies and decreased storage costs meant that amassing data became much easier, but the burdens associated with managing such data, and producing it when required by courts or regulatory bodies around the world, became more complicated. To find some relief, companies have increasingly looked to third parties and to revising their own internal policies. Additional relief may also come through the December 1, 2015, amendments to the Federal Rules of Civil Procedure.
Further complicating matters was one of the headline-grabbing events of 2015: the dissolution of the EU-US Safe Harbor Framework. The European Court of Justice invalidated the framework in October, a decision that will significantly affect international companies seeking to transfer data from European Union member nations into the United States. Until a new framework is established and implemented, companies will have to rely on alternative methods to comply with EU data privacy laws.
Big Data Discovery Challenges
Big data presents several challenges for parties involved in litigation, given the volume and complexity of the information.
- Cost Management. For cases involving lengthy time periods, multiple custodians or various data sources, it is not uncommon for companies to collect a terabyte or more of data (1TB = 1000GB). The costs associated with the collection, review and production of so much data are tremendous—often among the highest litigation-related expenditures. To keep costs under control, companies must find cost-effective methods to sort through massive amounts of data to identify relevant information.
- Structured Data. Structured data refers to information in a defined field within a database record or file. Examples include databases maintained in Microsoft Excel and Access, Oracle’s PeopleSoft and SAP database products. Because databases are typically designed to manage and process corporate information, not to prepare for potential litigation, they can be a challenging data source for litigation-related discovery. For example, databases are updated on a routine basis, complicating efforts to preserve information over the course of litigation. Further, databases often contain data with no relevance to a litigation, making it difficult to extract only relevant information without disclosing irrelevant and potentially confidential information. In addition, structured data may be maintained in a format that cannot be easily produced to an opposing party. Finally, because structured data often includes private information—such as employee and customer records, financial information and health records—companies must be mindful of privacy concerns.
- Social Media. Many businesses now maintain a social media presence, and social media posts made by the company or its employees may be discoverable. Several discovery-related challenges are associated with social media. First, similar to structured data, the format in which social media is maintained may not be reasonably accessible to an opposing party. Second, posts may contain personal information subject to privacy laws. Third, social media accounts are controlled both by the account holder and the social media company, leading to uncertainty over which entity preserves and produces the data.
Strategies and Best Practices for Managing Big Data
Although big data is challenging during discovery, there are several strategies and best practices that can help ensure an efficient collection, review and production of this data.
- Data Source Familiarity. At the outset of discovery, it is important to identify and become familiar with the various sources of potentially relevant information. By working closely with the company’s information technology professionals, counsel may determine cost-effective strategies for data preservation and collection for potential data sources. During the Fed. R. Civ. P. 26(f) conference, counsel should consider avoiding preservation protocols that are impractical for dynamic, structured databases.
- Choosing the Right E-Discovery Vendors and Tools. Counsel should consider working with vendors that offer sophisticated e-discovery software tools, such as early case assessment, concept clustering, or technology-assisted review. When properly employed, these tools can eliminate significant amounts of irrelevant data and reduce document review time.
- Database Reports. It is typically impractical to produce a database in its native environment. As an alternative, consider querying databases for relevant information and providing the opposing party with a customized or standard report. CSV text files are a widely accepted format for moving data between databases, and they are relatively easy to generate, read and analyze. Providing data reports in this format generally satisfies the “reasonably usable” form requirement of Fed. R. Civ. P. 34(b)(2)(E).
- Statistical Sampling. When a database contains hundreds of terabytes of data, it can take several days to run a search and generate a report. Statistical sampling instead draws a smaller, random sample from the dataset and uses it to estimate characteristics of the whole, such as the share of records likely to be relevant, allowing parties to draw conclusions about the entire dataset.
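The database report and statistical sampling strategies can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a recommended protocol: an in-memory SQLite database stands in for a corporate system, the `orders` table, its columns, and the helper names `export_report` and `estimate_rate` are all hypothetical, and the normal-approximation confidence interval is only one of several reasonable choices.

```python
import csv
import io
import math
import random
import sqlite3


def export_report(conn, query, params=()):
    """Run a query and return the result as CSV text, header row first."""
    cursor = conn.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


def estimate_rate(records, is_relevant, sample_size, seed=0, z=1.96):
    """Estimate the share of relevant records from a simple random sample.

    Returns (estimate, low, high). The interval uses the normal
    approximation; z=1.96 corresponds to roughly 95% confidence.
    """
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(records, sample_size)
    hits = sum(1 for record in sample if is_relevant(record))
    p = hits / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical stand-in data: 10,000 order records, one in four tagged "EU".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1, 10001)],
)

# Database report: export only the rows matching the agreed criteria.
report = export_report(
    conn,
    "SELECT id, region, amount FROM orders "
    "WHERE region = ? AND id <= ? ORDER BY id",
    ("EU", 12),
)
print(report)

# Statistical sampling: review 400 random records instead of all 10,000.
records = conn.execute("SELECT id, region FROM orders").fetchall()
estimate, low, high = estimate_rate(records, lambda r: r[1] == "EU", 400)
print(f"Estimated relevant share: {estimate:.1%} ({low:.1%} to {high:.1%})")
```

Using a parameterized query keeps the selection criteria explicit, and fixing the random seed makes the sample reproducible. Both may help if the opposing party asks how the report or the sample was generated.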
Amendments to the Federal Rules of Civil Procedure
Amendments to the Federal Rules of Civil Procedure took effect on December 1, 2015. The changes to the discovery rules emphasize proportionality and cooperation, and include:
- Rule 26(b)(1) now limits discovery to information that is relevant to a party’s claim or defense and is “proportional to the needs of the case.” The Advisory Committee explained that this rule change is intended to “prompt a dialogue among the parties and, if necessary, the court, concerning the amount of discovery reasonably needed to resolve the case.”
- Rule 26(b)(1) also no longer states that discovery may include information that is “reasonably calculated to lead to the discovery of admissible evidence.” This phrase previously had the potential to widen discovery beyond its proper scope.
- Rule 26(d)(2) permits the parties to serve Rule 34 document requests before the Rule 26(f) discovery planning conference. This change enables parties to address any issues with the document requests at the conference.
- Rule 37(e)(1) sets forth what sanctions a court may impose if electronically stored information is lost because of a party’s failure “to take reasonable steps to preserve it” and the lost information cannot be “restored or replaced through additional discovery.” Under the amended rule, sanctions are not permitted if evidence is lost despite a party’s reasonable efforts to preserve it. Further, even if a party failed to preserve information, sanctions are not automatic. Under amended Rule 37(e)(1), a court may order “curative measures,” but only upon a finding that another party was prejudiced by the loss of the information.
- Rule 37(e)(2) permits more severe sanctions, such as an adverse inference or the entry of default judgment, but only when the court finds that a party “acted with the intent to deprive another party of the information’s use in the litigation.”
The amendments’ focus on cooperation and proportionality in e-discovery encourages parties to engage each other earlier in the process with the goal of establishing reasonable bounds for discovery. Should the parties fail to do so, the amendments invite the courts to intervene and limit unnecessary discovery requests. Whether the courts will accept the invitation remains to be seen.
Data Transfers After EU-US Safe Harbor Framework Invalidated
In its decision to invalidate the EU-US Safe Harbor Framework, the European Court of Justice (formally the Court of Justice of the European Union, or CJEU) concluded that the framework failed to adequately protect the privacy rights of European citizens by allowing US intelligence agencies unfettered access to European citizens' data on American servers.
On February 2, 2016, the European Commission announced that it had reached a high-level agreement with the United States on a series of measures to resolve the issues raised in the CJEU’s ruling. The new scheme, called the “EU-US Privacy Shield,” will be administered by the US Department of Commerce. European and United States authorities are expected to take three months to finalize and put in place the agreed arrangements, meaning that the EU-US Privacy Shield scheme should be implemented by May 2016.