A medium-sized company is a defendant in a putative class action lawsuit. Outside counsel negotiated the scope of the plaintiffs’ document requests, agreed on a list of custodians and engaged an e-discovery vendor to help the company collect email and other custodial documents. The vendor collected multiple terabytes of data. The defendant’s general counsel would like advice on how to manage the costs associated with this volume of data, including by using data analytics in the document review and by relying on skilled, experienced people who understand how to deploy these tools as part of a defensible process.


Key goals for companies that must respond to broad discovery requests include identifying the relevant materials and getting those documents and data produced to plaintiffs in an efficient and defensible manner. When faced with large volumes of data, there are strategies that a producing party can employ to reduce massive volumes of data to a more manageable level. One such strategy is to use early case assessment (ECA) tools and review workflows to prioritize documents for review and production. A benefit of prioritizing a document review is that the most relevant documents are typically reviewed early in the process, which allows for earlier strategy development.

Choose Your E-Discovery Partner Well

When contemplating the use of advanced analytics for filtering large data volumes, hiring the appropriate e-discovery vendor is an important first step. Counsel should consider working with a qualified e-discovery vendor that:

  • Can perform forensically sound data collections, process data using defensible workflows and prepare supporting documentation;
  • Can make available ECA tools for filtering, searching and developing review strategies;
  • Will host the documents in a review application that facilitates further analysis; and
  • Has the experience and the resources to support the case team in the use of ECA and technology-assisted review (TAR).

In cases with a large volume of data, recurring “hosting charges” can become a burden, especially during a long-running case. Vendors typically charge a per-gigabyte fee for hosting data. The use of various ECA tools can result in additional hosting charges. Counsel may want to explore negotiating alternative fee arrangements for processing and/or hosting at the outset.

Simple Steps to Take Before Responsiveness Review

In addition to prioritizing the review to move relevant documents to the front of the queue, some case teams may consider culling data that is unlikely to be responsive or relevant. The most common step is to “DeNIST” data during initial processing, which removes known program and system files by matching them against the reference list of software files published by the National Institute of Standards and Technology (NIST). Next, a case team may consider targeted searches to identify non-relevant files, often in the form of music, videos and photos. Similarly, “junk” email, such as daily newspaper reports and newsletters, might be culled before search terms are applied in order to minimize “false positive” hits. By excluding these files, the case team might gain greater insight into the data while reducing the volume of data promoted to lawyer review.
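The culling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a vendor workflow: the `known_hashes` set stands in for a real reference hash list, the `MEDIA_EXTENSIONS` set and the `cull` function are hypothetical, and whether media files are safe to exclude depends on the case.

```python
import hashlib
from pathlib import PurePath

# Hypothetical list of file types the case team has agreed are non-relevant.
MEDIA_EXTENSIONS = {".mp3", ".mp4", ".mov", ".jpg", ".png"}


def sha1_hex(content: bytes) -> str:
    """Return the SHA-1 digest of a file's content as a hex string."""
    return hashlib.sha1(content).hexdigest()


def cull(files, known_hashes):
    """Split (name, content) pairs into kept names and culled (name, reason) pairs.

    known_hashes is an assumed stand-in for a reference list of known
    program and system file hashes.
    """
    kept, culled = [], []
    for name, content in files:
        if sha1_hex(content) in known_hashes:
            culled.append((name, "known system file"))
        elif PurePath(name).suffix.lower() in MEDIA_EXTENSIONS:
            culled.append((name, "media file type"))
        else:
            kept.append(name)
    return kept, culled


files = [
    ("notepad.exe", b"binary"),
    ("song.MP3", b"audio"),
    ("memo.docx", b"text"),
]
known_hashes = {sha1_hex(b"binary")}
kept, culled = cull(files, known_hashes)
```

Recording a reason for each culled file, as the sketch does, gives the case team a simple log of what was excluded and why.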

Another way to cull irrelevant material is through the use of date restrictions. By applying date filters, which often are agreed on as part of the meet-and-confer dialogue with the requesting party, the case team can concentrate on a date-restricted set of documents for review and analysis. The case team might also consider custodian-specific time limitations. For example, if a custodian worked in the relevant department for only two months, there may be no reason to include email from that person’s entire tenure at the company. This initial cut can be performed during processing, so that documents outside the agreed date ranges are excluded from the reviewable data.
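As a rough illustration of how a case-wide date range and a custodian-specific limitation can work together, consider the sketch below. The dates, the custodian names and the `in_scope` function are all hypothetical; in practice the ranges would come from the parties' agreement.

```python
from datetime import date

# Hypothetical case-wide date range agreed on with the requesting party.
CASE_WINDOW = (date(2018, 1, 1), date(2020, 12, 31))

# Hypothetical narrower range for a custodian who spent only two months
# in the relevant department.
CUSTODIAN_WINDOWS = {
    "jdoe": (date(2019, 3, 1), date(2019, 4, 30)),
}


def in_scope(custodian: str, sent: date) -> bool:
    """Return True if a document's date falls inside the applicable range."""
    case_start, case_end = CASE_WINDOW
    cust_start, cust_end = CUSTODIAN_WINDOWS.get(custodian, CASE_WINDOW)
    # A custodian range can only narrow the case-wide range, never widen it.
    start = max(case_start, cust_start)
    end = min(case_end, cust_end)
    return start <= sent <= end
```

For example, `in_scope("jdoe", date(2018, 6, 1))` is false because the date falls outside that custodian's two-month range, while the same date is in scope for a custodian with no specific limitation.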

Once broad cuts are made, the next step is typically to run search terms against the remaining data. Creating a list of search terms is an iterative process: the terms are tested and refined to ensure that they find the documents you want to find without hitting on too many documents that you do not want to find. Search term hit reports and sampling may suggest modifications to certain terms, both to identify additional relevant documents and to minimize the number of false hits.
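A search term hit report of the kind described above can be approximated as follows. This is a simplified sketch: the `hit_report` function and the sample documents are hypothetical, and review platforms typically support richer syntax (proximity, wildcards, stemming) than the whole-word, case-insensitive matching shown here.

```python
import re


def hit_report(documents, terms):
    """Count, for each term, the documents it hits and the documents only it hits.

    documents maps a document ID to its extracted text.
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    hits = {term: set() for term in terms}
    for doc_id, text in documents.items():
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[term].add(doc_id)

    report = {}
    for term in terms:
        # Documents hit by any other term on the list.
        others = set().union(*(hits[o] for o in terms if o != term))
        report[term] = {
            "docs": len(hits[term]),
            "unique": len(hits[term] - others),
        }
    return report


documents = {
    1: "The rebate program pricing",
    2: "Pricing update",
    3: "lunch rebate",
    4: "nothing of interest",
}
report = hit_report(documents, ["rebate", "pricing"])
```

A term with a high document count and a high unique count is a natural candidate for sampling, since removing or narrowing it would change the review population the most.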

Consider the Use of Data Analytics Tools

Technology can also make the review process more efficient and can get lawyers’ eyes on the key documents faster. For instance, “concept clustering” uses software to group emails about certain themes. Email threading can reduce review volume by showing reviewers only the most “complete” email in a long chain and automatically coding its subsidiary parts so they do not need to be individually reviewed.
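The idea behind email threading can be illustrated with a deliberately simple sketch. It treats a message as fully captured when its text, after quote markers are stripped, appears inside another message in the same thread. The `inclusive_messages` function and the sample thread are hypothetical, and commercial threading tools rely on email headers and more robust text comparison than this.

```python
def normalize(body: str) -> str:
    """Strip reply quote markers, collapse whitespace and lowercase the text."""
    lines = (line.lstrip("> ") for line in body.splitlines())
    return " ".join(" ".join(lines).split()).lower()


def inclusive_messages(thread):
    """Return IDs of messages whose text is not fully contained in another message.

    thread maps a message ID to its body text.
    """
    norm = {msg_id: normalize(body) for msg_id, body in thread.items()}
    inclusive = []
    for msg_id, text in norm.items():
        contained = any(
            other_id != msg_id and text != other_text and text in other_text
            for other_id, other_text in norm.items()
        )
        if not contained:
            inclusive.append(msg_id)
    return inclusive


thread = {
    "a": "Can we meet?",
    "b": "Yes, Tuesday.\n> Can we meet?",
    "c": "Confirmed.\n> Yes, Tuesday.\n> > Can we meet?",
    "d": "Actually Wednesday.\n> Can we meet?",
}
to_review = inclusive_messages(thread)
```

In this example only messages "c" and "d" would go to reviewers: "c" contains the full text of "a" and "b", while "d" is a separate branch of the conversation that is not captured elsewhere.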

The case team might consider the use of predictive coding or TAR tools during discovery and trial preparation. Although these tools were initially developed and marketed as a means of reducing first-level lawyer review costs, they also aid in the identification and evaluation of documents produced and received in discovery. In addition, data analytics tools can be considered for prioritizing the review workflow; streamlining the second-level review, which is typically performed by outside counsel; and quality checking the review in order to prepare the documents for production. Data analytics can also save time, and potentially provide better results, during the preparation of witness files for depositions and trial.

Document, Document, Document

Whatever choices are made for data review, it is important to document them carefully. Case teams should work closely with their e-discovery provider to create supporting documentation that describes the process in case it ever needs to be explained or defended. Such documentation can also be used to replicate the process in future litigation.