By: Wayland Radin, Director of Operations, Analytics | Innovative Discovery
Leverage Machine Learning to Handle Expanding Volume
Exponentially expanding data volumes present one of the most significant challenges to companies today, especially when those companies are seeking to fulfill their discovery obligations in a litigation or investigatory matter. Technology Assisted Review, or TAR, promises to help companies (and their counsel) assess their data and meet their obligations in the face of ballooning data volumes by leveraging machine learning algorithms to identify relevant material faster.
Reluctance to Adopt
TAR’s adoption has not been as fast or as widespread as its utility and accuracy would imply, a lag largely attributable to lawyers’ (particularly in-house counsel’s) aversion to risk. While the underlying predictive technology is tried and true elsewhere (think Netflix recommendations, Spotify predictions, fraud detection, etc.), its use in the legal space is still comparatively new, which gives some litigators pause. Furthermore, while courts have opined on, and even decided, some aspects of TAR (more on that below), until now they have been largely silent on one of the more frequently used methods for conducting a review informed by TAR and on the corresponding method for validating the performance of this modern TAR process.
Courts have, as mentioned, often held (and even more frequently remarked) that each party to a matter is in the best position to understand its own data and to determine, accordingly, the most appropriate method of meeting its discovery obligations. Notably, this extends to TAR as it has been traditionally understood, which has included iterative but distinct “training rounds,” followed by “quality control rounds,” and then validation by comparing the algorithm’s predictions against a randomly selected subset of data also reviewed by subject matter experts (the “Control Set”).
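The Control Set validation described above boils down to comparing the model's predictions with the subject matter experts' decisions on the same randomly selected documents. A minimal sketch of that comparison follows; the document IDs, coding values, and metric names are illustrative, not any particular TAR platform's API:

```python
# Sketch: validating TAR predictions against an SME-coded Control Set.
# Both mappings go from document ID to True (relevant) / False (not relevant).

def control_set_metrics(sme_coding, predictions):
    """Compare model predictions to SME decisions on the Control Set."""
    true_pos = sum(1 for doc, rel in sme_coding.items() if rel and predictions[doc])
    actual_pos = sum(1 for rel in sme_coding.values() if rel)
    predicted_pos = sum(1 for doc in sme_coding if predictions[doc])
    recall = true_pos / actual_pos if actual_pos else 0.0        # share of relevant docs the model found
    precision = true_pos / predicted_pos if predicted_pos else 0.0  # share of model hits that were relevant
    return recall, precision

# Hypothetical Control Set of four documents coded by an SME.
sme = {"d1": True, "d2": True, "d3": False, "d4": False}
model = {"d1": True, "d2": False, "d3": False, "d4": True}
recall, precision = control_set_metrics(sme, model)
# recall = 0.5 (the model found 1 of 2 relevant docs); precision = 0.5
```

In a TAR 1.0 workflow, training continues until metrics like these reach the agreed targets, which is why the Control Set must never itself be used for training.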
TAR 1.0 vs TAR 2.0
Sometimes referred to as TAR 1.0, the process described above draws a clear line between the “training” phase of review and the subsequent “review” phase, and it requires a significant time investment by subject matter experts to conduct the training and validation. Improvements in the machine learning models underlying modern TAR tools have enabled users to do away with that distinction and instead train the algorithm in a near-continuous feedback loop: human-coded documents train the model, the model pushes the documents most likely to be responsive to the front of the ongoing human review, and the model then learns from that human coding as well. Perhaps predictably termed Active Learning or Continuous Active Learning, this process is also referred to as TAR 2.0. It allows responding parties to essentially have their cake and eat it too, achieving the efficiency gains associated with a TAR workflow (a reduction in the number of documents to review) while preserving the flexibility to conduct review in whatever fashion best fits the data and their tolerance for risk.
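The feedback loop above can be sketched in a few lines. This is a deliberately naive stand-in, not a real TAR engine: the term-overlap scorer substitutes for a genuine classifier, and the document contents, batch size, and function names are all hypothetical.

```python
# Sketch of a Continuous Active Learning (TAR 2.0) loop under toy assumptions.

def score(doc_terms, relevant_terms):
    # Toy relevance score: overlap with terms from documents already coded relevant.
    return len(doc_terms & relevant_terms)

def cal_review(documents, code_document, batch_size=2):
    """documents: {doc_id: set of terms}; code_document: the human reviewer's decision."""
    uncoded = dict(documents)
    relevant_terms, coded = set(), {}
    while uncoded:
        # Push the most-likely-relevant documents to the front of the review queue.
        queue = sorted(uncoded, key=lambda d: score(uncoded[d], relevant_terms), reverse=True)
        for doc in queue[:batch_size]:
            decision = code_document(doc)       # human codes the document
            coded[doc] = decision
            if decision:
                relevant_terms |= uncoded[doc]  # coding is fed straight back into the model
            del uncoded[doc]
    return coded

# Hypothetical three-document collection; "contract" marks the relevant ones.
docs = {
    "a": {"contract", "breach"},
    "b": {"lunch", "menu"},
    "c": {"contract", "damages"},
}
coded = cal_review(docs, lambda d: "contract" in docs[d], batch_size=1)
# → {"a": True, "c": True, "b": False}: once "a" is coded relevant,
#   the loop prioritizes "c" ahead of the non-relevant "b".
```

The point of the sketch is the loop structure, not the scorer: every human decision immediately reshapes the ordering of the remaining review, which is what removes the TAR 1.0 boundary between training and review.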
TAR 2.0 Flexibility and Efficiency
It is the flexibility afforded by TAR 2.0 that has led to its increased adoption: lawyers can proceed as though they will review the entire dataset until it becomes clear that essentially no relevant documents remain, satisfying their instinct to avoid risk while still maximizing efficiency. Reaching that point does not mean the review must stop. But should a party decide to stop review based on the accuracy of the algorithm’s predictions as compared with the decisions of human reviewers, that decision can satisfy the reasonableness standard, provided the appropriate validation checks are performed to ensure defensibility.
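One common validation check supporting such a stopping decision is an elusion test: draw a random sample from the documents the review would leave unexamined, have humans code that sample, and estimate how many relevant documents would "elude" the review. A minimal sketch, with an invented document population, sample size, and relevance rule:

```python
import random

# Sketch of an elusion test on the unreviewed "discard pile".
# Population, sample size, and the coding rule below are all hypothetical.

def elusion_rate(discard_pile, code_document, sample_size, seed=0):
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    sample = rng.sample(discard_pile, min(sample_size, len(discard_pile)))
    relevant_found = sum(1 for doc in sample if code_document(doc))
    return relevant_found / len(sample)

# Hypothetical: 1,000 unreviewed documents, of which only IDs 0-9 are relevant.
discard = list(range(1000))
rate = elusion_rate(discard, lambda d: d < 10, sample_size=200)
# A low estimated elusion rate is evidence that stopping review is reasonable.
```

In practice the sample size, confidence level, and acceptable elusion rate are negotiated or documented up front; the sketch only shows the mechanics of the estimate.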
Clear and Important Differences
A solid understanding of Machine Learning, Continuous Active Learning, and the concept of human interaction to essentially “train the machine” is key to leveraging modern technology for document review. To further illustrate the differences between TAR 1.0 and TAR 2.0, here is a comparison chart that is easy to scan and digest.
TAR 1.0 and TAR 2.0 Comparison Chart

| TAR 1.0 – Training Up Front | TAR 2.0 – Continuous Learning |
| --- | --- |
| Requires distinct training and QC rounds prior to full-scale review. | Generally, no separation between training and review. |
| SME reviews and codes a random sample to determine richness, which can also be used as the initial training set. | SME/review team reviews and codes a random sample to determine richness, which can also be used as the initial training set. |
| Training and testing continue until the target Recall is achieved. | Review begins from the categorized results, prioritizing more-likely-to-be-relevant documents. |
| SME reviews and codes the Control Set (which is never included in the training set). Depending on the Control Set results, the SME continues training to improve the performance of the algorithm. | Human-coded documents are continuously fed to the algorithm to continue its training. |
| Once the target Recall is achieved, the review team reviews documents above the cutoff point used to identify presumptively relevant documents. Documents below the cutoff point are not reviewed (though they may be subject to Elusion Testing). | Coding results and validation results are assessed to decide whether to cut off review at a certain point. |
| One bite at the apple. | Preserves flexibility. |
| SMEs required. | Easily accommodates rolling data loads. |
For further information about Technology Assisted Review and Managed Document Review with Innovative Discovery, please click here.