By: Philip Favro and Gregg Parker, Principal Consultants
Much of the focus of electronic discovery in recent years has centered on preserving and obtaining text messages and workplace collaboration content. And yet, there are other key sources of ESI that are often overlooked and may be more significant for claims and defenses than digital age communications. Structured data is one of those sources.
Found in a variety of repositories that are broadly characterized as “databases,” structured data presents unique preservation, collection, and production difficulties. Indeed, requests for discovery of structured data require far more than just a broadly worded Rule 34 request to produce “documents” that might ordinarily suffice for obtaining relevant communications, Microsoft Office materials, PDF files, or other unstructured information.
This article examines key issues surrounding the discovery of relevant structured data in civil litigation. In particular, we provide an overview of structured data that counsel may encounter in discovery. In addition, we explore key issues affecting discovery such as understanding the nature of the database housing the structured data, determining how to obtain ESI from a structured data repository, and seeking a reasonably usable production format for structured data. We conclude by discussing recommended practices—particularly the use of experts—for handling structured data discovery.
What is Structured Data?
Structured data, as the name implies, is data that is stored in a standardized format for ease of access and analysis. Excel spreadsheets and SQL databases are two examples of structured data formats in common use today.
Within structured data sources, schemas (blueprints that describe the structure of a database) are used to organize the data into records and strictly define elements (or fields) by specifying the names, types, and lengths of the fields and their relationship to each other. New data being entered must conform with that defined structure. Applications like Microsoft Excel and Access, and query languages like Structured Query Language (SQL), are then used to access the data for management and analysis. Structured data may come as prepackaged “off the shelf” database systems, such as online applications like Salesforce and HubSpot. Some of these systems allow users to customize the look and feel of their content when displayed on a screen or exported in a formatted report, but behind the scenes the content is maintained in the structured format established in the schema for that system. Rather than relying on commercially available databases, some organizations may choose to create their own structured data repositories, colloquially referred to as “bespoke” databases.
In contrast, unstructured data does not have a standardized format. It is freeform in nature and can take on many different formats. Images, audio and video media, and text-based data such as emails and articles are a few examples.
Organizations use structured data repositories because they provide easier access and management of information, scalability, indexing for faster searching and filtering, and simplified storage of large amounts of data. Of course, there are disadvantages to structured data systems, including strict limitations on what type of data can be stored in a particular source. If, for example, a user attempts to load data that does not adhere to the schema, the system will typically reject the data; in less carefully designed systems, the load may corrupt the repository or result in lost data.
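To make the idea of a schema more concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table, its field names, and the sample values are hypothetical and are not drawn from any system discussed in this article. The sketch defines the names and types of each field and then shows one record that conforms to the schema and one that does not. In this sketch the database refuses the nonconforming record; other systems may behave differently.

```python
import sqlite3

def build_invoice_table():
    """Create an in-memory database with a hypothetical 'invoices' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE invoices (
            invoice_id   INTEGER PRIMARY KEY,
            customer     TEXT    NOT NULL CHECK (length(customer) <= 50),
            amount_cents INTEGER NOT NULL CHECK (typeof(amount_cents) = 'integer'),
            issued_on    TEXT    NOT NULL CHECK (issued_on LIKE '____-__-__')
        )
    """)
    return conn

def try_insert(conn, row):
    """Return True if the row conforms to the schema, False if it is rejected."""
    try:
        conn.execute(
            "INSERT INTO invoices (invoice_id, customer, amount_cents, issued_on) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
        return True
    except sqlite3.IntegrityError:
        # The schema's constraints refused the record; nothing was stored.
        return False

conn = build_invoice_table()

# A record that matches the defined field types is accepted.
accepted = try_insert(conn, (1, "Acme Corp", 125000, "2022-03-15"))

# A record with free text in the numeric amount field is rejected.
rejected = try_insert(conn, (2, "Acme Corp", "twelve hundred", "2022-03-16"))

row_count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
print(accepted, rejected, row_count)
```

For discovery purposes, the point is that the schema itself is informative: it tells counsel which fields exist, what kind of information each field can hold, and therefore what can realistically be queried or exported.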
What are the Key Discovery Issues with Structured Data?
There are any number of issues that could arise in discovery with structured data. Some of the most important issues include the following.
Understanding the Nature of the Database
Understanding the nature of the repository housing relevant structured data is a key initial issue. It is unlikely a client will be able to formulate proper search queries or have the information produced in a reasonably usable format if counsel cannot grasp the type of database at issue and related questions regarding how data is generated, maintained, and accessed. Two cases, In re Blair (Tex. App. 2022) and United States v. Holmes, 2021 WL 3395146 (N.D. Cal. Aug. 4, 2021), highlight this issue.
In Blair, requesting parties moved to compel accounting records in native format from responding parties’ relational database. Relational databases link tables together to create logical relationships between related records, and thereby enhance the ability of users to search, filter, and analyze large amounts of data. For example, the relational database in Blair created relationships among associated accounting records, linking together invoices and payment information for ease of access. Responding parties, however, produced the records in Excel, which, while disclosing raw transactional data, disabled the links to the supporting accounting records. Requesting parties argued that the Excel production prevented them from readily accessing the underlying related information at issue. The court agreed and ordered the accounting records produced in native format (i.e., a format that “retains the file structure associated with and defined by the original creating application” such as Microsoft Access). Relying on expert testimony offered by requesting parties, the court explained that responding parties’ Excel production was improper as it “did not link the data in the same way” and thereby made reviewing the information far more burdensome than had it been produced in native format.
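The difference between linked records and a flat export can be illustrated with a short sketch. This is not a reconstruction of the database in Blair; the tables, field names, and amounts below are hypothetical, and Python's built-in sqlite3 module stands in for whatever system a party actually uses. The sketch links invoices to their payments through a shared key, then produces a spreadsheet-style export of the invoices alone.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        invoice_id   INTEGER PRIMARY KEY,
        customer     TEXT    NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    CREATE TABLE payments (
        payment_id   INTEGER PRIMARY KEY,
        invoice_id   INTEGER NOT NULL REFERENCES invoices(invoice_id),
        paid_on      TEXT    NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO invoices VALUES (1, 'Acme Corp', 100000), (2, 'Beta LLC', 50000);
    INSERT INTO payments VALUES (10, 1, '2022-01-15', 40000),
                                (11, 1, '2022-02-15', 60000),
                                (12, 2, '2022-02-20', 50000);
""")

def payments_for(conn, invoice_id):
    """Follow the link from an invoice to its supporting payment records."""
    return conn.execute(
        "SELECT p.paid_on, p.amount_cents "
        "FROM invoices i JOIN payments p ON p.invoice_id = i.invoice_id "
        "WHERE i.invoice_id = ? ORDER BY p.paid_on",
        (invoice_id,),
    ).fetchall()

def flat_export(conn):
    """Export only the invoice rows as CSV text, without keys or linked tables."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["customer", "amount_cents"])
    writer.writerows(
        conn.execute("SELECT customer, amount_cents FROM invoices ORDER BY invoice_id")
    )
    return buf.getvalue()

linked = payments_for(conn, 1)   # payment history is one query away
flat = flat_export(conn)         # payment history is simply absent
print(linked)
print(flat)
```

In the linked database, a reviewer can move from an invoice to its payments with a single query. In the flat export, the invoice rows survive but the path to the supporting records does not, which is the kind of burden the requesting parties' expert described.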
Holmes involved a bespoke database run by defendant’s now defunct company and from which the government sought a fully accessible copy of information. While the database required both login credentials and an encryption key to access the data, defendant neither disclosed the encryption key nor informed the government that it needed such a key. These circumstances, together with the subsequent retiring of the database and deletion of the encryption key, made it impossible for the government to access the database information. And while the court concluded that defendant was responsible for the loss of this information, the government ultimately could not use the structured data in its prosecution of defendant.
Both Blair and Holmes are instructive on the need to understand the nature of the database at issue. Because requesting parties in Blair understood the structured data was housed in a relational database, they insisted on a native format production that would ensure easy access to the underlying evidence. Had requesting parties glossed over this issue, they would have wasted countless hours poring over the Excel production and manually matching the produced records to the supporting evidence.
In contrast, the structured data the government obtained in Holmes was ultimately worthless since it was inaccessible. Had the government been aware of the indispensable role of the encryption key, it could have requested the key from defendant and accessed the information prior to the database being decommissioned.
Obtaining ESI from a Structured Data Repository
Another key discovery issue is obtaining usable ESI from a structured data repository. Given the specialized and dynamic nature of structured data and the myriad of different types of databases with varying schemas in which data could be stored, requesting parties should work closely with responding parties to formulate measured queries that target responsive information and are not unduly burdensome. Courts will examine the burdens of structured data discovery and may reject or modify demands that are not proportional to the needs of a particular case.
For example, in Netherlands Ins. Co. v. HP, Inc., 2022 WL 18027562 (D. Mass. Dec. 30, 2022), the court denied plaintiffs’ motion to compel defendant to produce certain litigation records from a structured data repository. The court found that plaintiffs’ request was disproportionate to the needs of the case given that the relevance of the requested records was “unclear at best” and defendant would be substantially burdened if required to identify the records at issue. In particular, the court rejected plaintiffs’ argument that defendant could “easily produce the requested information.” While those records were apparently maintained in a centralized database, the court observed that the effort to retrieve the responsive materials would be “much more than pressing a few buttons to generate a database printout.”
There is nothing simple about producing structured data. As Netherlands illustrates, broad requests that neither target materially relevant information nor acknowledge the burdens responding parties could face in collecting and producing such information may very well meet with failure. Instead, as in Blair, requesting parties should seek clearly relevant information, narrowly pursue what is needed to support their claims or defenses, and substantiate discovery requests accordingly.
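What a measured query might look like in practice can be sketched briefly. The example below is an illustration only, using Python's built-in sqlite3 module; the table, field names, account identifiers, and date range are all hypothetical. It limits the request to one account, a defined date range, and the specific fields at issue, rather than asking for every record in the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        txn_id        INTEGER PRIMARY KEY,
        account       TEXT    NOT NULL,
        txn_date      TEXT    NOT NULL,  -- ISO 8601 dates sort correctly as text
        amount_cents  INTEGER NOT NULL,
        internal_note TEXT
    );
    INSERT INTO transactions VALUES
        (1, 'ACME-001', '2021-11-30', 25000, 'outside date range'),
        (2, 'ACME-001', '2022-01-10', 40000, 'in scope'),
        (3, 'ACME-001', '2022-03-05', 15000, 'in scope'),
        (4, 'BETA-002', '2022-02-01', 99000, 'different account');
""")

def targeted_query(conn, account, start_date, end_date):
    """Return only the agreed fields for one account within a date range."""
    sql = """
        SELECT txn_id, txn_date, amount_cents
        FROM transactions
        WHERE account = ?
          AND txn_date BETWEEN ? AND ?
        ORDER BY txn_date
    """
    return conn.execute(sql, (account, start_date, end_date)).fetchall()

responsive = targeted_query(conn, "ACME-001", "2022-01-01", "2022-12-31")
total_rows = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(len(responsive), "of", total_rows, "records are responsive")
```

Real repositories will involve far more tables and far more complicated criteria, which is one reason the parties' experts are usually best placed to draft and test the actual queries.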
Seeking a Reasonably Usable Production Format
Getting a reasonably usable format for a structured data production is an essential aspect of database discovery. If the production is in a format that limits a requesting party’s ability to comprehend the information or access related details or records, courts, taking into account burdens and other factors, should consider ordering the production in a more reasonably usable format. In some cases, the structured data may be stored in a standard, commercially available database program that permits exporting the raw data from one system and importing that data easily into another system. In other cases, the data may not be so easily transferred from one system to another or the cost of doing so may be disproportionate to the needs of the case. In certain situations, the only reasonable option may be to agree on queries that can be run or reports that can be generated and produced. Blair is instructive on this point: there, the court relied on expert testimony in ordering a native format production to ensure the requesting parties had ready access to underlying data relevant to their claims.
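As a rough illustration of the easier scenario, where data can be exported from one system and imported into another with its structure intact, consider the following sketch. It assumes both systems are SQLite databases accessed through Python's built-in sqlite3 module, and the tables and values are hypothetical. Transfers between different database products are often considerably more involved.

```python
import sqlite3

# Hypothetical source system with two linked tables.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Acme Corp');
    INSERT INTO orders VALUES (100, 1, 25000), (101, 1, 7500);
""")

def copy_database(src):
    """Export the schema and data as SQL text, then load it into a new database."""
    dump_sql = "\n".join(src.iterdump())   # table definitions plus row data
    dest = sqlite3.connect(":memory:")
    dest.executescript(dump_sql)
    return dest

restored = copy_database(source)

# The relationship between customers and orders still works in the copy.
linked = restored.execute(
    "SELECT c.name, o.order_id, o.total_cents "
    "FROM customers c JOIN orders o ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id"
).fetchall()
print(linked)
```

Because the export carries the table definitions along with the rows, the receiving party can run its own queries against the copy instead of depending on a static report.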
Recommended Practices for Handling Structured Data Discovery
These discovery issues underscore the need to adopt certain practices for handling structured data discovery. Three practices that can particularly aid parties on the issues include the following.
1. Use Experts. Parties should consider engaging experts to help with structured data discovery. Structured data experts can help fashion reasonable search queries, determine an appropriate export format for production, and educate the court through written or oral expert testimony. Indeed, in Blair it was requesting parties’ structured data expert witness who ultimately convinced the court that the accounting records should be produced in native format.
2. Meet and Confer. Parties should meet and confer on issues regarding the discovery of structured data and involve their respective experts in their discussions. Topics on which the parties may confer include exploring the contents and structure of the databases at issue, discussing the fields and information that are pertinent to the requesting party’s inquiry, reaching agreement on a set of queries to be made for discoverable information, and determining an appropriate production format. The parties may also consider whether the responding party should have an obligation to explain codes, abbreviations, or other information necessary to ensure that the produced information is reasonably usable.
3. Use ESI Protocols. After their meet and confer, the parties should consider memorializing in an ESI protocol the provisions on which they have agreed for handling structured data. Using an agreed-upon protocol could very well ameliorate future disputes over structured data. Even with an ESI protocol addressing structured data, parties should nonetheless be willing to address issues that could arise during discovery since even the most thorough protocols may not address every conceivable issue with structured data discovery.