Data extraction and document search

Using innovative techniques to generate data sets that are key to evaluating relevant issues

Data extraction is the process of collecting or retrieving data from different sources, which may be semi-structured, unstructured, or disorganized, and then transforming, combining, or moving those data into a location and format suited to the analyses being conducted. Bates White excels in this process and takes extra care when extracting data to maintain the quality needed for use in litigation, government investigations, and consulting matters. Our data extraction tools allow us to systematically generate and analyze data sets that were previously impractical or overly burdensome to build.

At Bates White, we are experts in 

  • Using tools to extract data from hundreds to millions of documents (Microsoft Word documents, PDFs, images, etc.), resulting in a faster and more cost-effective way to generate data to conduct analyses
  • Collecting and extracting data from the internet through application programming interfaces (APIs), which generates important databases for our analyses
  • Performing keyword, phrase, or more advanced searches over the entire set of documents. Our teams can identify documents relating to topics of interest within seconds. Our search capabilities allow us to identify relevant documents regardless of file type or whether there are hundreds or hundreds of thousands of documents to sort through.
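As a rough illustration of the keyword and phrase searches described above, the following is a minimal sketch of an inverted index over a small in-memory document set. It is not Bates White's tooling: the function names (`build_index`, `search_keywords`, `search_phrase`) and the sample documents are hypothetical, and it assumes the text has already been extracted from the underlying Word, PDF, or image files.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for token in tokenize(text):
            index[token].add(name)
    return index

def search_keywords(index, query):
    """Return the names of documents containing every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits

def search_phrase(documents, index, phrase):
    """Return documents containing the query words as a contiguous phrase."""
    words = tokenize(phrase)
    n = len(words)
    matches = set()
    # Use the index to narrow the candidates, then verify word order.
    for name in search_keywords(index, phrase):
        tokens = tokenize(documents[name])
        if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
            matches.add(name)
    return matches

# Hypothetical sample documents (already converted to plain text).
documents = {
    "memo.txt": "The vendor raised the contract price in March.",
    "email.txt": "Price of the contract was discussed with the vendor.",
    "notes.txt": "Meeting notes about staffing.",
}
index = build_index(documents)
```

Because the index is built once, each keyword lookup only intersects small sets rather than rescanning every file. A production system would more likely rely on a dedicated search engine than on hand-written code like this.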

Case highlights 

  • Data extraction to assess at-issue conduct: Bates White used APIs to extract historical price information for products sold on Amazon and eBay in a case involving allegations that Amazon’s most-favored-nation provision prevented third-party sellers on Amazon from offering their products at lower prices on other platforms. Using an advanced methodology that leveraged cloud-based tools and Python, the team compiled the data necessary to evaluate the relevant conduct. These data enabled Bates White’s expert to analyze the extent to which eBay prices were lower than Amazon prices and to respond, using more systematic empirical evidence, to the opposing expert’s claim, which was supported by only a handful of examples.
  • Data conversion process and analysis help quantify effects of conduct: Bates White automated a process to create a database from unstructured and non-digitized information on behalf of a hospital in a contractual dispute with a vendor. We developed algorithms to convert thousands of pages of qualitative information and data contained in PDF documents into a machine-readable database that allowed the team to quantify the effects of the alleged conduct reliably and accurately.
  • Creation of insights from unstructured information: In a False Claims Act matter, Bates White reviewed detailed medical records for hundreds of patients. For each patient, the team took hundreds of pages of non-digitized medical records and developed an automated process to graph provider care over time.
  • Big data assessment of price-fixing allegations: Bates White used Hadoop big data technology on behalf of a large generic pharmaceutical manufacturer to process, store, and analyze billions of public and private sales and prescription records to assess pricing and market share patterns over time. Data sources included transactional sales, Symphony Health, IQVIA National Sales Perspective, and IQVIA National Prescription Audit. The team also developed a customized tool using Solr to search and synthesize hundreds of thousands of unstructured documents and data files, enabling us to quickly identify key documents and information.
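To make the idea of converting unstructured document text into a machine-readable database more concrete, here is a minimal sketch under stated assumptions. It is not the process used in any of the matters above. It assumes the page text has already been obtained from the PDFs (for example, through OCR), and the invoice line format, `RECORD_PATTERN`, and the function names are hypothetical.

```python
import csv
import io
import re

# Hypothetical pattern for lines such as
# "Invoice 1023 dated 2019-03-14 for $1,250.00".
RECORD_PATTERN = re.compile(
    r"Invoice\s+(?P<invoice_id>\d+)\s+dated\s+(?P<date>\d{4}-\d{2}-\d{2})"
    r"\s+for\s+\$(?P<amount>[\d,]+\.\d{2})"
)

def extract_records(text):
    """Pull every invoice mention out of free text as a typed record."""
    records = []
    for match in RECORD_PATTERN.finditer(text):
        records.append({
            "invoice_id": int(match["invoice_id"]),
            "date": match["date"],
            # Strip thousands separators before converting to a number.
            "amount": float(match["amount"].replace(",", "")),
        })
    return records

def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["invoice_id", "date", "amount"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Hypothetical text as it might look after extraction from scanned pages.
sample_text = """
Page 1 of 2. Invoice 1023 dated 2019-03-14 for $1,250.00 was sent to the vendor.
Handwritten note: follow up. Invoice 1024 dated 2019-04-02 for $980.50 remains unpaid.
"""
records = extract_records(sample_text)
```

Real documents are rarely this regular, so a pattern like this would typically be one of many rules, combined with validation checks on the extracted values.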

Tools and technologies

Learn more about the tools we use to collect and extract data and search documents here.
