
Data Science and Statistics



In the era of “big data,” the combined use of statistics, data analysis, machine learning, and other data science methods has become an increasingly important tool in litigation. At Bates White, our team of professionals draws on significant experience and expertise in data science to solve complex problems.

A dedicated Data Science Committee supports both the Data Science and Statistics Practice and big data and computational projects throughout the firm. The Committee is composed of employees from across practices as well as members of the firm’s Technical Services team. It ensures that best practices are followed for both handling and analyzing data, and its members promote appropriate technology and data science tools for use throughout the firm.

Data management

Our firm has the infrastructure in place to store, manipulate, and analyze large structured and unstructured data sets using cutting-edge technology. For example, our suite of data management and statistical tools (including Hadoop, YARN, SQL, Python, and R) allows us to process and query billions of records and respond quickly to client needs.

Data visualization

The sheer amount of information available in large data sets can make it challenging to identify meaningful themes. Our team is skilled at distilling patterns and insights from large data sets and presenting the key takeaways in a format that is easy to understand.


Statistical sampling

Statistical sampling offers a scientifically reliable and cost-effective approach to learning about an entire population from a more manageable subset, or sample. Our team has significant experience and expertise designing samples and developing data collection protocols to draw statistically valid conclusions, whether in consulting, investigation, arbitration, or litigation settings.
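As a rough illustration of how a sample can support a conclusion about a whole population, the Python sketch below draws a simple random sample and reports a Wilson score confidence interval for a proportion. It is a minimal sketch under stated assumptions, not a description of the firm's methodology: the population is synthetic, the function names `wilson_interval` and `estimate_fraction` are hypothetical, and the interval ignores any finite population correction.

```python
import math
import random


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


def estimate_fraction(population, sample_size, seed=0):
    """Draw a simple random sample without replacement and estimate the
    fraction of True items, with a Wilson interval around the estimate."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sample = rng.sample(population, sample_size)
    hits = sum(sample)  # True counts as 1, False as 0
    low, high = wilson_interval(hits, sample_size)
    return hits / sample_size, low, high


# Synthetic population (an assumption for illustration): 100,000 records,
# of which exactly 20% are flagged True.
population = [i % 5 == 0 for i in range(100_000)]
estimate, low, high = estimate_fraction(population, sample_size=400)
```

Here only 400 of the 100,000 records are reviewed, and the interval `(low, high)` expresses the uncertainty around the sample estimate. In practice the sample design (stratification, sample size, review protocol) would be tailored to the matter at hand.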

Statistical and empirical analysis

Beyond data management and visualization, we also have extensive experience with many types of statistical, machine learning, and econometric analyses, including analysis of variance, correlation and multiple regression, predictive analytics, and causal inference. Our experts are skilled at working with clients to identify the appropriate combination of data and analytic techniques for a given context, implementing sophisticated analyses of large and complex data sets, and interpreting and communicating results to clients and triers of fact in a way that is compelling and easy to understand.

For a full list of our data science–related work, click here


  • Residential mortgage-backed securities analyses—In many different matters, provided statistical analyses to estimate the fraction of mortgage loans in securitized pools that failed to meet originators’ stated guidelines. These matters include ResCap, Lehman, FGIC v. Countrywide, Put-back RMBS, Ambac v. EMC, and Mastr Adjustable Rate.
  • Analysis of music streaming data—In an intellectual property matter involving music royalty payments, processed and analyzed multiple years of daily stream counts for tens of millions of music tracks within a matter of weeks.
  • Damages related to loan servicing—Analyzed 16 TB of data, including 46 billion records dealing with millions of borrowers over a decade, on behalf of a federal regulator to evaluate potential damages incurred by borrowers due to issues with the loan servicer.
  • Competitive effects and market definition—In connection with the Department of Justice’s (DOJ) investigation and subsequent lawsuit challenging American Express’s non-discrimination provisions, United States of America et al. v. American Express Company et al., accessed and analyzed almost 60 TB of data from supermarkets and drug stores; from American Express, Visa, MasterCard, and Discover; and from the Visa Payment Panel Study.
  • Big data assessment of price-fixing allegations—Leveraged Hadoop big data technology on behalf of a large generic pharmaceutical manufacturer to process, store, and analyze billions of public and private sales and prescription records to assess pricing and market share patterns over time. Data sources included transactional sales, Symphony Health, IQVIA National Sales Perspective, and IQVIA National Prescription Audit. Developed a customized tool using Solr to search and synthesize millions of unstructured documents and data files, enabling us to quickly identify key documents and information.
  • Data conversion process and analysis help quantify effects of conduct—Automated a process to create a database from unstructured and non-digitized information on behalf of a hospital in a contractual dispute with a vendor. Developed algorithms to convert thousands of pages of qualitative information and data contained in PDF documents into a machine-readable database that allowed the team to reliably and accurately quantify the effects of alleged conduct.
  • Creation of insights from unstructured information—In a False Claims Act matter, reviewed detailed medical records for hundreds of patients. For each patient, took hundreds of pages of non-digitized medical records and developed an automated process to graph provider care over time.

To learn more about some of our practice-specific data-intensive experience, check out our Finance and Life Sciences data science pages.

