
Data Science and Statistics



In the era of “big data,” the combined use of statistics, data analysis, machine learning, and other data science methods has become an increasingly important tool in litigation. At Bates White, our team of professionals draws on significant experience and expertise in data science to solve complex problems.

Data management

Our firm has the infrastructure in place to store, manipulate, and analyze large structured and unstructured data sets using cutting-edge technology. For example, our suite of data management and statistical tools, including Hadoop, YARN, SQL, Python, and R, allows us to process and query billions of records to quickly respond to client needs.

Data visualization

The sheer amount of information available in large data sets can make it challenging to identify meaningful themes. Our team is skilled at distilling patterns and insights from large data sets and presenting the key takeaways in a format that is easy to understand.


Statistical sampling

Statistical sampling offers a scientifically reliable and cost-effective approach to learning about an entire population from a more manageable subset, or sample. Our team has significant experience and expertise designing samples and developing data collection protocols to draw statistically valid conclusions, whether in consulting, investigation, arbitration, or litigation settings.
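As a rough illustration of the idea, the sketch below draws a simple random sample from a population and estimates the share of records with some property, along with an approximate 95% confidence interval. The population, the 12% rate, the sample size, and the function name `sample_proportion_ci` are all hypothetical and chosen only for illustration. It uses a normal approximation with a finite population correction, which is one common textbook approach and not necessarily the method used in any particular matter.

```python
import math
import random


def sample_proportion_ci(population, sample_size, z=1.96, seed=0):
    """Estimate a population proportion from a simple random sample.

    `population` is a list of 0/1 flags (hypothetical data).
    Returns (estimate, lower bound, upper bound) using a normal
    approximation with a finite population correction.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(population, sample_size)  # sampling without replacement
    p_hat = sum(sample) / sample_size

    # The finite population correction shrinks the standard error
    # when the sample is a sizable share of the population.
    n_pop = len(population)
    fpc = math.sqrt((n_pop - sample_size) / (n_pop - 1))
    std_err = math.sqrt(p_hat * (1 - p_hat) / sample_size) * fpc

    margin = z * std_err
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical population: 100,000 records, 12% of which are flagged.
population = [1] * 12_000 + [0] * 88_000
estimate, low, high = sample_proportion_ci(population, sample_size=1_000)
print(f"Estimated share: {estimate:.3f} (approx. 95% CI {low:.3f} to {high:.3f})")
```

Reviewing 1,000 records instead of 100,000 gives an estimate with a margin of error of roughly two percentage points in this hypothetical setup. Real sample designs often add stratification or other refinements that this sketch leaves out.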

Statistical and empirical analysis

Beyond data management and visualization, we also have extensive experience with many types of statistical, machine learning, and econometric analyses, including analysis of variance, correlation and multiple regression, predictive analytics, and causal inference. Our experts are skilled at working with clients to identify the appropriate combination of data and analytic techniques for a given context, implementing sophisticated analyses of large and complex data sets, and interpreting and communicating results to clients and triers of fact in a way that is impactful and easy to understand.


  • Residential mortgage-backed securities analyses—Provide statistical analyses, across many different matters, to estimate the fraction of mortgage loans in securitized pools that fail to meet the originator’s stated guidelines. These matters include ResCap, Lehman, FGIC v. Countrywide, Put-back RMBS, Ambac v. EMC, and Mastr Adjustable Rate.
  • Analysis of music streaming data—In an intellectual property matter involving music royalty payments, processed and analyzed multiple years of daily stream counts for tens of millions of music tracks within a matter of weeks.
  • Damages related to loan servicing—Analyzed 16 TB of data, including 46 billion records dealing with millions of borrowers over a decade, on behalf of a federal regulator to evaluate potential damages incurred by borrowers due to issues with the loan servicer.
  • Competitive effects and market definition—In connection with the Department of Justice’s (DOJ) investigation and subsequent lawsuit challenging American Express’s nondiscrimination provisions, United States of America et al. v. American Express Company et al., accessed and analyzed almost 60 TB of data from supermarkets and drug stores; from American Express, Visa, MasterCard, and Discover; and from the Visa Payment Panel Study.
  • Assessment of price-fixing allegations—Processed, stored, and analyzed billions of transactional sales data records on behalf of a large generic pharmaceutical manufacturer. Developed a customized search tool to synthesize hundreds of thousands of documents.
  • Data conversion process and analysis—Automated a process to create a database from unstructured information on behalf of a hospital in a contractual dispute with a vendor. Wrote algorithms to convert thousands of pages of qualitative information and data contained in PDF documents into a machine-readable database that allowed the team to reliably and accurately quantify the effects of alleged conduct.
  • Matter on behalf of a large provider of in-home healthcare—In a False Claims Act matter, reviewed medical records for hundreds of patients. For each patient, took hundreds of pages of medical records and developed an innovative process to output graphs showing provider care over time. These graphs revealed patterns that proved to be inconsistent with the allegations.

