The Law & Corpus Linguistics Revolution: Why It Matters

In corpus linguistics by James Heilpern

Law and corpus linguistics is an emerging legal discipline that seeks to harness the power of big data to produce empirical evidence about the meaning of legal instruments. Using large, electronically searchable databases of English texts (databases that are publicly available and designed by linguists) alongside big data analytical tools, a judge or attorney can produce empirical evidence about the meaning of language to help answer fundamental questions of legal interpretation. Georgetown University professor Lawrence Solum has predicted that corpus linguistics will “revolutionize” legal interpretation. Using corpus linguistics, savvy practitioners can produce compelling legal arguments that will turn heads in judicial chambers. Here are the top five reasons you should, in the words of Sixth Circuit Judge Amul Thapar, “add corpus linguistics to [your] judicial toolkit.”

  1. Judges want it: The Bench already understands the value of corpus linguistics. In just the last two years, judges and justices on the U.S. Supreme Court, U.S. Courts of Appeals for the Third, Fourth, Sixth, Eleventh, and D.C. Circuits, federal district courts around the country, the U.S. Court of Federal Claims, and the Supreme Courts of Idaho, Michigan, Montana, and Utah have authored landmark opinions using these new tools. State Supreme Courts are beginning to put pressure on state solicitors general to get up to speed on corpus linguistics, and the Sixth Circuit has even begun ordering parties to provide corpus-based arguments in supplemental briefing.
  2. It often leads to speedy settlements: The use of corpus linguistic expert witnesses is increasingly commonplace in certain areas such as trademark litigation and contract invalidation cases. The evidence presented is often irrefutable and leads to quick, favorable settlements.
  3. It can help you frame your case: Leading legal scholars in the field (including myself) have noted that the use of corpus linguistics is forcing judges to confront glaring holes in modern legal theory, such as the lack of uniform definitions for commonplace terms like “ordinary meaning” and “ambiguity.” So long as those deficiencies exist, corpus linguistics can provide flexibility to practitioners in framing their normative case.
  4. Wide application: Corpus linguistics is helpful in answering a wide variety of legal questions, including:
    • What is the “ordinary meaning” of a statutory term?
    • What is the “original public meaning” of a constitutional provision?
    • What is the “plain meaning” of a contract?
    • Is a particular statutory term a “technical or legal term of art”?
    • Has a trademark become “genericized”?
    • Was a trademark “generic ab initio”?
    • How would a “Person Having Ordinary Skill in the Art” construe a patent claim?
    • In what circumstances should the semantic canons of construction apply?
  5. It’s only going to become more commonplace: All signs indicate that the use of corpus linguistics is only going to accelerate. Judges around the country are clamoring to be trained. Law professors are incorporating corpus linguistics into their curricula at schools like Harvard, Yale, Chicago, BYU, Louisville, and Georgia State. It is an increasingly hot topic at legal conferences and in legal scholarship. This isn’t a fad. It’s here to stay.


  1. When you say “Judges want it: The Bench already understands the value of corpus linguistics,” it really feels like you are significantly overstating the judiciary’s acceptance of corpus linguistics (as a distinct and broader version of traditional textualism). The DC Circuit opinion you cite, for example, seems to reject the conclusion that the district court tried to draw from the corpus and finds no useful conclusion. The 11th Circuit dissent just says that sometimes you need to look beyond dictionaries. It seems that there is a small number of academics aggressively pushing this theory, and a small number of judges accepting it, but it’s pretty far from gaining widespread adoption among the judiciary at this stage.

    1. Hi Sheik:

      Thanks for the comment. I think it’s fair to say that corpus linguistics has not yet become commonplace in judicial opinions the way that, say, citations to dictionaries have. But that’s not what I was saying. The trend is positive. There is certainly a vanguard of judges out evangelizing corpus linguistics, as you suggest: Justice Lee on the Utah Supreme Court, Chief Justice Bevan on the Idaho Supreme Court, Justice DeWine on the Ohio Supreme Court, and Judges Thapar, Readler, and Bush on the Sixth Circuit, to name just a few. But my point is that the judiciary is far more familiar with and interested in corpus linguistics than the bar is. How do I know this? As president of the Judicial Education Institute, a non-profit that provides judicial education courses to courts nationwide, I spent the eighteen months prior to the pandemic touring the country putting on courses about corpus linguistics. I found judges of all ideological stripes, not just textualists as you suggest, hungry to get up to speed on this new methodology. I know from experience that there are more judges who are fans of corpus linguistics than have used it in opinions thus far.

      There’s an active debate among the judiciary right now around the propriety of a judge using corpus linguistics sua sponte—but even judges like Judge Stranch on the Sixth Circuit who oppose judges using this on their own have indicated that they would welcome *parties* submitting corpus-based arguments. Many judges, right now, are trying to figure out how to signal to the Bar that they want these arguments presented.

      You can also see this in some of the opinions that have cited corpus-based scholarship or amicus briefs to support their points, like Justice Ginsburg did during oral arguments in FCC v. AT&T or Justice Alito did in his dissenting opinion in Bostock last term.

      One final point: it’s important to recognize that corpus linguistics does not guarantee conservative outcomes or preordain any outcome. It’s atheoretical. A careful look at the DC Circuit case highlights this. Far from disregarding corpus linguistics as irrelevant, the judges did their own search of a separate database and, because of that, came to a different conclusion with respect to the Chevron question. It makes me want to go back and look at the briefing to see if the ProQuest search was suggested by counsel. Regardless, it highlights my third point: corpus linguistics is forcing our legal theory to evolve as the qualitative and quantitative data made possible by corpus linguistics highlights our current theoretical deficiencies. I can attest that this is something judges are very interested in, and it is causing warm discussions and disagreements within the judiciary.
