This morning, The Hill reported “AI models frequently ‘hallucinate’ on legal queries, study finds.” It turns out that The Register and Bloomberg joined in. Did you know that consumer-facing generative AI tools are incredibly bad at legal work? Or have you also spent the year locked in a bunker?
Seriously, between these stories and Chief Justice Roberts turning his annual report on the federal judiciary into a deep dive on 2013-era artificial intelligence, it’s as if no one bothered to keep up on artificial intelligence at all last year.
ChatGPT and other large language models hallucinate caselaw? No kidding! Lawyers got sanctioned over it. One got his license suspended. Trump’s former lawyer got caught up in it. Is there anyone left who DIDN’T know that these tools produce shoddy legal work?
In defense of these headlines, they stem from a new study out of Stanford that digs into the prevalence of gen AI legal errors. It’s a useful study, though not because it delivers a bold new revelation, but because it puts quantitative data behind what we all already knew.
That’s valuable information for refining these models. But GPT-4 is already here, and Llama 3 is supposedly imminent.
And while consumer-facing products like ChatGPT rack up errors, we’re already almost past the hallucination problem as trusted datasets and legal-minded guardrails take over the space. Legal applications now focus on accuracy: figuring out how to separate holdings from dicta within cases the AI knows to be real.
Perhaps I should cut the mainstream media a little slack on this point. Lawyers — one would hope — will confine their work to tools specifically tailored to legal work. The message needs to reach the pro se community and those trying to navigate non-litigation legal questions without counsel. That’s who needs to be alerted to the danger of relying on these bots for legal help.
The Stanford study highlights this:
Second, case law from lower courts, like district courts, is subject to more frequent hallucinations than case law from higher courts like the Supreme Court. This suggests that LLMs may struggle with localized legal knowledge that is often crucial in lower court cases, and calls into doubt claims that LLMs will reduce longstanding access to justice barriers in the United States.
If the theory was that litigants could use these bots on their own to solve their access to justice problems, then yes, that theory is in doubt. But that’s not the fault of the LLM so much as the fact that a tool cheap enough for someone who can’t hire a lawyer is going to be set up to fail. LLMs married to the right, professionally vetted tools, though, can make pro bono and low bono work far less burdensome for attorneys, offering a different path toward bridging the justice gap.
So maybe it’s all right that the public is getting a few breathless headlines about the risks of ChatGPT lawyering, for the sake of the people who really aren’t clued in to how the legal industry has approached AI.
But, seriously, what’s Chief Justice Roberts’s excuse?
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive [Stanford Human-Centered Artificial Intelligence]
Joe Patrice is a senior editor at Above the Law and co-host of Thinking Like A Lawyer. Feel free to email any tips, questions, or comments. Follow him on Twitter if you’re interested in law, politics, and a healthy dose of college sports news. Joe also serves as a Managing Director at RPN Executive Search.