Claira Stories

Lessons from Zhang v. Chen: How Canadian Lawyers Can Actually Avoid AI Hallucinations in Review


In Zhang v. Chen, 2024 BCSC 285, a British Columbia family law file became the first Canadian case to publicly sanction a lawyer for citing authorities that did not exist. Counsel, Ms. Ke, had asked ChatGPT to find supporting decisions for a Notice of Application, dropped the results into the filing, and only learned later, after opposing counsel had spent two hearing days trying to track the cases down, that the tool had made them up. Justice Masuhara declined to award special costs, finding no intent to deceive, but ordered Ms. Ke to pay personally the costs that the fictitious citations had caused, under Rule 16-1(30) of the Supreme Court Family Rules. The case has since become the cautionary tale every Canadian litigator has heard at least once.

The interesting question is not whether the lawyer made a mistake. She admitted that immediately. The interesting question is whether the failure mode itself, an AI tool inventing plausible-sounding sources, is a property of all AI in legal work, or a property of how that particular tool was being used. Our view is that the second framing is correct, and that practitioners who understand the difference can use AI in review with more confidence, not less.

The structural cause of a hallucination

A general-purpose language model, given a prompt like "find me Canadian cases on the test for relocation under section 16 of the Divorce Act," generates the next most likely sequence of words. There is nothing in its loop that forces those words to correspond to a real document. The model has read enough case law during training to produce names, citations, and reasoning that look right. It will produce them whether or not the cases exist. This is the structural cause. It is not a bug, in the sense that bugs can be patched. It is the consequence of asking a probability machine to answer a factual question without showing it the facts.

The Zhang v. Chen sanction did not happen because ChatGPT was used. It happened because ChatGPT was used as a research tool in a setting where there was no document for it to read. Once you internalize that, you can start to draw the line between the tasks where a chatbot is dangerous and the tasks where AI is, if anything, safer than the human alternative.

Grounded review is a different problem

Reviewing the record in a litigation matter is a different problem from legal research. In review, the document exists. It is sitting in your processing platform, with a known control number, a known family, a known custodian. The question is not "what does the law say about hearsay" but "does this email contain a privileged communication" or "is this attachment responsive to category 3 of the schedule." That question has a real answer, and the answer is in front of the model.

This is the design point Claira was built around. When Claira reviews a document inside Nuix Discover, it does not retrieve a passage from a vector index, summarise it, and call that a finding. It reads the document, runs your coding criterion against the document's actual text, and produces a written justification that quotes from the document itself. If you disagree with the answer, you can read the quote. If the quote is not in the document, the answer is wrong and you can see why. There is no plausible-sounding citation to a case that does not exist, because the only authority Claira is asked to cite is the document already in evidence.

What Case Context adds

The lawyers we work with often ask the same follow-up. If the model is reading the document, what stops it from coming to the wrong conclusion about a document it is seeing without the background of the matter? This is where matter background matters. Claira lets your team write a Case Context for each file, which travels with every scan and tells the AI what the dispute is, who the parties are, what the issue codes mean as your team uses them, and what counts as privileged here, on this matter, rather than in some textbook abstraction. The Case Context documentation walks through how to set this up; the operational effect is that the AI's judgement is anchored, not floating.
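For readers who think in code, the idea of context that "travels with every scan" can be sketched in a few lines of Python. This is an illustration only, not Claira's implementation or API: the CaseContext class, its field names, and the build_review_request function are all hypothetical, and the prompt layout is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaseContext:
    """Hypothetical container for matter background (not Claira's schema)."""
    dispute_summary: str
    parties: list[str]
    issue_codes: dict[str, str] = field(default_factory=dict)
    privilege_notes: str = ""


def build_review_request(context: CaseContext, criterion: str, document_text: str) -> str:
    """Assemble one review request: matter background first, then the
    coding criterion, then the full text of the document under review."""
    code_lines = "\n".join(
        f"- {code}: {meaning}" for code, meaning in context.issue_codes.items()
    )
    return "\n\n".join([
        "MATTER BACKGROUND\n" + context.dispute_summary,
        "PARTIES\n" + ", ".join(context.parties),
        "ISSUE CODES\n" + code_lines,
        "PRIVILEGE ON THIS MATTER\n" + context.privilege_notes,
        "CODING CRITERION\n" + criterion,
        "DOCUMENT\n" + document_text,
    ])
```

The design point is that the same background block is attached to every document, so the reviewer never has to guess what "privileged" or an issue code means on this file.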

This is not the same engineering choice every AI vendor in legal makes. Some tools index the corpus into a search layer and pipe top-k passages into a chatbot, which is fast for question answering but reintroduces the Zhang v. Chen failure mode, because the model is once again pattern-matching on retrieved fragments rather than reading the actual document end to end. Our pillar piece on AI-assisted review walks through the law society guidance on this point in more detail. The short version is that the candour and competence duties point in the same direction: you want an AI workflow where every decision is tied back to a passage you can read, in a document you can produce.

Practical guardrails for Canadian teams

Three habits separate teams that have absorbed the Zhang v. Chen lesson from teams that have not. The first is to keep generative AI off any task where it would have to supply facts it has not been shown. Legal research is the canonical example. If the question is what the law says, use the tools the profession built for that, and use AI to summarise the results after the case names are verified, not to invent the case names in the first place.

The second is to require every AI output that touches a coding decision to quote the source. If the AI tells you a document is privileged, it should tell you which paragraph triggered that call. If the quote is in the document, you have a verifiable answer. If the quote is not, you have a hallucination, and you have caught it before it leaves the platform.
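The check described here is mechanical enough to sketch. The Python below is a minimal illustration under stated assumptions, not a description of how Claira or any review platform performs it: the function names are hypothetical, and the matching rule (exact substring after straightening curly quotes, collapsing whitespace, and ignoring case) is one reasonable choice among several.

```python
import re
import unicodedata

# Map curly quotation marks to their straight equivalents.
_STRAIGHT_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})


def normalize(text: str) -> str:
    """Fold Unicode forms, straighten quotes, collapse whitespace, ignore case."""
    text = unicodedata.normalize("NFKC", text).translate(_STRAIGHT_QUOTES)
    return re.sub(r"\s+", " ", text).strip().casefold()


def quote_is_grounded(quote: str, document_text: str) -> bool:
    """Return True only if the quoted passage appears in the document text."""
    needle = normalize(quote)
    return bool(needle) and needle in normalize(document_text)
```

A justification whose quote fails this test should be routed to a human reviewer rather than accepted. Passing the test shows only that the words are in the document; whether they support the coding call is still a judgement for the supervising lawyer.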

The third is supervision. The Law Society of British Columbia, in the wake of Zhang, was clear that the obligation to verify materials submitted to the court "remains with you." That is not a comment on AI specifically. It is the supervision duty that has always applied to junior associates, contract reviewers, and anyone else whose work flows through your signature. The same standard applies to an AI reviewer.

Where to start

The lawyers we hear from after Zhang v. Chen are not asking whether to use AI. They are asking how to use it in a way that survives a future motion. The answer, in our view, is to insist on document-grounded review, written justifications that cite the record, and matter-specific context that anchors the AI's judgement. If you would like to see what that looks like in practice on one of your files, you can book a working session with us. We will walk through a real document and show you exactly what Claira saw when it made the call.

Hallucinations are not the price of using AI in legal work. They are the price of using the wrong kind of AI for the wrong kind of task.