AI eDiscovery Review for Cases Over 100,000 Documents

There is a quiet assumption in a lot of the modern AI-for-legal coverage that the document review problem has already been solved. Demos look fast. Single-document chat experiences feel magical. The pitch decks talk about agents and orchestration and reasoning, and the screenshots usually feature a tidy folder with a few dozen contracts in it.

If your matters look like that, the current crop of general-purpose legal AI tools will probably serve you fine. But if you run real eDiscovery, your matters do not look like that. They look like a custodian list with twenty names on it, a collection that pulled half a terabyte out of mailboxes and shared drives, and a review population that, after culling, still contains six or seven hundred thousand documents you need to make defensible decisions about.

That gap between the demo and the discovery is the whole reason we built Claira the way we did. This post is about what changes when you stop designing for review sets of one thousand documents and start designing for review sets of one million.

The platforms most lawyers are seeing are not built for this

There are two broad families of AI tool in the market right now that get described as "legal AI" but solve very different problems.

The first family is the lawyer copilot. Harvey, Hebbia, CoCounsel, Spellbook, and several others sit in this category. They are excellent at the kind of work a senior associate does in a Word window: drafting, summarizing a handful of documents, answering questions about a contract, building memos. The interaction model is conversational, the corpus is small, and the workflow assumes a human in the loop who reads almost everything that goes through the model. We have written before about how to translate prompts from those tools into Claira, because the two worlds use AI differently.

The second family is the eDiscovery review platform. It needs to do something fundamentally different. It needs to evaluate every document in a culled review set against your review criteria, produce a written justification for each call, surface the borderline cases to a human, and do it within a budget and a timeline that the matter can actually absorb. The corpus is large, the interaction model is bulk, and the workflow assumes the AI does the first pass on documents the human never reads cold.

A copilot can be retrofitted into the second job, but the seams show quickly. Latency stops being a curiosity and starts being a budget line. Costs that were rounding errors at one hundred documents become quotes the client will not approve at one hundred thousand. Audit trails that worked one chat at a time start to require their own infrastructure. And the whole shape of the deployment changes, because reviewers no longer want a chat window; they want every document coded inside the review platform they already use.

What scale actually demands

When the unit of work is not a document but a review population, a few requirements stop being optional.

The first is throughput at a predictable cost per document. Once an index is built, running a vector search across a hundred thousand documents to surface a handful takes roughly one model invocation, to embed the query. Running a written, justified responsiveness call across a hundred thousand documents is at least a hundred thousand invocations. The platform either prices that workload at something a client will pay or it does not get used at that scale.
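That arithmetic is easy to sketch. The snippet below is a rough illustration, not Claira's pricing model: the `ScanProfile` fields and every number in the example are hypothetical placeholders you would swap for your own token counts and vendor rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    """Hypothetical per-document token profile for a bulk review scan."""
    input_tokens_per_doc: int      # document text plus shared instructions
    output_tokens_per_doc: int     # the written justification
    usd_per_million_input: float   # assumed vendor rate, not a real quote
    usd_per_million_output: float  # assumed vendor rate, not a real quote


def cost_per_document(profile: ScanProfile) -> float:
    """Model cost in USD for one justified call on one document."""
    input_cost = profile.input_tokens_per_doc * profile.usd_per_million_input
    output_cost = profile.output_tokens_per_doc * profile.usd_per_million_output
    return (input_cost + output_cost) / 1_000_000


def scan_cost(profile: ScanProfile, n_docs: int) -> float:
    """One invocation per document, so total cost scales linearly with n_docs."""
    return cost_per_document(profile) * n_docs


# Placeholder numbers for illustration only.
example = ScanProfile(
    input_tokens_per_doc=2_000,
    output_tokens_per_doc=300,
    usd_per_million_input=1.0,
    usd_per_million_output=4.0,
)
for n in (100, 100_000, 1_000_000):
    print(f"{n:>9,} documents: ${scan_cost(example, n):,.2f}")
```

The point of the exercise is the multiplier: a per-document figure that looks negligible at one hundred documents is the number the client sees multiplied by the full review population.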

The second is integration with the platform your team is already in. eDiscovery review is a team sport, and the team is not going to migrate to a new interface for every matter. Coding decisions, redactions, privilege calls, and production builds happen inside Nuix Discover or whatever platform the team is licensed on. The AI needs to write its outputs into the same fields the reviewers are already working with, not into a separate dashboard that someone has to reconcile later. Claira installs inside Nuix Discover and writes results directly into native fields for exactly this reason.

The third is matter-specific context. A generic responsiveness call across a hundred custodians, three product lines, and four years of communication is not useful, because the model has no idea what a relevant document looks like for this matter. The platform needs a structured way to carry the case theory, the key players, the dates that matter, and the categories the team has decided to code for. Claira calls this Case Context, and it is the difference between a model that produces plausible-sounding analysis and a model that produces analysis your senior reviewers agree with.
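To make "a structured way to carry the case theory" concrete, here is one way such a record could look. This is a sketch under our own assumptions, not Claira's actual Case Context schema: the class name, the field names, and the `as_preamble` helper are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CaseContext:
    """Hypothetical matter-level context shared by every scan in a matter."""
    case_theory: str
    key_players: tuple[str, ...]
    relevant_from: date
    relevant_to: date
    coding_categories: tuple[str, ...]

    def as_preamble(self) -> str:
        """Render the context as text that can precede each document."""
        return "\n".join([
            f"Case theory: {self.case_theory}",
            "Key players: " + ", ".join(self.key_players),
            "Relevant period: "
            f"{self.relevant_from.isoformat()} to {self.relevant_to.isoformat()}",
            "Coding categories: " + ", ".join(self.coding_categories),
        ])


# Invented example matter, for illustration only.
context = CaseContext(
    case_theory="Pricing for Product A was coordinated with a competitor.",
    key_players=("J. Doe", "R. Roe"),
    relevant_from=date(2019, 1, 1),
    relevant_to=date(2022, 12, 31),
    coding_categories=("Responsive", "Privileged", "Hot"),
)
print(context.as_preamble())
```

Because the record is defined once per matter and is immutable, every document is evaluated against the same statement of what matters, which is what makes the resulting calls comparable across reviewers and scans.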

The fourth, and the one that is hardest to retrofit, is defensibility. Every document in the review set gets a written justification, not a similarity score. We have made the philosophical argument for this approach elsewhere. The practical point is that platforms designed for small-corpus chat interactions did not need a defensibility layer, so they did not build one. Building it in afterwards is hard.

Why the cost story matters more than the demo

In a smaller matter, the cost of AI is a curiosity. In a large matter, it is the proposal. Litigation support managers and partners deciding which AI to deploy on a hundred-thousand-document review are doing a per-document arithmetic exercise, often in their heads, sometimes in a spreadsheet. The question is not whether the model is impressive. The question is whether it is impressive at a token cost that lets you charge the client and still come in under the linear-review baseline.
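The spreadsheet version of that exercise fits in a few lines. The sketch below assumes a simple model in which the AI makes a first-pass call on every document and human reviewers read only a borderline fraction; the function names, the review speed, the hourly rate, and the borderline fraction are all hypothetical inputs, not benchmarks.

```python
def linear_review_cost(n_docs: int, docs_per_hour: float, hourly_rate: float) -> float:
    """Cost in USD of humans reading every document once."""
    return n_docs / docs_per_hour * hourly_rate


def ai_first_pass_cost(
    n_docs: int,
    ai_cost_per_doc: float,
    borderline_fraction: float,
    docs_per_hour: float,
    hourly_rate: float,
) -> float:
    """AI call on every document, plus human review of the borderline share."""
    ai_cost = n_docs * ai_cost_per_doc
    human_docs = round(n_docs * borderline_fraction)
    return ai_cost + linear_review_cost(human_docs, docs_per_hour, hourly_rate)


# Placeholder inputs for illustration only.
n_docs = 100_000
baseline = linear_review_cost(n_docs, docs_per_hour=50, hourly_rate=60.0)
with_ai = ai_first_pass_cost(
    n_docs,
    ai_cost_per_doc=0.05,
    borderline_fraction=0.10,
    docs_per_hour=50,
    hourly_rate=60.0,
)
print(f"Linear review baseline: ${baseline:,.0f}")
print(f"AI first pass:          ${with_ai:,.0f}")
```

The useful output is not either total on its own but how sensitive the gap is to the per-document AI cost and the borderline fraction, since those are the two inputs a vendor can actually move.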

Claira is engineered around that question. The bulk scan is designed for high-volume, written-justification workloads with a token profile that holds up across hundreds of thousands of documents. Case Context is reused across every scan in a matter, which keeps the per-document input cost flat as the case grows. Outputs are structured, so reviewers spend their time on the borderline calls rather than on reading every analysis from scratch.

The result is a platform that gets cheaper per document as the matter gets bigger, which is the opposite of what most teams have come to expect from AI.

What to ask before you commit to a platform

If you are evaluating an AI review tool for a large matter, three questions usually surface whether the tool is actually built for the work.

The first is what happens when you run the same prompt across a hundred thousand documents. If the answer involves a queue, a custom API integration, or a per-seat pricing model that assumes one reviewer doing one document at a time, the tool is a copilot in disguise.

The second is what gets written to the review platform when the scan finishes. If the answer is a dashboard, a CSV export, or a separate UI, the tool is going to add work to your reviewers' day rather than remove it.

The third is what the audit trail looks like for a document the model decided was non-responsive. If the answer is a confidence score, you are looking at a probabilistic retrieval system. If the answer is a written analysis explaining why the document does not match the review criteria, you are looking at a platform that takes defensibility seriously.
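One way to picture the difference is as the record that survives for each document. The sketch below is an illustrative shape only, not Claira's output format: `ReviewCall`, its fields, and the `missing_justifications` check are hypothetical names, and the five-word minimum is an arbitrary threshold chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCall:
    """Hypothetical audit record for one document in a review set."""
    doc_id: str
    responsive: bool
    justification: str        # written analysis, not a similarity score
    borderline: bool = False  # flagged for a human reviewer


def missing_justifications(calls: list[ReviewCall], min_words: int = 5) -> list[str]:
    """Return the IDs of documents whose call lacks a substantive written reason."""
    return [c.doc_id for c in calls if len(c.justification.split()) < min_words]


# Invented records for illustration only.
calls = [
    ReviewCall(
        doc_id="DOC-0001",
        responsive=False,
        justification="Internal lunch scheduling; no mention of pricing or the key players.",
    ),
    ReviewCall(doc_id="DOC-0002", responsive=False, justification=""),
]
print(missing_justifications(calls))
```

A check like this is trivial when every call carries text, and impossible when the only artifact is a score, which is the practical content of the question.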

These are not gotcha questions. They are the questions that separate the tools you can demo from the tools you can run a discovery on.

If you are sizing up a matter that pushes past the hundred-thousand-document line and want to see what an AI review built for that scale actually looks like, book a fifteen-minute walkthrough. We will run Claira against a representative slice of your data and show you what the per-document economics look like for your specific corpus. That is usually a more useful conversation than another slide deck.