Claira Stories
Every Document Matters: Why We Review the Full Set, Not a Sample

A Claira philosophy piece on defensibility, audit trails, and why "the machine found it" isn't an answer.
The question we keep getting asked
When we show Claira to new users, one question comes up more than any other: "Can't we just ask the AI to find the relevant documents and skip the rest?"
The short answer is no - not in the way the question usually means. The longer answer is the whole point of this article, and it's worth understanding, because it shapes how we built Claira and how we think eDiscovery review should work.
We take a firm position on this. Once a review set has been defined, every document in that set should be reviewed. Not sampled. Not vector-searched for the likely-relevant ones. Reviewed. Each document is evaluated against your review criteria, with a decision recorded and a justification you could stand behind if asked.
This isn't a technology preference. It's a logic problem, and the logic is simple.
Why collection is the key moment
The culling process exists for a reason. Culling is where you make the bulk, defensible decisions to exclude documents from review - by date range, by custodian, by file type, by domain, by keyword filters, by de-duplication. When done properly, culling reflects deliberate, documented judgments about relevance that can be explained to a judge or opposing counsel.
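To make the idea of documented bulk exclusions concrete, here is a minimal sketch in Python. It is an illustration only, not how Nuix Discover or Claira implements culling. The names `Document`, `CullRule`, and `cull`, along with the sample dates and file types, are hypothetical. Each excluded document is logged with the rule that removed it and the rationale for that rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical, simplified document record for illustration only.
@dataclass(frozen=True)
class Document:
    doc_id: str
    custodian: str
    sent: date
    file_type: str

# A bulk exclusion rule: a label, a stated rationale, and an objective test.
@dataclass(frozen=True)
class CullRule:
    name: str
    rationale: str
    excludes: Callable[[Document], bool]

def cull(docs, rules):
    """Split docs into a review set and a log of bulk exclusions."""
    review_set, cull_log = [], []
    for doc in docs:
        # The first matching rule is recorded as the reason for exclusion.
        hit = next((r for r in rules if r.excludes(doc)), None)
        if hit is None:
            review_set.append(doc)
        else:
            cull_log.append((doc.doc_id, hit.name, hit.rationale))
    return review_set, cull_log

rules = [
    CullRule("date_range", "Outside the agreed relevant period",
             lambda d: d.sent < date(2021, 1, 1)),
    CullRule("file_type", "System files excluded by protocol",
             lambda d: d.file_type in {"exe", "dll"}),
]
docs = [
    Document("DOC-001", "alice", date(2022, 3, 4), "msg"),
    Document("DOC-002", "bob", date(2019, 6, 1), "msg"),
    Document("DOC-003", "alice", date(2022, 5, 9), "dll"),
]
review_set, cull_log = cull(docs, rules)
```

In this sketch the exclusion is decided by an objective rule rather than a document-by-document judgment, and the log records which rule applied to which document.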
What remains after culling is, by definition, a set of documents that could be relevant. That's the whole reason they survived. If there were no chance they mattered, culling should have removed them.
So the operative assumption for everything still in the review set is simple: each of these documents might be responsive. We don't know which ones are until we look.
That's the moment where the philosophy question gets real. What do you do with a set of documents you've already decided could be relevant?
The two paths forward
Broadly, there are two approaches.
The first is what we might call probabilistic retrieval - using AI to search, rank, or cluster documents and surface the ones it thinks are most likely to be responsive. Retrieval-augmented generation, vector search, predictive coding, and most "ask your documents" AI tools fall into this family. These are useful tools, and they have a real place in eDiscovery - especially in early case assessment, triage, and investigation. We use similar techniques ourselves when they're the right fit.
But as the primary method for deciding what gets produced and what doesn't, probabilistic retrieval introduces a problem that's hard to solve: the documents that weren't returned don't have a written justification for their exclusion. They were simply ranked lower. If you later need to explain why a particular document wasn't produced, the honest answer is "the model didn't surface it," which is the kind of answer that tends to generate follow-up questions you'd rather not receive.
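A small sketch of the problem, assuming a generic score-and-threshold retrieval step. The function name, scores, and threshold are hypothetical and not taken from any particular product. For the documents below the threshold, the score is the only thing on record.

```python
def retrieve_above_threshold(scores, threshold):
    """Return the surfaced documents and the IDs that were never surfaced."""
    returned = {doc_id: s for doc_id, s in scores.items() if s >= threshold}
    unreturned = sorted(set(scores) - set(returned))
    return returned, unreturned

# Hypothetical relevance scores produced by some ranking model.
scores = {"DOC-001": 0.91, "DOC-002": 0.42, "DOC-003": 0.78}
returned, unreturned = retrieve_above_threshold(scores, threshold=0.75)

# For each unreturned document, the record is a number, not a written reason.
exclusion_record = {doc_id: scores[doc_id] for doc_id in unreturned}
```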
The second approach is the one we've built Claira around. After culling, every document in the review set is evaluated against your review criteria. A decision is made for each. A justification is written for each. An audit trail exists for each.
That's it. It's not flashy. It's not a breakthrough. It's the same logic that's always underpinned defensible review: if you collected it, and culling didn't remove it, then someone - or something supervised by someone - needs to look at it and make a call.
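One way to picture "a decision, a justification, and an audit trail for each" is a per-document record plus a completeness check. The sketch below is illustrative only. The `ReviewDecision` fields and the `missing_decisions` helper are assumptions, not Claira's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical per-document record; field names are illustrative.
@dataclass(frozen=True)
class ReviewDecision:
    doc_id: str
    responsive: bool
    justification: str
    decided_by: str
    decided_at: datetime

def missing_decisions(review_set_ids, decisions):
    """Return IDs in the review set with no justified decision on record."""
    justified = {d.doc_id for d in decisions if d.justification.strip()}
    return sorted(set(review_set_ids) - justified)

now = datetime.now(timezone.utc)
review_set_ids = ["DOC-001", "DOC-002", "DOC-003"]
decisions = [
    ReviewDecision("DOC-001", True,
                   "Discusses the disputed supply contract.", "ai+reviewer", now),
    ReviewDecision("DOC-002", False,
                   "Routine newsletter; no link to the issues.", "ai+reviewer", now),
]

# Any ID listed here still needs someone to look at it and make a call.
gaps = missing_decisions(review_set_ids, decisions)
```

Under this sketch, a review is complete only when `gaps` is empty.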
The black box problem
There's a reason this matters beyond abstract principle. When AI is used to decide what gets reviewed rather than to help review it, the review team loses something important: the ability to defend the decisions after the fact.
Consider the two questions a lawyer needs to be able to answer about any document in a matter:
Why was this document produced?
Why was this document withheld?
In a traditional human review, the answer is in the coding and the reviewer's notes. In a well-structured AI-assisted review, the answer is in the AI's written analysis alongside the human's verification. In both cases, you can point to something concrete.
In a pure probabilistic retrieval workflow, the answer for excluded documents is some variation of "the system ranked it below our threshold." That may be correct, and it may even be reasonable in certain contexts - but it's not a justification, it's a description of a process. When opposing counsel asks why a specific document wasn't produced, you want to be able to say more than "the vector similarity score was low."
This is what we mean when we talk about avoiding a black box. Not that AI shouldn't be involved - AI is enormously useful, and avoiding it in 2026 is its own form of professional negligence on a large matter. We mean that the AI's role should be to document its reasoning on each document, not to silently decide which documents get reasoning applied to them in the first place.
What this looks like in practice
In a Claira workflow, the process is straightforward.
The matter team runs culling in Nuix Discover using the standard tools - date filters, custodian filters, search-term families, de-duplication, near-duplicate suppression, domain exclusions. Everything a reasonable eDiscovery practitioner would do to narrow a collected set down to a defensible review population. The decisions made during culling are themselves documented and defensible, because they're bulk decisions based on objective criteria, not document-by-document judgments.
Once the review set is defined, the team drafts a review plan - a written articulation of what they're looking for, how they'll recognize it, what counts as privileged, and what formats the decisions need to take. This becomes the prompt that drives the AI review. We've written elsewhere about how to approach this - see our prompting overview for the principles, and our case context documentation for how to give Claira the matter-specific background that makes its analysis sharper.
Claira then runs across every document in the review set through a bulk scan. Not the top-ranked subset. Not the clusters most similar to a seed document. Every document. For each one, Claira produces a written analysis: the responsiveness call, a short justification, the passages or metadata it relied on, and any flags it thinks a human should look at.
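The shape of that pass can be sketched as a loop that evaluates every document and returns one analysis for each. This is a simplified illustration, not Claira's implementation. `Analysis`, `bulk_scan`, and the review-plan dictionary are hypothetical, and `stub_evaluate` is a keyword-matching stand-in for the AI analysis so the example runs on its own.

```python
from dataclasses import dataclass, field

# Hypothetical output record mirroring the four elements described above.
@dataclass
class Analysis:
    doc_id: str
    responsive: bool
    justification: str
    passages: list[str] = field(default_factory=list)
    flags: list[str] = field(default_factory=list)

def stub_evaluate(doc_id, text, review_plan):
    """Stand-in for the AI call: marks a document responsive on a term hit."""
    hits = [s.strip() for s in text.split(".")
            if any(term in s.lower() for term in review_plan["terms"])]
    flags = ["possible privilege"] if "counsel" in text.lower() else []
    if hits:
        reason = "Mentions a term from the review plan."
    else:
        reason = "No review-plan term appears in the text."
    return Analysis(doc_id, bool(hits), reason, hits, flags)

def bulk_scan(documents, review_plan, evaluate=stub_evaluate):
    """Evaluate every document; nothing is skipped or pre-ranked."""
    return {doc_id: evaluate(doc_id, text, review_plan)
            for doc_id, text in documents.items()}

documents = {
    "DOC-001": "The supply contract was amended. See attached.",
    "DOC-002": "Lunch on Friday?",
    "DOC-003": "Our counsel advised on the contract terms.",
}
review_plan = {"terms": ["contract"]}
results = bulk_scan(documents, review_plan)
```

In this sketch the non-responsive document (`DOC-002`) still gets a written justification. That record is the difference between being evaluated and being left out.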
Those results land directly in Nuix fields. They're sortable, filterable, and searchable. Reviewers can move through them efficiently - focusing their human attention on the edge cases, the borderline calls, and the privilege review - rather than reading every document cold. But the audit trail for every document exists from the moment Claira finishes its pass. Our understanding results guide covers how reviewers actually work through those outputs.
If the matter is ever challenged - if opposing counsel wants to know why a specific document wasn't produced, or if you need to demonstrate the reasonableness of your process under the applicable proportionality rules - the answer isn't "the AI didn't surface it." The answer is the written justification that was produced for that specific document, at the time of review, as part of the record.
Tools to accelerate, not to decide
None of this means the process has to be slow. Tools exist to smooth every part of this workflow. AI summaries accelerate first-pass triage. Thematic clustering helps reviewers spot patterns across the set. Near-duplicate grouping reduces redundant work. Confidence scores help reviewers prioritize which AI calls to spot-check first. These are all legitimate accelerators, and we use all of them.
What we don't do is let any of those accelerators make the decision about whether a document gets evaluated at all. Once it's in the review set, it gets evaluated. The tools change how fast the evaluation happens, not whether it happens.
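In code terms, the difference is between sorting and filtering. The sketch below is a hypothetical illustration, with made-up result dictionaries and confidence values. It orders AI calls for human spot-checking by confidence and removes nothing from the queue.

```python
def spot_check_order(results):
    """Order AI calls for human spot-checking, least confident first.

    Sorting only changes the order; every document stays in the queue.
    """
    return sorted(results, key=lambda r: r["confidence"])

# Hypothetical results; each document has already been evaluated.
results = [
    {"doc_id": "DOC-001", "responsive": True, "confidence": 0.97},
    {"doc_id": "DOC-002", "responsive": False, "confidence": 0.55},
    {"doc_id": "DOC-003", "responsive": False, "confidence": 0.81},
]
queue = spot_check_order(results)
```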
This is a subtle but important distinction. The Sedona Canada Principles - specifically Principle 7 - state that a party may use electronic tools and processes to satisfy its documentary discovery obligations. Canadian courts have repeatedly affirmed that using AI tools is consistent with a lawyer's duty of reasonable diligence. What the case law has not endorsed is the use of AI to silently exclude documents from consideration without a per-document justification. The proportionality principle is about matching effort to what the matter requires, not about cutting corners on defensibility.
You can comply with proportionality and still review every document in your review set. In fact, with AI assistance, it's never been more achievable. That's the whole point.
The philosophy, stated plainly
If we had to put our philosophy on a single page, it would read something like this.
Collection is the broadest filter. Culling is where defensible bulk exclusions happen. Everything that survives culling is, by assumption, possibly relevant. Every possibly-relevant document gets reviewed. Every review decision is justified in writing. Tools can make this faster, but they cannot make it optional.
The result is a process that is:
Simple to explain to a judge, a client, or opposing counsel
Consistent across matters, reviewers, and jurisdictions
Defensible at the document level, not just the system level
Compatible with AI as an accelerator rather than a substitute for judgment
We think this is the right way to do eDiscovery review in the age of AI. We've written about the broader philosophy of AI-assisted review in our practical philosophy piece, which covers the evolution from Bates stamping to TAR to modern AI and where we think the field is heading.
The tools will keep getting better. The data volumes will keep getting bigger. The temptation to let the AI decide what matters will keep growing. Our view is that the fundamentals don't change: if you collected it and didn't cull it, you review it. That's how you stay defensible, and defensibility is the product we're all actually selling.