How to Include AI Chat History in eDiscovery

Generative AI tools have quietly become part of the workday for lawyers, paralegals, executives, and engineers. People draft memos in ChatGPT, brainstorm deal terms in Claude, summarize meetings in Copilot, and ask Gemini to compare legal standards. These conversations often contain the same kind of candor you used to find in email, Slack, and text messages. That puts them squarely inside the scope of eDiscovery, and teams that are not yet planning for them are going to be surprised when opposing counsel asks for them.

This piece walks through how to handle AI chat history as a discoverable source. It covers what to collect, how to export from the major consumer tools, how to process those exports into Nuix Discover, and how to review them efficiently with Claira. The goal is a workflow that is defensible, proportionate, and repeatable.

Why AI chats are discoverable

The analysis is the same one courts have been applying for years. Relevant, non-privileged information must be preserved and produced, regardless of where it lives. The Sedona Canada Principles and the US Federal Rules of Civil Procedure both frame electronically stored information broadly enough to capture anything typed, generated, or saved in the normal course of business. That includes AI conversations.

What is new is the content mix inside a single session. A single ChatGPT thread might hold a first draft of a contract, a legal research question, a cut-and-paste of a confidential strategy memo, and a personal aside about a colleague. The line between privileged, responsive, and irrelevant can shift from paragraph to paragraph. That is the real reason this source deserves a deliberate workflow rather than a best-effort grab.

The other wrinkle is custody. Some tools keep chats on the user's account and tie retention to the individual's settings. Others push everything to an enterprise workspace. Others still, like local Copilot installations, spread fragments across device caches, Office documents, and M365 audit logs. Identifying the right custodian of the data is as important as identifying the right custodian of the communications inside it.

Collecting from the major tools

The good news is that every major provider now supports some form of export. The bad news is that no two exports look the same, and none of them arrive in a review-ready format. Here is what works as of this writing.

ChatGPT offers a self-serve export from Settings > Data controls > Export data. The result is a zip file containing a conversations.json file and an HTML rendering. The JSON preserves message order, roles, timestamps, and any image or file attachments by reference. The HTML is easier for a reviewer to read but strips some metadata. For enterprise accounts, admins can trigger account-wide exports through the OpenAI admin console.
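To illustrate what "message order, roles, timestamps" looks like once parsed, here is a minimal sketch that flattens one conversation into ordered message rows. The field names (`mapping`, `message`, `author.role`, `create_time`, `content.parts`) are assumptions based on how exports have commonly been laid out, not a documented schema, and `flatten_conversation` is a hypothetical helper. Check every field against the actual archive before relying on it.

```python
from datetime import datetime, timezone


def flatten_conversation(conversation):
    """Return one dict per message, ordered by timestamp.

    ASSUMPTION: the conversation dict has a "mapping" of node id -> node,
    and each node may carry a "message" with "author.role", "create_time"
    (Unix seconds), and "content.parts". Verify against your own export.
    """
    keyed_rows = []
    for node in conversation.get("mapping", {}).values():
        message = node.get("message")
        if not message:
            continue  # some nodes are structural and carry no message
        parts = (message.get("content") or {}).get("parts") or []
        text = "\n".join(part for part in parts if isinstance(part, str))
        created = message.get("create_time")
        row = {
            "conversation_title": conversation.get("title", ""),
            "message_id": message.get("id", ""),
            "role": (message.get("author") or {}).get("role", ""),
            "sent_utc": (
                datetime.fromtimestamp(created, tz=timezone.utc).isoformat()
                if created is not None
                else ""
            ),
            "text": text,
        }
        # Messages without a timestamp sort to the front.
        keyed_rows.append((created if created is not None else 0.0, row))
    keyed_rows.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed_rows]
```

Typical use would be to load conversations.json with `json.load`, call this once per conversation, and keep the untouched JSON alongside the flattened rows as the authoritative record.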

Anthropic's Claude provides a similar export under Settings > Privacy > Export data, returning a JSON archive of conversations and a simple HTML view. Enterprise Claude workspaces can be exported by workspace admins, and Projects export as structured bundles with their attached files intact.

Microsoft Copilot is the trickiest of the big four. Consumer Copilot chats export through the Microsoft privacy dashboard. Copilot for Microsoft 365 chats, however, are stored as hidden items in the user's Exchange mailbox and surfaced through eDiscovery Premium in Purview. If you have a matter involving Copilot, assume that collection will go through Purview rather than a user-facing export.

Google Gemini chats are available in the user's Google Takeout archive, either as JSON or HTML, and enterprise Gemini conversations flow through Google Vault. Plan collection the same way you plan Gmail collection on that tenant.

Whatever the source, land the raw export in your usual forensic staging area, hash it, and document the chain of custody. The export itself is the best evidence you are going to get. Resist the urge to copy-paste conversations out of a browser. Doing so breaks timestamps and attachments, and it will not hold up if the authenticity of a single message is ever challenged.
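As a rough illustration of the hash-and-document step, the sketch below computes a SHA-256 digest of an export archive and builds a simple custody record. The record's field names and the `custody_record` helper are hypothetical. Your forensic tooling or chain-of-custody form may already capture the same information in its own format.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, custodian, source_tool, collected_by):
    """Build a custody entry for one raw export (field names are illustrative)."""
    path = Path(path)
    return {
        "file_name": path.name,
        "size_bytes": path.stat().st_size,
        "sha256": sha256_of_file(path),
        "custodian": custodian,
        "source_tool": source_tool,
        "collected_by": collected_by,
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing the archive before anything unpacks or converts it means any later PDF or load file can be traced back to a fixed, verifiable original.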

Processing into Nuix Discover

Once you have the archives, you need to turn them into documents a reviewer can work with. Two formats move cleanly through Nuix Discover: PDFs of each conversation, generated from the HTML view, and natives of the JSON with an accompanying parent-child relationship back to any referenced attachments. Most teams we work with produce both, treating the PDF as the reviewer-facing record and the JSON as the authoritative source.

Map the metadata as you ingest. At a minimum you want the custodian, the tool name, the conversation title, the date and time of each message, the message role (user or assistant), and any attachment identifiers. If your processing engine does not parse these fields natively, script a small pre-processor to inject them into the load file. This pays for itself the first time someone asks for every conversation that referenced a specific project name between two dates.
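A pre-processor of this kind can be quite small. The sketch below writes the minimum fields listed above into a generic delimited load file. The column names, the `DocID` scheme, and the input message shape are all assumptions made for illustration, not a Nuix Discover specification. Map them to whatever field names and delimiters your processing engine actually expects.

```python
import csv

# Hypothetical column names; rename to match your load file specification.
LOAD_FILE_COLUMNS = [
    "DocID",
    "Custodian",
    "Tool",
    "ConversationTitle",
    "SentUTC",
    "Role",
    "AttachmentIDs",
]


def write_load_file(messages, custodian, tool, out):
    """Write one load-file row per message to an open text handle.

    Each message is assumed to be a dict with "conversation_title",
    "sent_utc", "role", and an optional "attachment_ids" list.
    """
    writer = csv.DictWriter(out, fieldnames=LOAD_FILE_COLUMNS)
    writer.writeheader()
    for index, message in enumerate(messages, start=1):
        writer.writerow({
            "DocID": f"{tool}-{index:06d}",
            "Custodian": custodian,
            "Tool": tool,
            "ConversationTitle": message.get("conversation_title", ""),
            "SentUTC": message.get("sent_utc", ""),
            "Role": message.get("role", ""),
            # Multiple attachment identifiers are joined with semicolons.
            "AttachmentIDs": ";".join(message.get("attachment_ids", [])),
        })
```

When writing to disk, open the output with `newline=""` as the `csv` module documentation recommends, so line endings are not doubled on Windows.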

Treat attachments inside conversations as you would treat attachments in email. Extract them, hash them, and link them back to the parent message. Many chat tools let users paste images, spreadsheets, or full PDFs into the conversation, and those attachments often contain the most responsive content in the entire archive.
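The extract, hash, and link steps might look something like the sketch below. It assumes the export is a zip archive and that you have already derived `(parent_message_id, member_name)` pairs from the conversation JSON. How each tool references attachments differs, so that pairing step is deliberately left out, and `link_attachments` is a hypothetical helper.

```python
import hashlib
import zipfile


def link_attachments(archive, references):
    """Hash each referenced attachment and tie it to its parent message.

    `archive` is a path or file-like object for the export zip.
    `references` is an iterable of (parent_message_id, member_name) pairs;
    deriving those pairs from the tool's JSON is assumed to happen upstream.
    """
    links = []
    with zipfile.ZipFile(archive) as bundle:
        available = set(bundle.namelist())
        for parent_id, member in references:
            if member not in available:
                # Record the gap instead of silently dropping the reference.
                links.append({
                    "parent_message_id": parent_id,
                    "attachment": member,
                    "sha256": None,
                    "status": "missing",
                })
                continue
            data = bundle.read(member)
            links.append({
                "parent_message_id": parent_id,
                "attachment": member,
                "sha256": hashlib.sha256(data).hexdigest(),
                "status": "ok",
            })
    return links
```

Logging missing attachments explicitly is a design choice: a referenced file that never made it into the export is itself something the collection notes should record.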

Reviewing efficiently with Claira

Once the chats are inside Nuix Discover, volume becomes the real problem. A single active ChatGPT user can easily generate thousands of messages across hundreds of conversations in a year. Traditional linear review breaks at that scale, and keyword search is a blunt instrument for a source where the same sentence can be a strategy disclosure, a harmless brainstorm, or a draft of produced work product, depending on what came before it in the thread.

This is the workflow we designed Claira for. Drop the conversations into a case, add your matter-specific context through Case Context so the model understands what is at stake, and run a bulk scan across every message or every conversation-level summary. You can ask Claira to flag any thread that discusses the subject matter of the litigation, to identify messages that look privileged, to extract every mention of named individuals, and to summarize each conversation in two sentences for a reviewer's first pass. The results come back as structured fields inside Nuix, ready for a human reviewer to verify.

This is the same pragmatic philosophy we wrote about in our piece on future-proofing eDiscovery with AI-assisted review: use AI for the parts of the work where consistency and speed matter most, and keep humans in the loop for the judgment calls. AI chat history is a near-perfect fit for that division of labour, because it is high-volume, patterned, and full of the exact kinds of cues a well-prompted model can spot.

A defensible baseline

If your team is building its first runbook for AI chat discovery, aim for three things. Identify every major AI tool your custodians use, not just the sanctioned ones. Export in the native format each tool offers rather than scraping conversations from a browser. Process those exports into Nuix Discover with proper metadata and attachment handling, then review them with a tool that was designed for the scale and nuance this data demands. The technology is changing quickly, but the obligation to preserve, collect, and produce is not. A thoughtful workflow today is a lot cheaper than a motion to compel tomorrow.