How I Built ArguLens: AI-Powered Legal Document Analysis
The story behind ArguLens — from a friend's divorce paperwork to an AI product that finds contradictions, reconstructs timelines, and grounds every claim in source documents.
A friend was going through a divorce. Financial disclosures, custody agreements, years of communication records — hundreds of pages, and her lawyer billed $400/hour to read through them. She asked me a simple question: "Can't AI do this?"
I couldn't stop thinking about it. Legal document review is a reading comprehension problem — the kind LLMs are genuinely good at. But the stakes are brutal. Miss a contradiction in a witness statement and someone loses their case. I wanted to build something that could actually help people in that situation.
You upload legal documents — PDFs, Word files, plain text — and get back structured analysis: contradictions across documents, a reconstructed timeline of events, and claims grounded in the source material with citations.
The key word there is "grounded." Nothing the system says is unsupported. Every claim has a citation.
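One way to make "every claim has a citation" concrete is to make ungrounded insights unrepresentable in the output schema. A minimal sketch (the type and field names here are illustrative, not ArguLens's actual data model):

```python
from dataclasses import dataclass

@dataclass
class Citation:
    """Pointer to the exact source text that supports a claim."""
    document: str   # filename of the uploaded document
    page: int       # 1-indexed page number
    quote: str      # verbatim excerpt from that page

@dataclass
class Insight:
    """A single piece of analysis, always paired with its evidence."""
    text: str
    citations: list[Citation]

    def is_grounded(self) -> bool:
        # An insight with no citations is rejected, never shown to the user.
        return len(self.citations) > 0
```

The design choice is that grounding is enforced structurally: anything the model produces without a verifiable quote never reaches the user.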
I tested GPT-4, Claude, and Gemini extensively. Claude won on the three dimensions that matter most for legal work.
The core technical challenge was evidence grounding. Every insight the system produces must trace back to specific text in the uploaded documents. No vague summaries.
The pipeline works in four stages.
I expected the AI layer to be the hard engineering problem. I was wrong — it was PDF parsing.
Legal documents arrive in every format imaginable: scanned images requiring OCR, financial tables with complex multi-column layouts, court filings with headers and footers on every page, documents with handwritten margin notes. I ended up building a multi-stage pipeline — structural extraction first, OCR fallback for scanned pages, then dedicated layout analysis for tables.
Getting this pipeline reliable across the variety of real-world legal documents took longer than building the entire AI analysis layer. If you're building a document-processing product, budget twice as much time for parsing as you think you need.
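A routing skeleton for that kind of multi-stage parser might look like this. All function bodies here are illustrative stand-ins — in practice the structural extractor would wrap a library like pdfplumber or PyMuPDF, and the OCR fallback would call Tesseract:

```python
def extract_structural_text(page: dict) -> str:
    # Stage 1: pull the embedded text layer, if the PDF has one.
    # (A real implementation would use pdfplumber/PyMuPDF here.)
    return page.get("text", "")

def ocr_page(page: dict) -> str:
    # Stage 2 fallback: rasterize and OCR pages that are scanned images.
    # (A real implementation would call Tesseract via pytesseract.)
    return page.get("scanned_text", "")

def extract_tables(page: dict) -> list:
    # Stage 3: dedicated layout analysis for multi-column financial tables.
    return page.get("tables", [])

def parse_page(page: dict) -> dict:
    text = extract_structural_text(page)
    if len(text.strip()) < 20:   # heuristic: near-empty text layer
        text = ocr_page(page)    # => page is probably a scanned image
    return {"text": text, "tables": extract_tables(page)}
```

The key idea is the fallback ordering: try the cheap, lossless structural extraction first, and only pay the OCR cost (and accept its noise) when the text layer is missing or suspiciously empty.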
This is the feature that makes ArguLens genuinely useful, and it was the hardest to build well. Contradiction detection isn't keyword matching — it's semantic reasoning about consistency.
Consider this pair of statements from different documents:

"I attended the meeting at our downtown office that week."

"I was traveling in Europe for that entire week."
Zero overlapping keywords. But these statements are mutually exclusive — you can't attend a meeting at a downtown office while traveling in Europe.
My approach: extract factual claims from each document, categorize them (temporal, spatial, causal), then use the LLM to evaluate pairs of claims for logical consistency. The model isn't looking for textual similarity; it's reasoning about whether two claims can both be true simultaneously.
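The approach above can be sketched as follows. The claim data, the category labels, and especially the consistency check are illustrative stand-ins — in the real system the final check is an LLM call that reasons about whether both claims can be true at once:

```python
from itertools import combinations

# Categorized factual claims extracted from the documents (illustrative data).
claims = [
    {"doc": "deposition.pdf", "type": "spatial",
     "text": "I attended the meeting at our downtown office that week."},
    {"doc": "travel_records.pdf", "type": "spatial",
     "text": "I was traveling in Europe for that entire week."},
    {"doc": "deposition.pdf", "type": "temporal",
     "text": "The account was closed in June."},
]

def claims_consistent(a: str, b: str) -> bool:
    """Placeholder for the LLM call: can both claims be true simultaneously?
    A real prompt asks the model to reason about logical consistency,
    not textual similarity. This stub hard-codes one rule for the demo."""
    return not ("downtown office" in a and "Europe" in b)

contradictions = [
    (a, b)
    for a, b in combinations(claims, 2)
    if a["type"] == b["type"]          # only compare like categories
    and a["doc"] != b["doc"]           # look across documents
    and not claims_consistent(a["text"], b["text"])
]
```

Categorizing claims first keeps the pairwise comparison tractable: the model only evaluates pairs that could plausibly conflict (spatial against spatial, temporal against temporal), rather than every claim against every other.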
I'm building multi-party analysis (cases with 3+ parties), jurisdiction-aware reasoning (incorporating state-specific precedents), and collaborative workspaces so legal teams can annotate the AI's analysis together.
ArguLens is live at argulens.com.