This page provides practical reference metrics aligned to the Electronic Discovery Reference Model (EDRM). It is designed for attorneys, paralegals, and litigation support teams who need real-world planning data for digital evidence, email collections, and eDiscovery production.

These benchmarks help answer common planning questions: how much data each custodian is likely to hold, how long processing and review will take, and what volumes to expect at production.

Important: These figures represent industry-standard planning estimates used in litigation support and eDiscovery operations. Actual volumes and timelines vary based on case scope, data types, custodians, and technical environments. This page is provided for operational reference only and does not constitute legal advice.

EDRM Framework Overview

The Electronic Discovery Reference Model defines the lifecycle of digital evidence in litigation. This page provides reference metrics and operational standards for the stages most relevant to volume, cost, defensibility, and trial production.

  • Identification
  • Preservation
  • Collection
  • Processing
  • Review
  • Analysis
  • Production
  • Presentation

Identification & Preservation Reference Metrics

Identification and preservation establish the scope of data subject to legal hold. These metrics help estimate the complexity and breadth of ESI collection efforts.

Metric | Typical Range | Notes
------ | ------------- | -----
Custodian count per case | 1 – 50+ | Employees, executives, vendors, and third parties
Email accounts per custodian | 1 – 4 | Includes work, personal, and legacy accounts
Cloud platforms per case | 2 – 10 | Microsoft 365, Google Workspace, iCloud, Slack, Dropbox, etc.
Preservation window (time span) | 3 – 10+ years | Litigation holds often span multiple matters and employment changes
Metadata fields preserved | 100 – 300+ | Sender, recipient, timestamps, IPs, file hashes, system properties
Mobile devices per custodian | 1 – 3 | Work phone, personal phone, tablets

Preservation Standard

All ESI must be preserved in a forensically defensible format that maintains timestamps, authorship, file integrity, and native metadata. Preservation must guard against spoliation caused by automated deletion, overwrites, or routine business operations.

Why This Matters

Accurate identification determines collection scope and cost. Underestimating custodian count or data sources can lead to inadequate preservation, missed deadlines, and sanctions. A case with 20 custodians across 5 platforms may require coordination with IT, HR, and external vendors to ensure complete preservation.

Collection Reference Metrics

Collection involves extracting ESI from source systems while maintaining integrity and defensibility. These metrics help estimate data volumes and storage requirements.

Email Collection Metrics

Metric | Typical Range | Notes
------ | ------------- | -----
Emails per GB (without attachments) | 75,000 – 100,000 | Plain text and simple HTML emails
Emails per GB (with attachments) | 10,000 – 30,000 | Realistic estimate for typical business email
Attachments per email (average) | 1 – 3 | Large impact on storage size and processing complexity
Email storage per custodian | 5 – 50 GB | Varies by role, tenure, and archiving practices
Legacy PST files per custodian | 2 – 10 | Historical archives, offline storage, personal backups
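
As a quick planning aid, the arithmetic above can be wrapped in a small helper. This is a minimal Python sketch using only the ranges from the table; the function name and profile labels are illustrative, not a standard tool.

```python
# Planning heuristic: convert mailbox size to an estimated message
# count using the emails-per-GB ranges from the table above.

EMAILS_PER_GB = {
    "no_attachments": (75_000, 100_000),
    "with_attachments": (10_000, 30_000),
}

def estimate_email_count(mailbox_gb: float,
                         profile: str = "with_attachments") -> tuple[int, int]:
    """Return a (low, high) estimated message count for one mailbox."""
    low, high = EMAILS_PER_GB[profile]
    return int(mailbox_gb * low), int(mailbox_gb * high)

# Example: a 30 GB custodian mailbox with a typical attachment load
low, high = estimate_email_count(30)
print(f"Expect roughly {low:,} to {high:,} messages")  # 300,000 to 900,000
```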

File System Collection Metrics

Metric | Typical Range | Notes
------ | ------------- | -----
Files per GB | 3,000 – 20,000 | Depends on file types and compression
Network share storage per custodian | 10 – 100 GB | Department shares, project folders, personal drives
Cloud account exports (per custodian) | 1 – 500+ GB | Email, drive, chat, calendar, and backups
Mobile device storage | 32 – 512 GB | Includes deleted data, hidden files, app data
Slack/Teams chat data (per user per year) | 500 MB – 5 GB | Messages, files, channels; varies by usage patterns

Collection Standard

Collections must generate cryptographic hash values (MD5, SHA-1, or SHA-256), maintain chain-of-custody logs, preserve custodian records, and create audit trails for every file collected. All collection tools must operate in read-only mode to prevent data alteration.
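
The hashing and logging requirements above can be illustrated with a short Python sketch built on the standard library's hashlib. The record fields (custodian, source, and so on) are assumptions for illustration; real collection tools produce far richer logs.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def custody_entry(path: Path, custodian: str, source: str) -> dict:
    """Build one chain-of-custody record for a collected file.
    Field names here are illustrative, not a formal standard."""
    return {
        "file": str(path),
        "custodian": custodian,
        "source": source,
        "sha256": sha256_file(path),
        "size_bytes": path.stat().st_size,
        "collected_utc": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Demo: hash this script itself and print its custody record
    print(json.dumps(custody_entry(Path(__file__), "J. Doe", "demo"), indent=2))
```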

Real-World Note: Data Variance

Email volumes vary dramatically by role. Executives and sales personnel often have 50–100 GB per mailbox, while operational staff may have 5–10 GB. Cloud platform usage has increased average data volumes by 2–3x over the past five years. Always collect test samples before estimating full-case volumes.

Why This Matters

Collection metrics drive storage planning, processing timelines, and cost estimates. A case with 10 custodians at 30 GB each yields 300 GB of raw data, which may expand to 600–900 GB once archives are unpacked during processing, before deduplication and filtering bring the volume back down. Underestimating collection volumes can delay case timelines and exceed budgets.

Processing & Normalization Metrics

Processing transforms raw ESI into reviewable, searchable documents while maintaining metadata and applying filters to reduce volume.

Processing Efficiency Metrics

Metric | Typical Range | Notes
------ | ------------- | -----
Deduplication reduction | 10% – 70% | Depends on custodian overlap and document types
Near-duplicate reduction | 5% – 30% | Email threading and similar files with minor variations
Data expansion (archive unpacking) | 1.2× – 3× | ZIP, PST, RAR, and compressed archives inflate during extraction
System files filtered | 15% – 40% | Operating system files, temp files, executables
Processing speed | 50 – 500 GB/day | Based on infrastructure, file complexity, and quality checks
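
These factors compose into a rough volume model: raw data expands as archives unpack, then shrinks as system files are filtered and duplicates are removed. A minimal sketch, with illustrative midpoint defaults drawn from the table above:

```python
def processed_volume_gb(raw_gb: float,
                        expansion: float = 2.0,        # archive unpacking: 1.2x - 3x
                        system_filter: float = 0.25,   # system files removed: 15% - 40%
                        dedup: float = 0.40) -> float:  # duplicates removed: 10% - 70%
    """Rough review-ready volume after processing; the factors are
    planning assumptions, not measurements."""
    expanded = raw_gb * expansion
    after_filter = expanded * (1 - system_filter)
    return after_filter * (1 - dedup)

# Example: a 300 GB raw collection (10 custodians x 30 GB)
print(f"{processed_volume_gb(300):.0f} GB review-ready")  # ~270 GB
```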

Text Extraction & OCR Metrics

Document Type | OCR Success Rate | Notes
------------- | ---------------- | -----
Modern scanned documents (clean, B&W) | 95% – 99% | Standard office documents, good scan quality
Color documents with graphics | 90% – 95% | Charts, images, mixed content
Legacy or degraded documents | 85% – 92% | Fading, skew, poor contrast
Faxes and photocopies | 75% – 85% | Noise, distortion, multiple-generation copies
Handwritten documents | Low / Unreliable | Typically requires manual review or specialized tools

Processing Standard

All files must be decompressed, deduplicated, text-extracted, and normalized into searchable, review-ready formats. Processing logs must document every transformation, extraction failure, and quality check. Hash values must be preserved throughout processing to ensure file integrity.
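
Hash-based deduplication, described above and in the glossary, reduces to grouping files by digest. A minimal sketch; note that duplicates are logged rather than deleted, since processing must remain auditable:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def _sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def deduplicate(paths: list[Path]) -> tuple[dict[str, Path], dict[str, list[Path]]]:
    """Keep the first file seen per hash; record the rest as duplicates."""
    keep: dict[str, Path] = {}
    dupes: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        h = _sha256(p)
        if h in keep:
            dupes[h].append(p)  # suppressed from review, retained in the log
        else:
            keep[h] = p
    return keep, dict(dupes)
```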

Real-World Note: Processing Variability

Processing timelines vary significantly based on file mix. A dataset with many PST archives and ZIP files will take 2–3x longer than native Office documents. OCR quality directly impacts review efficiency—poor OCR often requires manual review or reprocessing, adding weeks to case timelines.

Why This Matters

Processing efficiency determines review costs and timelines. A 500 GB collection that deduplicates to 200 GB (60% reduction) saves substantial review time and cost. However, data expansion from archives can offset these gains. Understanding processing metrics helps set realistic deadlines and budgets.

Processing Output & Volume

Metric | Typical Range | Notes
------ | ------------- | -----
Documents per GB (after processing) | 20,000 – 100,000 | Varies by file type mix and metadata
Text extraction size increase | +5% – 15% | Extracted text stored separately from native files
Processing error rate | 0.1% – 2% | Corrupted files, password-protected documents, unsupported formats
Metadata completeness | 85% – 98% | Depends on source systems and collection methods

Review & Analysis Benchmarks

Review involves human assessment of processed documents for relevance, privilege, and responsiveness. These benchmarks help estimate review timelines and resource requirements.

Review Volume Metrics

Metric | Typical Range | Notes
------ | ------------- | -----
Documents per GB (reviewable) | 20,000 – 100,000 | After processing and filtering
Reviewer throughput (simple documents) | 1,500 – 2,000 docs/day | Email, short memos, simple correspondence
Reviewer throughput (complex documents) | 500 – 1,000 docs/day | Contracts, technical documents, spreadsheets
Quality control (QC) sampling rate | 5% – 10% | Documents reviewed by senior reviewers for consistency

Review Yield Metrics

Category | Typical Rate | Notes
-------- | ------------ | -----
Relevance rate | 2% – 20% | Documents responsive to discovery requests
Privilege rate | 1% – 10% | Attorney-client, work product, confidential communications
Hot documents (key evidence) | 0.1% – 2% | Critical documents for case strategy
Not relevant | 70% – 95% | Documents not responsive to requests

Review Duration Estimates

Review Size | Typical Duration | Assumptions
----------- | ---------------- | -----------
Small review (10,000 – 50,000 docs) | 2 – 4 weeks | 2–3 reviewers, moderate complexity
Medium review (50,000 – 250,000 docs) | 6 – 12 weeks | 5–10 reviewers, mixed document types
Large review (250,000 – 1M docs) | 3 – 6 months | 15–30 reviewers, complex case issues
Mega review (1M+ docs) | 6 – 18+ months | Large teams, may use AI/TAR for acceleration

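These durations follow directly from throughput arithmetic. A minimal sketch; the QC inflation factor and the 5-day work week are assumptions consistent with the tables above:

```python
def review_weeks(doc_count: int, reviewers: int, docs_per_reviewer_day: int,
                 qc_rate: float = 0.075) -> float:
    """Estimate first-pass review duration in 5-day weeks, inflating
    volume by the QC sampling rate (5% - 10% per the table above)."""
    effective_docs = doc_count * (1 + qc_rate)
    days = effective_docs / (reviewers * docs_per_reviewer_day)
    return days / 5

# Example: 250,000 mixed documents, 7 reviewers at 1,000 docs/day
print(f"{review_weeks(250_000, 7, 1_000):.1f} weeks")  # ~7.7 weeks
```
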
Review Standard

All documents must be searchable, filterable, taggable, and auditable. Review platforms must track reviewer decisions, timestamps, and changes. Quality control processes must verify consistency and accuracy. Privilege logs must be maintained for all withheld documents.
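
To make the tracking requirement concrete, here is what one auditable reviewer decision might look like as a data structure. This is a hypothetical record layout, not any particular review platform's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ReviewDecision:
    """One immutable audit-trail entry for a tagging decision."""
    doc_id: str
    reviewer: str
    tag: str                         # e.g. "Responsive", "Privileged", "Hot"
    previous_tag: str | None = None  # preserved so tag changes stay traceable
    decided_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

audit_log: list[ReviewDecision] = []
audit_log.append(ReviewDecision("DOC-000123", "reviewer1", "Responsive"))
audit_log.append(ReviewDecision("DOC-000123", "qc_lead", "Privileged",
                                previous_tag="Responsive"))
```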

Real-World Note: Review Variability

Review speed varies dramatically by document complexity and case familiarity. First-pass reviewers on unfamiliar topics may achieve 500 docs/day, while experienced reviewers on routine matters can exceed 2,000 docs/day. Technology-assisted review (TAR) can reduce review volumes by 40–70% when properly deployed.

Why This Matters

Review represents the largest variable cost in eDiscovery. A 500,000 document review with a 5% relevance rate produces 25,000 responsive documents. Understanding yield rates helps budget accurately and plan production timelines. Low relevance rates may indicate over-collection or broad preservation.
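
The yield arithmetic generalizes into a one-function budget check. The default rates below are illustrative midpoints from the yield table, not recommendations:

```python
def review_yield(doc_count: int, relevance: float = 0.05,
                 privilege: float = 0.03, hot: float = 0.005) -> dict:
    """Project document counts from yield rates (planning estimate only)."""
    return {
        "responsive": round(doc_count * relevance),
        "privileged": round(doc_count * privilege),
        "hot": round(doc_count * hot),
    }

print(review_yield(500_000))
# {'responsive': 25000, 'privileged': 15000, 'hot': 2500}
```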

Production & Presentation Metrics

Production involves packaging responsive documents with metadata for delivery to opposing counsel or regulatory agencies. Presentation focuses on trial exhibits and courtroom use.

Production Volume & Format

Metric | Typical Range | Notes
------ | ------------- | -----
Load file size per 100,000 docs | 1 – 10 GB | Includes metadata, extracted text, and control files
Native file production ratio | 5% – 30% | Spreadsheets, databases, CAD files, videos preserved in native format
Image file production ratio | 70% – 95% | TIFF or PDF images with metadata and text files
Pages per GB (PDF production) | 20,000 – 40,000 | After flattening, OCR, and Bates stamping
Bates numbering speed | 10,000 – 50,000 pages/hour | Depends on image quality and numbering complexity
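
Bates numbering itself is simple arithmetic: a fixed prefix plus a zero-padded sequential counter. A minimal sketch; the prefix and padding width are matter-specific assumptions to confirm against the ESI protocol:

```python
def bates_range(prefix: str, start: int, count: int, width: int = 8):
    """Yield sequential Bates numbers such as ABC00000001."""
    for n in range(start, start + count):
        yield f"{prefix}{n:0{width}d}"

print(list(bates_range("ABC", start=1, count=3)))
# ['ABC00000001', 'ABC00000002', 'ABC00000003']
```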

Trial Presentation Metrics

Item | Typical Range | Notes
---- | ------------- | -----
Color pages in trial sets | 5% – 25% | Photos, charts, highlights, key documents
Trial binders per case | 10 – 500+ | Depends on exhibits, parties, and trial duration
Exhibits per day of trial | 10 – 50 | Varies by case complexity and attorney style
Trial exhibit preparation time | 2 – 6 weeks | From exhibit list finalization to courtroom delivery

Production Standard

All productions must be reproducible, hash-verifiable, metadata-preserving, and court-compliant. Productions must include load files, the metadata fields required by the requesting party, and technical specifications documentation. Bates numbering must be consistent, sequential, and properly prefixed or suffixed.
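
Load files are plain delimited text. The sketch below writes a Concordance-style DAT using commonly seen default delimiters (þ as text qualifier, the DC4 control character as field separator); the delimiters and field names are assumptions for illustration, and the governing ESI protocol always controls:

```python
# Assumed Concordance-style defaults; confirm against the production spec.
QUOTE, SEP = "\u00fe", "\u0014"  # þ text qualifier, DC4 field separator

def dat_line(fields: list[str]) -> str:
    """Render one delimited load-file row."""
    return SEP.join(f"{QUOTE}{f}{QUOTE}" for f in fields)

header = ["BEGBATES", "ENDBATES", "CUSTODIAN", "SHA256"]   # illustrative fields
row = ["ABC00000001", "ABC00000004", "J. Doe", "9f2c..."]  # placeholder values

with open("production.dat", "w", encoding="utf-8") as f:
    f.write(dat_line(header) + "\n")
    f.write(dat_line(row) + "\n")
```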

Real-World Note: Production Formats

Production format requirements vary by jurisdiction and agreement. While TIFF with load files was once standard, PDF with metadata is increasingly common for its convenience and reduced file size. Native production of spreadsheets with embedded formulas is now routine. Always confirm format specifications before processing begins.

Why This Matters

Production metrics determine delivery timelines and storage requirements. A 100,000-page production in single-page TIFF can be roughly twice the size of the same production as searchable PDF. Load file preparation and QC typically add 2–5 days to production timelines. Trial exhibit preparation should begin 4–6 weeks before trial to allow for revisions and contingencies.

Production File Formats

Format | Typical Use Case | Considerations
------ | ---------------- | --------------
Single-Page TIFF + Load File | Traditional litigation production | Large file sizes, universal compatibility, established standard
Multi-Page Searchable PDF | Modern productions, smaller files | Reduced storage, easier handling, requires PDF reader
Native Files with Metadata | Spreadsheets, databases, CAD, video | Preserves functionality, requires specialized software
ESI Protocol Specified Format | As agreed by parties | Follow court orders and agreements precisely

Chain-of-Custody Standards

At every stage of the EDRM lifecycle, defensibility requires comprehensive documentation:

  • Every file is logged: Complete inventory with hash values, source locations, and custodian attribution
  • Every handoff is recorded: Transfer documentation between collection, processing, review, and production teams
  • Every transformation is documented: Processing logs showing deduplication, filtering, OCR, and format conversions
  • Every production is reproducible: Complete audit trail allowing recreation of production at any future date

This ensures defensibility in the event of a court challenge, sanctions motion, or production dispute. Proper chain-of-custody documentation protects against spoliation allegations and demonstrates good-faith compliance with discovery obligations.
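
Reproducibility can be spot-checked by re-hashing files against the collection log. A minimal sketch, assuming a JSON-lines log in the shape of the collection example earlier (one record per file with "file" and "sha256" fields; both are assumptions of this page's sketches, not a standard format):

```python
import hashlib
import json
from pathlib import Path

def verify_against_log(log_path: Path) -> list[str]:
    """Return paths whose current SHA-256 no longer matches the log."""
    mismatches = []
    for line in log_path.read_text().splitlines():
        entry = json.loads(line)  # assumed fields: "file", "sha256"
        current = hashlib.sha256(Path(entry["file"]).read_bytes()).hexdigest()
        if current != entry["sha256"]:
            mismatches.append(entry["file"])
    return mismatches
```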

Glossary

Common terms and definitions used in eDiscovery and digital evidence management:

ESI
Electronically Stored Information – Any data created, stored, or transmitted in electronic form
Custodian
A person or system that controls data subject to preservation or collection
Deduplication
Removal of identical files or documents based on hash value comparison
Hash Value
Cryptographic fingerprint (MD5, SHA-1, SHA-256) uniquely identifying a file
OCR
Optical Character Recognition – Technology converting images to searchable text
Native File
Original file format (Excel, Word, MSG, etc.) preserving functionality and metadata
Load File
Structured data file (DAT, OPT, LFP) used by review platforms to import documents and metadata
Metadata
System information describing a file: author, dates, recipients, file properties, etc.
Bates Number
Sequential identifier stamped on each page for citation and tracking purposes
Chain of Custody
Documentation of evidence handling from collection through production
Privilege Log
List of documents withheld from production due to privilege or work product protection
TAR
Technology-Assisted Review – Machine learning tools to accelerate document review