This page provides practical reference metrics aligned to the Electronic Discovery Reference Model (EDRM). It is designed for attorneys, paralegals, and litigation support teams who need real-world planning data for digital evidence, email collections, and eDiscovery production.
These benchmarks help answer common planning questions: how much data a custodian is likely to yield, how long processing and review will take, and what a production will require.
Important: These figures represent industry-standard planning estimates used in litigation support and eDiscovery operations. Actual volumes and timelines vary based on case scope, data types, custodians, and technical environments. This page is provided for operational reference only and does not constitute legal advice.
The Electronic Discovery Reference Model defines the lifecycle of digital evidence in litigation. This page provides reference metrics and operational standards for the stages most relevant to volume, cost, defensibility, and trial production.
Identification and preservation establish the scope of data subject to legal hold. These metrics help estimate the complexity and breadth of ESI collection efforts.
| Metric | Typical Range | Notes |
|---|---|---|
| Custodian count per case | 1 – 50+ | Employees, executives, vendors, and third parties |
| Email accounts per custodian | 1 – 4 | Includes work, personal, and legacy accounts |
| Cloud platforms per case | 2 – 10 | Microsoft 365, Google Workspace, iCloud, Slack, Dropbox, etc. |
| Preservation window (time span) | 3 – 10+ years | Litigation holds often span multiple matters and employment changes |
| Metadata fields preserved | 100 – 300+ | Sender, recipient, timestamps, IPs, file hashes, system properties |
| Mobile devices per custodian | 1 – 3 | Work phone, personal phone, tablets |
All ESI must be preserved in a forensically defensible format that maintains timestamps, authorship, file integrity, and native metadata. Preservation must guard against spoliation caused by automated deletion, overwrites, or normal business operations.
Accurate identification determines collection scope and cost. Underestimating custodian count or data sources can lead to inadequate preservation, missed deadlines, and sanctions. A case with 20 custodians across 5 platforms may require coordination with IT, HR, and external vendors to ensure complete preservation.
Collection involves extracting ESI from source systems while maintaining integrity and defensibility. These metrics help estimate data volumes and storage requirements.
| Metric | Typical Range | Notes |
|---|---|---|
| Emails per GB (without attachments) | 75,000 – 100,000 | Plain text and simple HTML emails |
| Emails per GB (with attachments) | 10,000 – 30,000 | Realistic estimate for typical business email |
| Attachments per email (average) | 1 – 3 | Large impact on storage size and processing complexity |
| Email storage per custodian | 5 – 50 GB | Varies by role, tenure, and archiving practices |
| Legacy PST files per custodian | 2 – 10 | Historical archives, offline storage, personal backups |

| Metric | Typical Range | Notes |
|---|---|---|
| Files per GB | 3,000 – 20,000 | Depends on file types and compression |
| Network share storage per custodian | 10 – 100 GB | Department shares, project folders, personal drives |
| Cloud account exports (per custodian) | 1 – 500+ GB | Email, drive, chat, calendar, and backups |
| Mobile device storage | 32 – 512 GB | Includes deleted data, hidden files, app data |
| Slack/Teams chat data (per user per year) | 500 MB – 5 GB | Messages, files, channels; varies by usage patterns |
Collections must generate cryptographic hash values (MD5, SHA-1, or SHA-256), maintain chain-of-custody logs, preserve custodian records, and create audit trails for every file collected. All collection tools must operate in read-only mode to prevent data alteration.
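As a point of reference, the core of these requirements is mechanical and can be sketched with the Python standard library alone. The log schema and field names below are illustrative assumptions, not a vendor or court specification.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large evidence files do not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_custody(path: Path, collector: str, log_path: Path) -> dict:
    """Append one chain-of-custody entry (illustrative schema) to a JSON-lines log."""
    entry = {
        "file": str(path),
        "sha256": sha256_file(path),
        "size_bytes": path.stat().st_size,
        "collected_utc": datetime.now(timezone.utc).isoformat(),
        "collector": collector,
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry
```

Because the file is opened read-only ("rb") and only its bytes are read, the hashing step itself does not alter source data; real collection tools add write-blocking at the device or API level.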
Email volumes vary dramatically by role. Executives and sales personnel often have 50–100 GB per mailbox, while operational staff may have 5–10 GB. Cloud platform usage has increased average data volumes by 2–3x over the past five years. Always collect test samples before estimating full-case volumes.
Collection metrics drive storage planning, processing timelines, and cost estimates. A case with 10 custodians at 30 GB each yields 300 GB of raw data—which may expand to 600–900 GB after processing and deduplication. Underestimating collection volumes can delay case timelines and exceed budgets.
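The same arithmetic can be scripted as a quick planning helper. This is a minimal sketch; the expansion factor and emails-per-GB ranges are taken from the tables on this page and should be adjusted once test samples are collected.

```python
def estimate_collection(custodians: int, gb_per_custodian: float) -> dict:
    """Rough collection-volume planning from the benchmark ranges on this page."""
    raw_gb = custodians * gb_per_custodian
    return {
        "raw_gb": raw_gb,
        # Archive unpacking typically inflates volume 1.2x - 3x (Processing table).
        "expanded_gb_range": (raw_gb * 1.2, raw_gb * 3.0),
        # 10,000 - 30,000 emails/GB is a realistic range for mail with attachments.
        "est_email_range": (int(raw_gb * 10_000), int(raw_gb * 30_000)),
    }

# Example from the text: 10 custodians at 30 GB each -> 300 GB raw.
print(estimate_collection(10, 30))
```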
Processing transforms raw ESI into reviewable, searchable documents while maintaining metadata and applying filters to reduce volume.
| Metric | Typical Range | Notes |
|---|---|---|
| Deduplication reduction | 10% – 70% | Depends on custodian overlap and document types |
| Near-duplicate reduction | 5% – 30% | Email threading and similar files with minor variations |
| Data expansion (archive unpacking) | 1.2× – 3× | ZIP, PST, RAR, and compressed archives inflate during extraction |
| System files filtered | 15% – 40% | Operating system files, temp files, executables |
| Processing speed | 50 – 500 GB/day | Based on infrastructure, file complexity, and quality checks |

| Document Type | OCR Success Rate | Notes |
|---|---|---|
| Modern scanned documents (clean, B&W) | 95% – 99% | Standard office documents, good scan quality |
| Color documents with graphics | 90% – 95% | Charts, images, mixed content |
| Legacy or degraded documents | 85% – 92% | Fading, skew, poor contrast |
| Faxes and photocopies | 75% – 85% | Noise, distortion, multiple-generation copies |
| Handwritten documents | Low / Unreliable | Typically requires manual review or specialized tools |
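Because OCR quality varies this widely, a common tactic is to sample word-level OCR confidence and route low-scoring pages to manual review. A minimal sketch, assuming the open-source pytesseract and Pillow packages (not part of any particular eDiscovery platform); the threshold is an assumption to calibrate per document population.

```python
import pytesseract
from PIL import Image

LOW_CONFIDENCE = 75  # illustrative cutoff; calibrate against a QC sample

def page_needs_manual_review(image_path: str) -> bool:
    """Flag a scanned page whose mean word-level OCR confidence is low."""
    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    # Tesseract reports -1 confidence for non-word boxes; exclude those.
    scores = [int(c) for c in data["conf"] if int(c) >= 0]
    if not scores:
        return True  # no recognizable text at all
    return sum(scores) / len(scores) < LOW_CONFIDENCE
```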
All files must be decompressed, deduplicated, text-extracted, and normalized into searchable, review-ready formats. Processing logs must document every transformation, extraction failure, and quality check. Hash values must be preserved throughout processing to ensure file integrity.
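Deduplication in most platforms keys on the hash values computed at collection. The sketch below shows the core idea only: keep the first instance of each SHA-256 digest and log the rest as suppressed duplicates. Production tools layer family handling (emails plus attachments) and custodian-level versus global dedup on top of this.

```python
import hashlib
from pathlib import Path

def dedupe_by_hash(root: Path) -> tuple[dict, list]:
    """Return {digest: first_path} plus (duplicate_path, digest) pairs to suppress."""
    originals: dict[str, Path] = {}
    duplicates: list[tuple[Path, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Whole-file read for brevity; chunked hashing (as in collection) scales better.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in originals:
            duplicates.append((path, digest))  # suppressed from review, kept in the log
        else:
            originals[digest] = path
    return originals, duplicates
```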
Processing timelines vary significantly based on file mix. A dataset with many PST archives and ZIP files will take 2–3x longer than native Office documents. OCR quality directly impacts review efficiency—poor OCR often requires manual review or reprocessing, adding weeks to case timelines.
Processing efficiency determines review costs and timelines. A 500 GB collection that deduplicates to 200 GB (60% reduction) saves substantial review time and cost. However, data expansion from archives can offset these gains. Understanding processing metrics helps set realistic deadlines and budgets.
| Metric | Typical Range | Notes |
|---|---|---|
| Documents per GB (after processing) | 20,000 – 100,000 | Varies by file type mix and metadata |
| Text extraction size increase | +5% – 15% | Extracted text stored separately from native files |
| Processing error rate | 0.1% – 2% | Corrupted files, password-protected documents, unsupported formats |
| Metadata completeness | 85% – 98% | Depends on source systems and collection methods |
Review involves human assessment of processed documents for relevance, privilege, and responsiveness. These benchmarks help estimate review timelines and resource requirements.
| Metric | Typical Range | Notes |
|---|---|---|
| Documents per GB (reviewable) | 20,000 – 100,000 | After processing and filtering |
| Reviewer throughput (simple documents) | 1,500 – 2,000 docs/day | Email, short memos, simple correspondence |
| Reviewer throughput (complex documents) | 500 – 1,000 docs/day | Contracts, technical documents, spreadsheets |
| Quality control (QC) sampling rate | 5% – 10% | Documents reviewed by senior reviewers for consistency |

| Category | Typical Rate | Notes |
|---|---|---|
| Relevance rate | 2% – 20% | Documents responsive to discovery requests |
| Privilege rate | 1% – 10% | Attorney-client, work product, confidential communications |
| Hot documents (key evidence) | 0.1% – 2% | Critical documents for case strategy |
| Not relevant | 70% – 95% | Documents not responsive to requests |

| Review Size | Typical Duration | Assumptions |
|---|---|---|
| Small review (10,000 – 50,000 docs) | 2 – 4 weeks | 2–3 reviewers, moderate complexity |
| Medium review (50,000 – 250,000 docs) | 6 – 12 weeks | 5–10 reviewers, mixed document types |
| Large review (250,000 – 1M docs) | 3 – 6 months | 15–30 reviewers, complex case issues |
| Mega review (1M+ docs) | 6 – 18+ months | Large teams, may use AI/TAR for acceleration |
All documents must be searchable, filterable, taggable, and auditable. Review platforms must track reviewer decisions, timestamps, and changes. Quality control processes must verify consistency and accuracy. Privilege logs must be maintained for all withheld documents.
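At its core, the audit requirement is an append-only record of every coding decision. The dataclass below sketches one way to model that; the field names are illustrative assumptions, not any review platform's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class CodingDecision:
    """One immutable reviewer decision; changes are new records, never edits."""
    doc_id: str
    reviewer: str
    tag: str  # e.g. "responsive", "privileged", "hot"
    decided_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

audit_log: list[CodingDecision] = []

def code_document(doc_id: str, reviewer: str, tag: str) -> None:
    """Append a decision; the full history stays available for QC and disputes."""
    audit_log.append(CodingDecision(doc_id, reviewer, tag))

code_document("DOC-000123", "jsmith", "responsive")
code_document("DOC-000123", "qc_lead", "privileged")  # later re-coding; both retained
```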
Review speed varies dramatically by document complexity and case familiarity. First-pass reviewers on unfamiliar topics may achieve 500 docs/day, while experienced reviewers on routine matters can exceed 2,000 docs/day. Technology-assisted review (TAR) can reduce review volumes by 40–70% when properly deployed.
Review represents the largest variable cost in eDiscovery. A 500,000 document review with a 5% relevance rate produces 25,000 responsive documents. Understanding yield rates helps budget accurately and plan production timelines. Low relevance rates may indicate over-collection or broad preservation.
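That arithmetic is easy to script for budgeting. The default throughput and relevance rate below come from the tables above and are planning assumptions, not guarantees.

```python
def estimate_review(total_docs: int, reviewers: int,
                    docs_per_reviewer_day: int = 1_000,
                    relevance_rate: float = 0.05) -> dict:
    """Rough review planning from the benchmark ranges on this page."""
    review_days = total_docs / (reviewers * docs_per_reviewer_day)
    return {
        "calendar_weeks": round(review_days / 5, 1),  # assumes 5 review days/week
        "est_responsive_docs": int(total_docs * relevance_rate),
    }

# Example from the text: 500,000 docs at a 5% relevance rate -> 25,000 responsive.
print(estimate_review(500_000, reviewers=10))
```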
Production involves packaging responsive documents with metadata for delivery to opposing counsel or regulatory agencies. Presentation focuses on trial exhibits and courtroom use.
| Metric | Typical Range | Notes |
|---|---|---|
| Load file size per 100,000 docs | 1 – 10 GB | Includes metadata, extracted text, and control files |
| Native file production ratio | 5% – 30% | Spreadsheets, databases, CAD files, videos preserved in native format |
| Image file production ratio | 70% – 95% | TIFF or PDF images with metadata and text files |
| Pages per GB (PDF production) | 20,000 – 40,000 | After flattening, OCR, and Bates stamping |
| Bates numbering speed | 10,000 – 50,000 pages/hour | Depends on image quality and numbering complexity |

| Item | Typical Range | Notes |
|---|---|---|
| Color pages in trial sets | 5% – 25% | Photos, charts, highlights, key documents |
| Trial binders per case | 10 – 500+ | Depends on exhibits, parties, and trial duration |
| Exhibits per day of trial | 10 – 50 | Varies by case complexity and attorney style |
| Trial exhibit preparation time | 2 – 6 weeks | From exhibit list finalization to courtroom delivery |
All productions must be reproducible, hash-verifiable, metadata-preserving, and court-compliant. Productions must include load files, the metadata fields required by the requesting party, and technical specifications documentation. Bates numbering must be consistent, sequential, and properly prefixed/suffixed.
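Sequential, properly prefixed numbering is mechanical: a fixed prefix plus a zero-padded counter with no gaps. A minimal generator (the prefix, starting number, and padding width are illustrative):

```python
from typing import Iterator

def bates_numbers(prefix: str, start: int, count: int, width: int = 8) -> Iterator[str]:
    """Yield sequential, zero-padded Bates numbers, e.g. ABC00001001."""
    for n in range(start, start + count):
        yield f"{prefix}{n:0{width}d}"

# Stamp a 250-page document beginning at ABC00001001.
pages = list(bates_numbers("ABC", 1001, 250))
print(pages[0], "...", pages[-1])  # ABC00001001 ... ABC00001250
```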
Production format requirements vary by jurisdiction and agreement. While TIFF with load files was once standard, PDF with metadata is increasingly common for its convenience and reduced file size. Native production of spreadsheets with embedded formulas is now routine. Always confirm format specifications before processing begins.
Production metrics determine delivery timelines and storage requirements. A 100,000-page production in TIFF format might be 50 GB, while the same production in PDF could be 25 GB. Load file preparation and QC typically add 2–5 days to production timelines. Trial exhibit preparation should begin 4–6 weeks before trial to allow for revisions and contingencies.
| Format | Typical Use Case | Considerations |
|---|---|---|
| Single-Page TIFF + Load File | Traditional litigation production | Large file sizes, universal compatibility, established standard |
| Multi-Page Searchable PDF | Modern productions, smaller files | Reduced storage, easier handling, requires PDF reader |
| Native Files with Metadata | Spreadsheets, databases, CAD, video | Preserves functionality, requires specialized software |
| ESI Protocol Specified Format | As agreed by parties | Follow court orders and agreements precisely |
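For orientation, the sketch below writes a Concordance-style DAT load file using the widely seen þ (0xFE) text qualifier and 0x14 field delimiter. The field list, encoding, and values are illustrative assumptions; always follow the delimiters and metadata fields specified in the ESI protocol or court order.

```python
QUALIFIER = "\u00fe"  # thorn (ASCII 254), common Concordance-style text qualifier
DELIMITER = "\u0014"  # ASCII 20, common Concordance-style field delimiter

def dat_row(values: list[str]) -> str:
    """Wrap each field in qualifiers and join with the field delimiter."""
    return DELIMITER.join(f"{QUALIFIER}{v}{QUALIFIER}" for v in values)

FIELDS = ["BEGBATES", "ENDBATES", "CUSTODIAN", "SHA256", "NATIVEPATH"]

with open("production.dat", "w", encoding="utf-8", newline="\r\n") as f:
    f.write(dat_row(FIELDS) + "\n")
    f.write(dat_row([
        "ABC00000001", "ABC00000004", "Smith, J.",
        "<sha256 hash here>",  # placeholder, not a real digest
        "NATIVES\\0001\\ABC00000001.xlsx",
    ]) + "\n")
```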
At every stage of the EDRM lifecycle, defensibility requires comprehensive documentation: chain-of-custody records, cryptographic hash logs, processing and exception reports, reviewer decision audit trails, privilege logs, and production specifications.
This ensures defensibility in the event of a court challenge, sanctions motion, or production dispute. Proper chain-of-custody documentation protects against spoliation allegations and demonstrates good-faith compliance with discovery obligations.
Common terms and definitions used in eDiscovery and digital evidence management: