Digital Information Glossary

There are numerous terms and concepts related to eDiscovery and the management of electronically stored data that industry professionals must know and understand. Below is a sampling of some of the most common terms used in the eDiscovery world.


Bayesian Search: An advanced search method based on Bayes' theorem, named for the 18th-century mathematician and clergyman Thomas Bayes, that uses statistical probability rules to compute the likelihood that a document is relevant to a query
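As a rough illustration of the probability rule behind Bayesian search, the sketch below applies Bayes' theorem to a single search term. The function name and the probabilities are hypothetical values chosen for illustration; a real search tool would estimate them from the document collection and combine evidence from many terms.

```python
def posterior_relevance(prior, p_term_given_relevant, p_term_given_not_relevant):
    """Probability that a document is relevant, given that it contains the term.

    prior: assumed share of relevant documents in the collection.
    p_term_given_relevant: assumed chance a relevant document contains the term.
    p_term_given_not_relevant: assumed chance a non-relevant document contains it.
    """
    # Total probability that any document contains the term.
    evidence = (p_term_given_relevant * prior
                + p_term_given_not_relevant * (1 - prior))
    if evidence == 0:
        raise ValueError("the term never occurs under these assumptions")
    # Bayes' theorem: P(relevant | term) = P(term | relevant) * P(relevant) / P(term)
    return p_term_given_relevant * prior / evidence


# Hypothetical inputs: 10% of documents are relevant; the term appears in
# 60% of relevant documents and in 5% of non-relevant documents.
score = posterior_relevance(0.10, 0.60, 0.05)
print(f"Estimated probability of relevance: {score:.3f}")
```

Under these assumed numbers, finding the term raises the estimated chance of relevance well above the 10% starting point, which is the kind of ranking signal a Bayesian search relies on.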

 Clustering: Unsupervised machine learning in which thematically similar files are grouped together based on the text of the individual files

 Container File: A compressed file containing multiple files; used to minimize the size of the original files for storage and/or transporting

 Early Data Assessment: The process of separating potentially relevant electronically stored information from non-relevant electronically stored information at the beginning of a case, using both computer techniques, such as date filtering or advanced analytics, and human-assisted logical determinations

 Ephemeral Data: Data that exists for a very brief, temporary period and is transitory in nature, such as data stored in random access memory (RAM)

 ISO 27001: An ISO standard that formally specifies an Information Security Management System (ISMS), a suite of activities concerning the management of information security risks. The ISMS is an overarching management framework through which an organization identifies, analyzes, and addresses its information risks.

 Stop Words: Common words (e.g., all, the, of, but, not) that are purposefully excluded from a search index when it is created in order to make the index more efficient
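To make the stop word idea concrete, here is a minimal sketch of a search index that skips stop words as it is built. The stop word list reuses the examples from the definition; the function name, the sample documents, and the simple punctuation handling are assumptions for illustration, not how any particular review platform works.

```python
# Stop words taken from the examples in the definition above.
STOP_WORDS = {"all", "the", "of", "but", "not"}


def build_index(documents):
    """Map each non-stop word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for raw_word in text.lower().split():
            # Trim surrounding punctuation so "contract." matches "contract".
            word = raw_word.strip(".,;:!?\"'()")
            if word and word not in STOP_WORDS:
                index.setdefault(word, set()).add(doc_id)
    return index


# Hypothetical two-document collection.
documents = {
    1: "The terms of the contract.",
    2: "All contract drafts, but not the final.",
}
index = build_index(documents)
print(sorted(index))       # indexed words; no stop words appear
print(index["contract"])   # IDs of the documents containing "contract"
```

A practical consequence of this design is that a query for a stop word such as "not" has nothing to match in the index, so it is worth knowing which stop word list a tool uses before searching.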

 Elusion: The percentage of documents in a search’s null set (the documents the search did not retrieve) that are relevant and were therefore missed by the search, usually estimated by reviewing a random sample of the null set
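The elusion estimate is simple arithmetic, sketched below. The function name, the sample size, and the null set size are hypothetical; the sketch assumes the sample was drawn at random from the null set and that each sampled document was coded relevant or not relevant by a reviewer.

```python
def estimate_elusion(sample_is_relevant):
    """Fraction of a random null-set sample that reviewers coded as relevant."""
    if not sample_is_relevant:
        raise ValueError("the sample must contain at least one document")
    return sum(sample_is_relevant) / len(sample_is_relevant)


# Hypothetical review: 500 sampled documents, 10 of them coded relevant.
sample = [True] * 10 + [False] * 490
elusion = estimate_elusion(sample)

# Project the sample rate onto an assumed null set of 40,000 documents.
null_set_size = 40_000
estimated_missed = round(elusion * null_set_size)

print(f"Elusion: {elusion:.1%}")
print(f"Estimated relevant documents missed: {estimated_missed}")
```

Because the figure comes from a sample, it is an estimate; a larger random sample narrows the uncertainty around it.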

 Supervised Learning: Use of machine learning to analyze data, using training examples that have been coded by humans

 Hash Coding: A mathematical algorithm that calculates an effectively unique value for a given set of data, similar to a digital fingerprint, representing the binary content of the data so that it can later be verified that the data has not been modified.
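A short sketch of the digital fingerprint idea, using SHA-256 from Python's standard hashlib module. The choice of SHA-256, the function name, and the sample content are assumptions for illustration; the definition above does not name a specific algorithm.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hash of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical file contents: the second differs by a single character.
original = b"Quarterly report, final version."
altered = b"Quarterly report, final version!"

hash_at_collection = fingerprint(original)

# Re-hashing unchanged data reproduces the same value.
print(fingerprint(original) == hash_at_collection)
# Any change to the content produces a different value.
print(fingerprint(altered) == hash_at_collection)
```

Recording the hash when data is collected and recomputing it later is one way to show that a file has not changed in between.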


For a more comprehensive glossary, see The Sedona Conference Glossary: eDiscovery & Digital Information Management, Fifth Edition.
