Challenges of Working with Large Data:

An average of 306.4 million emails and 23 billion text messages were sent every day in 2020. That’s about 41 million messages sent every minute. Data is expanding at a rapid rate, which has a large impact on eDiscovery. Many applications weren’t built with discovery in mind; therefore, collection has become more complicated. Here are some challenges faced when dealing with large data sets and some insights on how they can be addressed.

Sensitive PII/SHI: The combination of expanding data volumes, a growing number of data sources, and increasing regulations regarding the transmission and production of sensitive personally identifiable information (PII) and sensitive health information (SHI) presents unique challenges. Organizations need to respond quickly to Data Subject Access Requests (DSARs). These requests require organizations to efficiently locate and identify data sources that contain sensitive information. Once the information is located, redaction is often required. In the past, organizations have relied on solutions based on regular expressions (RegEx) to identify this type of content. Unfortunately, those solutions provide limited accuracy. Improvements in AI and big data analytics offer new ways to identify sensitive content, not only at the source, but also later during the discovery process.
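To illustrate why RegEx-based identification tends to have limited accuracy, here is a minimal Python sketch. It is not how any particular eDiscovery product works; the two patterns, the `find_pii` and `redact` helpers, and the sample sentences are all hypothetical and chosen only to show one false positive and one miss.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# Real PII detection needs far broader coverage and validation.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return a list of (label, matched_text) pairs found in text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits


def redact(text):
    """Replace every pattern match with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text


# A well-formatted value is found and redacted as expected.
clean = "SSN 123-45-6789, contact jane.doe@example.com."
print(find_pii(clean))
print(redact(clean))

# False positive: a part number that merely looks like an SSN is flagged.
print(find_pii("Order part 555-12-3456 from the catalog."))

# False negative: the same SSN written with spaces is missed entirely.
print(find_pii("My social is 123 45 6789."))
```

The pattern only sees the shape of the text, not its meaning, so it flags a look-alike part number and misses a differently formatted number. That gap is the kind of problem context-aware analytics aim to narrow.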

Proprietary Information: With the ever-expanding ways people communicate, proprietary information is transmitted through more channels. Proprietary formulas, client contacts, customer lists, and other items considered trade secrets need to be closely safeguarded. These types of information should be treated just like sensitive PII/SHI. Many of the same techniques that are used to protect PII/SHI are also used to protect proprietary information and trade secrets.

Privilege: Every discovery effort begins with identifying information relevant to the matter. The second step is ensuring that no privileged information is inadvertently produced. The rise of predictive analytics has made identifying privileged information more efficient. Previously, identification relied mostly on search terms and manual review. Technology-assisted review (TAR) solutions have helped improve how a document is evaluated.

Risk: Every eDiscovery effort utilizes risk-mitigation strategies in some way. There is increasing emphasis on comprehensive records management, data loss prevention, and threat management strategies. Organizations are challenged with identifying this content throughout the discovery process. They also must understand where the content resides at the source and ensure they have the appropriate means to identify, collect, and secure it. Advances in AI and big data analytics allow more comprehensive discovery programs to leverage the identification of these data types.

Even though working with large data sets continually presents new challenges, these can be addressed more easily as AI, analytics, data reuse, and other techniques are constantly being used and improved.

