A Little to the Left Please: Data Discovery Enablement

Data Discovery is the exploration of patterns and trends within unstructured data with the objective of uncovering insight and driving action.*

Based on an informal review of research from technology providers, industry analyst firms, and industry experts in the data discovery arena, the following short list of enablers highlights companies and technologies that may be useful to technology providers as legal discovery professionals seek to move “to the left of the EDRM,” closer to the point of data creation, in their data discovery efforts.

These companies and technologies may be beneficial in helping users detect, identify, explore, and report on unstructured data. They may also be helpful for data reconnaissance and data surveillance efforts as users seek to discover both the data they know about and the data they do not.

Early Discovery Steps

  • Detect: To discover the presence or existence of unstructured data.
  • Identify: To recognize or indicate the characteristics or classifications of unstructured data.
  • Explore: To examine or investigate unstructured data.
  • Report: To document and render a formal account of what has been learned about unstructured data.
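
As an illustration only, the four steps above can be sketched in a few lines of Python against an ordinary file share. This is a minimal sketch and not how any product listed in this article works: the function names (detect, identify, explore, report) are hypothetical and simply mirror the step names, and content type is guessed from the file extension alone, which is an assumption that real tools would replace with content inspection.

```python
import mimetypes
import os
from collections import Counter


def detect(root):
    """Detect: yield the path of every file found under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def identify(path):
    """Identify: record basic characteristics of one file.

    The type is guessed from the extension only (an assumption of this sketch).
    """
    guessed_type, _encoding = mimetypes.guess_type(path)
    return {
        "path": path,
        "size_bytes": os.path.getsize(path),
        "type": guessed_type or "unknown",
    }


def explore(records):
    """Explore: count files and total bytes per guessed content type."""
    counts = Counter()
    sizes = Counter()
    for record in records:
        counts[record["type"]] += 1
        sizes[record["type"]] += record["size_bytes"]
    return counts, sizes


def report(counts, sizes):
    """Report: render the summary as plain-text lines, one per content type."""
    lines = []
    for file_type in sorted(counts):
        lines.append(
            f"{file_type}: {counts[file_type]} file(s), {sizes[file_type]} bytes"
        )
    return "\n".join(lines)
```

A caller would chain the steps, for example `counts, sizes = explore(identify(p) for p in detect(root))` followed by `print(report(counts, sizes))`, where `root` is whatever directory is being examined.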

Early Discovery Techniques

  • Data Reconnaissance: Activities undertaken to obtain, by various detection methods, information about the existence, characteristics, and relationships of unstructured data.
  • Data Surveillance: The process of systematically observing unstructured data to gather information on actions or inaction related to the observed data.
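
To make the distinction between the two techniques concrete, the following sketch treats reconnaissance as taking a point-in-time snapshot of a directory and surveillance as comparing snapshots over time to see what was added, removed, or modified. It is a simplified illustration under stated assumptions: the names snapshot and compare are hypothetical, files are read fully into memory, and changes are detected by SHA-256 content hash rather than by any vendor's monitoring mechanism.

```python
import hashlib
import os


def snapshot(root):
    """Reconnaissance: map each file's path (relative to root) to a SHA-256 digest."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(path, root)] = digest
    return state


def compare(before, after):
    """Surveillance: list paths added, removed, or modified between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

Running snapshot on a schedule and passing consecutive results to compare gives a basic record of activity; an empty result for a period is itself information about inaction on the observed data.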

Additionally, eDiscovery providers seeking to extend their core offerings beyond traditional legal discovery tasks and to provide customers with tools that allow them to begin to discover data from the point of its creation may find these companies and technologies worthy of consideration for process and task integration partnerships.

Data Discovery Enablers: Companies and Technologies

The following list is not exhaustive, but it does provide a solid overview of notable companies and technologies that appear to be instrumental in enabling data discovery.

AccessData: AD Enterprise provides visibility into all activity on endpoints, network shares and peripheral devices. Website: http://www.accessdata.com

Adlib: Adlib Elevate enables digital preparation of content for improved migration, compliance, privacy and security, digital transformation, and classification. Website: http://www.adlibsoftware.com

DataGravity: The DataGravity Discovery Series is a data-aware storage platform that allows IT professionals and line-of-business users to store, protect, search and govern their data. It analyzes data as it is ingested without impacting performance, so administrators and users can quickly and easily explore and use data more effectively to derive insights that increase productivity, efficiency, and organizational success. Website: http://www.datagravity.com

Druid: Druid is an open-source analytics data store designed for business intelligence (OLAP) queries on event data. Druid provides low latency (real-time) data ingestion, flexible data exploration, and fast data aggregation. Website: http://druid.io/

Elastic: Elastic, formerly Elasticsearch, is a real-time big data search and analytics company that develops and supports Elasticsearch, a growing open source solution. Used by enterprises in virtually every vertical market, Elastic changes big data search and analytics by empowering anyone to turn big data into valuable information to improve business results. Website: http://www.elastic.com

Exterro/File Analysis: Exterro File Analysis provides critical insight and orchestrates data consolidation, legacy data migration, sensitive content identification, defensible deletion and other information management projects and processes. Website: http://www.exterro.com

Guidance Software/EnCase Endpoint Investigator: Identifying the relevance of potential evidence, prioritizing it, and determining whether further processing is needed are key aspects of the triage phase of an investigation. EnCase Endpoint Investigator can quickly review information stored on computers across the network in real-time – without altering or damaging information. Website: http://www.guidancesoftware.com/

Heureka Software: Formerly VeDiscovery, Heureka provides a unified framework that enables organizations to identify information and gather intelligence from their digital endpoints on a global scale, in real time. The Heureka Intelligence Platform allows organizations to discover and analyze unstructured data in-place for strategic, surgical incident and event response. Website: http://www.heurekasoftware.com

Haystac: Indāgō from Haystac searches, crawls, profiles, analyzes, and classifies unstructured data of more than 600 file types from virtually any unstructured data repository in support of multiple industry use cases. Website: http://www.haystac.com

IBM/StoredIQ: The IBM/StoredIQ Platform provides scalable analysis and governance of unstructured data in-place across disparate and distributed email, file shares, desktops, and collaboration sites. Its products enable companies to discover, analyze, and act on data for eDiscovery; records retention and disposition; compliance; and storage optimization initiatives. Website: http://bit.ly/StoredIQOverview

Lucene: Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. Apache Lucene is an open source project. Website: http://lucene.apache.org/

Spirion: The Spirion Data Platform helps organizations avoid costly data breaches by discovering, classifying, monitoring and protecting personal information, medical records, credit card numbers, and intellectual property stored across the enterprise, within e-mail, and in the cloud. Spirion specializes in the high-precision search and automated classification of unstructured data using its AnyFind engine’s unparalleled accuracy when analyzing human-generated text and images. Website: http://www.spirion.com

Splunk: Splunk provides a software platform that enables organizations to gain real-time operational intelligence by harnessing the value of data. The company’s software collects and indexes data at massive scale, regardless of format or source, and enables users to quickly and easily search, correlate, analyze, monitor and report on this data, all in real-time. Website: http://www.splunk.com

Tanium: Tanium is a systems management solution that provides instant visibility and allows enterprises to collect data and update machines across networks. Website: http://www.tanium.com

TITUS: TITUS solutions enable organizations to discover, classify, protect and confidently share information, and meet regulatory compliance requirements by identifying and securing unstructured data. TITUS products enhance data loss prevention by classifying and protecting sensitive information in emails, documents and other file types – on the desktop, on mobile devices, and in the Cloud. Website: http://www.titus.com

Varonis Systems: Varonis is a provider of unstructured and semi-structured data governance for file systems, SharePoint and NAS devices, and Exchange servers. Based on an accurate analytics engine, Varonis’ solutions give organizations visibility and control over data, ensuring that only the right users have access to the right data at all times. Website: http://www.varonis.com

X1 Discovery: X1’s unique, patented technology solves the fundamental problem caused by the historic inability of organizations and individuals to access essential information by empowering them to find that information across desktops, network data, and social networks, all within a unified view and at the fastest speeds in the industry. Website: http://www.x1.com

