Posts Tagged ‘electronic discovery’

Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part Two

Cormack and Grossman set up an ingenious experiment to test the effectiveness of three machine learning protocols. It is ingenious for several reasons, not the least of which is that they created what they call an “evaluation toolkit” to perform the experiment. They have even made this same toolkit, this same software, freely available for use by any other qualified researchers. They invite other scientists to run the experiment for themselves. They invite open testing of their experiment. They invite vendors to do so too, but so far there have been no takers.

Random vs Active Selection of Training Examples in eDiscovery

I want to talk about an issue that is attracting attention at the moment: how to select documents for training a predictive coding system. The catalyst for this current interest is “Evaluation of Machine Learning Protocols for Technology Assisted Review in Electronic Discovery”, recently presented at SIGIR by Gord Cormack and Maura Grossman.

The Text Streetlight, Oversampling, and Information Governance

The Grossman-Cormack article, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery,” has kicked off some useful discussions. Here are our comments on two blog posts about the article, one by Ralph Losey, the other by John Tredennick and Mark Noel.

Using Only Random Selection to Find Predictive Coding Training Documents Is Easy, But Foolish

The easiest way to select training documents for predictive coding is simply to use random samples. It may be easy, but, as far as I am concerned, it also defies common sense.
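
To see why, consider prevalence: when relevant documents are rare, a uniform random sample surfaces very few of them. The following minimal Python sketch (our illustration with made-up numbers, not taken from the post) makes the point.

    import random

    # Hypothetical collection: 100,000 documents, of which only 1% are relevant
    # (low prevalence is common in eDiscovery collections).
    random.seed(42)
    collection = [{"id": i, "relevant": random.random() < 0.01} for i in range(100_000)]

    # A purely random protocol draws its training documents uniformly at random.
    sample = random.sample(collection, 1_000)
    positives = sum(doc["relevant"] for doc in sample)

    # Expect roughly 10 relevant documents out of 1,000 reviewed -- a thin basis
    # for teaching a classifier what relevance looks like.
    print(f"Relevant documents in the random training sample: {positives}")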

Drunks, DNA and Data Transfer Risk in eDiscovery

Data transfer risk may be minimized by automation and standards or increased by the requirement of human intervention. As automation and standards are still slowly maturing in the realm of electronic discovery technology, it seems important that legal professionals understand and properly consider the impact of potential data transfer risk as they plan, source, and conduct their electronic discovery activities.

A Problem with Text-based Technology-Assisted Review

Since the last century, text analysis has been the primary tool used to classify documents, but its durability as the tool of choice doesn’t mean that it remains the best choice.

An eDiscovery Market Size Mashup: 2013-2018 Worldwide Software and Services Overview

Taken from a combination of public market sizing estimations as shared in leading electronic discovery reports, publications and posts over time, the following eDiscovery Market Size Mashup shares general worldwide market sizing considerations for both the software and service areas of the electronic discovery market for the years between 2013 and 2018.

The Legal Technology Future Horizons Report

International Legal Technology Association: “Released in May 2014, Legal Technology Future Horizons (LTFH) is a report that provides insights and practical ideas to inform the development of future business and IT strategies for law firms, law departments and legal technology vendors. The research, analysis and interpretation of the findings were undertaken by Fast Future Research and led by Rohit Talwar.”

Four Years of Magic: The Gartner Magic Quadrant for E-Discovery Software

Published annually by Gartner, the Magic Quadrant for E-Discovery Software is a concise research report that highlights key market developments and dynamics in the field of eDiscovery and provides a comparative evaluation of leading eDiscovery software vendors. An aggregation, drawn from public domain sources, of the eDiscovery vendors selected and evaluated as part of the annual Magic Quadrant for E-Discovery Software since 2011 is provided for your review and consideration.

The Operational Center of Gravity of eDiscovery Providers: A Look at 130+ Vendors

With a desire to increase the understanding of those considering engaging with eDiscovery vendors, the following is a simple, subjective analysis of the operational center of gravity for approximately 130 providers.

New Study Mired in the TAR Pit?

The hype cycle around Predictive Coding/Technology Assisted Review (PC/TAR) has focused on court acceptance and actual review cost savings. The last couple of weeks have seen a bit of a blogging kerfuffle over the conclusions, methods and implications of the new study by Gordon Cormack and Maura Grossman, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery”. Pioneering analytics guru Herbert L. Roitblat of OrcaTec has published two blogs (first and second links) critical of the study and its conclusions. As much as I love a spirited debate and have my own history of ‘speaking truth’ in the public forum, I can’t help wondering if this tussle over Continuous Active Learning (CAL) vs. Simple Active Learning (SAL) has lost sight of the forest while looking for the tallest tree in it.

Considering Proportionality: Electronic Discovery Best Practices

Proportionality. Parties are expected to use reasonable, good faith and proportional efforts to preserve, identify and produce relevant information. This includes identifying appropriate limits to discovery, including limits on custodians, identification of relevant subject matter, time periods for discovery and other parameters to limit and guide preservation and discovery issues.

Electronic Discovery: Are We Competent?

The State Bar of California has issued perhaps the country’s most straightforward and candid directive to litigators to learn the ins and outs of electronic discovery (e-discovery). In a proposed formal opinion, it states, “Not every litigated case ultimately involves e-discovery; however, in today’s technological world, almost every litigation matter potentially does.”

Measuring Text Bias/Tunnel Vision in Content Search and ECM Systems

Calculating MTV Ratio and True Recall: Many tools designed to search or classify documents as part of the enterprise content management and electronic discovery functions in organizations depend on having accurate textual representations of the documents being analyzed or indexed. They have text-tunnel vision – they cannot “see” non-textual objects. If the only documents of interest were text-based, that could be an excusable shortcoming, depending on what tasks were being performed. However, there are some collections where as many as one-half of the documents of interest contain no textual representation.
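
As a rough, hypothetical sketch (the definitions below are our assumptions for illustration, not the article's own formulas), one way to quantify this text-tunnel vision is to measure the share of documents with no usable text and discount a text-only recall figure accordingly:

    # Illustrative only: "MTV ratio" and "true recall" as defined here are assumptions,
    # not the definitions used in the underlying article.

    def mtv_ratio(documents):
        """Fraction of documents with no usable textual representation."""
        if not documents:
            return 0.0
        non_textual = sum(1 for d in documents if not d.get("text"))
        return non_textual / len(documents)

    def true_recall(reported_recall, documents):
        """Discount a text-based tool's reported recall by the share of documents it
        could never have 'seen', assuming relevant documents are no more likely to
        contain text than the rest of the collection."""
        return reported_recall * (1.0 - mtv_ratio(documents))

    docs = [
        {"id": 1, "text": "contract amendment ..."},
        {"id": 2, "text": ""},  # scanned drawing: no extractable text
        {"id": 3, "text": "board minutes ..."},
        {"id": 4, "text": ""},  # embedded image: no extractable text
    ]
    print(f"MTV ratio: {mtv_ratio(docs):.2f}")                             # 0.50
    print(f"True recall at 90% reported: {true_recall(0.90, docs):.2f}")   # 0.45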

The Science of Comparing Learning Protocols – Thoughts on the Cormack & Grossman Article

In this post, I want to focus more on the science in the Cormack and Grossman article. It seems that several flaws in their methodology render their conclusions not just externally invalid (they don’t apply to systems that they did not study) but internally invalid as well (they don’t apply to the systems they did study).

What’s In A Name? Information Governance Finds A Home On The EDRM

When EDRM, the organization that created the Electronic Discovery Reference Model, launched its Information Governance Reference Model (IGRM), I wondered how long it would take for this day to come. The wait is over. Information governance (IG) has taken its place on the EDRM. In this post I will take a look at the changes and consider whether they have gone too far, have got it just right, or maybe have a little room for more tweaks.

A Short Comment on the Technology-Assisted Review Problem

Provided for your review is a short but important comment from a recent LinkedIn Group discussion on the evaluation of machine learning protocols for technology-assisted review (TAR). The comment raises a problem with TAR: the fact that it is limited to text.

Roitblat Challenges Competitor Claims On Random Sampling Effectiveness in Predictive Coding

I don’t usually comment on competitors’ claims, but I thought that I needed to address some potentially serious misunderstandings that could come out of Cormack and Grossman’s latest article, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery.” Although they find in this paper that an active learning process is superior to random sampling, it would be a mistake to think their conclusions would apply to all random sampling predictive coding regimens.

Got Information Governance? A Short List of Enablers

Based on a compilation of research from analyst firms and industry expert reports in the information governance arena, the following short list of enablers highlights companies and firms that may be useful in the consideration of information governance products and services.

Pioneering Cormack/Grossman Study Validates Continuous Learning, Judgmental Seeds and Review Team Training for Technology Assisted Review

The results show that entirely non-random training methods, in which the initial training documents are selected using a simple keyword search, and subsequent training documents are selected by active learning, require substantially and significantly less human review effort (P < 0.01) to achieve any given level of recall, than passive learning, in which the machine-learning algorithm plays no role in the selection of training documents.
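
A minimal sketch of the two protocols being compared, written with scikit-learn for illustration (this is not the authors' evaluation toolkit, and the query term and parameters are placeholders), might look like this:

    # Illustration only: keyword-seeded active learning vs. random ("passive") selection.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def active_learning_review(texts, labels, seed_query="fraud", batch=50, rounds=10):
        """Seed training with a simple keyword search, then let the model choose
        which unreviewed documents a human should label next."""
        vec = TfidfVectorizer(max_features=20_000)
        X = vec.fit_transform(texts)
        labels = np.asarray(labels)
        reviewed = {i for i, t in enumerate(texts) if seed_query in t.lower()}
        for _ in range(rounds):
            if len({labels[i] for i in reviewed}) < 2:
                break  # need examples of both classes before fitting a classifier
            clf = LogisticRegression(max_iter=1000)
            idx = sorted(reviewed)
            clf.fit(X[idx], labels[idx])
            scores = clf.predict_proba(X)[:, 1]
            # Send the highest-scoring unreviewed documents to the (simulated) reviewer.
            next_batch = [i for i in np.argsort(-scores) if i not in reviewed][:batch]
            reviewed.update(int(i) for i in next_batch)
        return len(reviewed), int(sum(labels[i] for i in reviewed))  # effort, relevant found

    def random_review(texts, labels, batch=50, rounds=10, seed=0):
        """Passive protocol: review documents drawn uniformly at random."""
        labels = np.asarray(labels)
        rng = np.random.default_rng(seed)
        picks = rng.choice(len(texts), size=min(batch * rounds, len(texts)), replace=False)
        return len(picks), int(sum(labels[i] for i in picks))

The study's finding, in these terms, is that the first function reaches a given number of relevant documents with far less total review effort than the second.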

2014 eDiscovery Mergers, Acquisitions and Investments

Provided as a non-comprehensive overview of key and publicly announced eDiscovery related mergers, acquisitions and investments to date in 2014, the following listing highlights key industry activities through the lens of announcement date, acquired company, acquiring or investing company and acquisition amount (if known).

Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery

Using a novel evaluation toolkit that simulates a human reviewer in the loop, we compare the effectiveness of three machine-learning protocols for technology-assisted review as used in document review for discovery in legal proceedings.
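
The abstract's central idea, a simulated reviewer standing in for a human, can be sketched in a few lines of Python. This is our own hypothetical illustration of the general approach, not the authors' toolkit: gold-standard labels play the part of the reviewer, and the harness records how much review effort a protocol spends to reach a target level of recall.

    # Hypothetical harness: gold-standard labels simulate the human reviewer.
    def evaluate_protocol(select_next, gold_labels, target_recall=0.75):
        """Run a selection protocol until it reaches the target recall,
        returning the review effort (documents reviewed) it needed."""
        total_relevant = sum(gold_labels.values())
        reviewed, found = {}, 0
        while total_relevant and found < target_recall * total_relevant:
            doc_id = select_next(reviewed)            # protocol asks for the next document
            if doc_id is None or doc_id in reviewed:
                break
            reviewed[doc_id] = gold_labels[doc_id]    # simulated reviewer reveals the label
            found += gold_labels[doc_id]
        recall = found / total_relevant if total_relevant else 0.0
        return {"effort": len(reviewed), "recall": recall}

    # Toy usage: a protocol that simply reviews documents in a fixed order.
    gold = {i: int(i % 10 == 0) for i in range(1_000)}   # made-up 10% prevalence
    order = iter(sorted(gold))
    print(evaluate_protocol(lambda reviewed: next(order, None), gold))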

Information Governance and eDiscovery: How Does Your Vendor Deal With Non-Textual Files?

Good vendors share what they know they see. Great vendors share what they may not see so you can make informed decisions as to risk and exposure.

Is Your Data An Asset Or A Liability?

The proliferation of data and how it is being managed — or in most cases mismanaged — is causing more organizations to question whether they have information assets or liabilities. Two of the major drivers pushing organizations to finally get their data under control are costs and risks. “People are starting to get interested in reducing their overall data in many cases for regulatory issues,” said Dera Nevin, managing director and an electronic discovery lawyer at re:Discovery Law PC. Nevin, who was speaking to the International Legal Technology Association last week at an event hosted by Norton Rose Fulbright Canada LLP, [...]
