Posts Tagged ‘electronic discovery’

Looking Back on Zubulake, 10 Years Later

Portrait of Laura Zubulake by Anita Kunz

When Laura Zubulake first brought her employment discrimination lawsuit to attorney James Batson in 2001, neither of them thought the case would make history. Neither did U.S. District Judge Shira Scheindlin, who presided over the case in the Southern District of New York. In fact, Scheindlin has mentioned many times that Zubulake’s lawsuit seemed like a "garden-variety employment discrimination case." Zubulake didn’t get a promotion she thought she had earned at the global financial services firm UBS Warburg, filed a complaint with human resources and suddenly found herself at odds with [...]


Discovering the @Inc5000: A Look at 16 eDiscovery Enablers on the 2014 List

Based on a website review of this year’s Inc. 5000, the following list provides a quick, non-all-inclusive reference to some of the eDiscovery enablers included in the 2014 list. The sortable list includes the provider’s name, 2014 Inc. 5000 ranking (#), three-year revenue growth (%), 2013 revenue ($) and industry categorization.


Continuous Active Learning for Technology Assisted Review (How it Works and Why it Matters for E-Discovery)

Grossman and Cormack concluded that CAL demonstrated superior performance over SPL and SAL, while avoiding certain other problems associated with these traditional TAR 1.0 protocols. Specifically, in each of the eight case studies, CAL reached higher levels of recall (finding relevant documents) more quickly and with less effort than the TAR 1.0 protocols.
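The CAL protocol described above can be sketched in miniature. The corpus, the keyword-overlap scoring model, the batch size, and the oracle function below are all illustrative assumptions, not the Grossman & Cormack implementation; the point is only the loop shape: always review the documents the current model ranks highest, and retrain on every relevant document found.

```python
# A minimal sketch of a Continuous Active Learning (CAL) loop.
# The scorer and corpus are toy stand-ins, not the study's system.

def score(doc, relevant_terms):
    """Toy relevance score: terms shared with known-relevant documents."""
    return len(doc & relevant_terms)

def cal_review(corpus, oracle, seed_id, batch_size=2):
    """Iteratively review the highest-scoring unreviewed documents,
    folding each relevant document back into the model (a term set)."""
    reviewed, relevant_terms, found = {seed_id}, set(corpus[seed_id]), [seed_id]
    while len(reviewed) < len(corpus):
        unreviewed = [d for d in corpus if d not in reviewed]
        # CAL's defining move: always take the current top-ranked documents.
        unreviewed.sort(key=lambda d: score(corpus[d], relevant_terms),
                        reverse=True)
        for doc_id in unreviewed[:batch_size]:
            reviewed.add(doc_id)
            if oracle(doc_id):                    # human reviewer's judgment
                found.append(doc_id)
                relevant_terms |= corpus[doc_id]  # continuous retraining
    return found

# Tiny illustrative corpus: documents as bags of words.
corpus = {
    "d1": {"merger", "contract", "deal"},
    "d2": {"contract", "deal", "terms"},
    "d3": {"lunch", "menu"},
    "d4": {"deal", "terms", "signature"},
    "d5": {"weather", "golf"},
}
relevant = {"d1", "d2", "d4"}
found = cal_review(corpus, lambda d: d in relevant, seed_id="d1")
print(found)  # relevant documents surface before the off-topic ones
```

Even in this toy, the behavior the study reports is visible: the relevant documents are pulled to the front of the review queue, so recall rises early in the review effort.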


Got Relativity? A Quick Reference to 100+ Hosting and Consulting Partners

Since its 2007 introduction, kCura’s Relativity product has become one of the world’s leading attorney review platforms. One driver of Relativity’s strong growth and marketplace acceptance has been kCura’s focus on and support of partnerships. Provided as a by-product of review platform research, and presented as a simple, sortable table, is an aggregation of kCura Premium Hosting Partners and Consulting Partners.


Proving the Folly of Using Random Search For Machine Training – Part Four

A random only search method for predictive coding training documents is ineffective. The same applies to any other training method if it is applied to the exclusion of all others. Any experienced searcher knows this.


The Folly of Using Random Search For Machine Training – Part Three

The results presented here do not support the commonly advanced position that seed sets, or entire training sets, must be randomly selected [19, 28] [contra 11]. Our primary implementation of SPL, in which all training documents were randomly selected, yielded dramatically inferior results to our primary implementations of CAL and SAL, in which none of the training documents were randomly selected.


Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part Two

Cormack and Grossman set up an ingenious experiment to test the effectiveness of three machine learning protocols. It is ingenious for several reasons, not the least of which is that they created what they call an “evaluation toolkit” to perform the experiment. They have even made this same toolkit, this same software, freely available for use by any other qualified researchers. They invite other scientists to run the experiment for themselves. They invite open testing of their experiment. They invite vendors to do so too, but so far there have been no takers.


Random vs Active Selection of Training Examples in eDiscovery

I want to talk about an issue that is attracting attention at the moment: how to select documents for training a predictive coding system. The catalyst for this current interest is “Evaluation of Machine Learning Protocols for Technology Assisted Review in Electronic Discovery”, recently presented at SIGIR by Gord Cormack and Maura Grossman.
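The difference between the two selection strategies at issue can be shown in a few lines. The document scores below are made-up stand-ins for a real classifier’s probability-of-relevance outputs; the contrast is that random selection treats every document alike, while active (uncertainty) selection targets the documents closest to the model’s decision boundary.

```python
import random

# Sketch: random vs. active (uncertainty) selection of training examples.
# Scores are hypothetical classifier probabilities of relevance.
docs = {f"doc{i}": p for i, p in enumerate(
    [0.05, 0.10, 0.50, 0.46, 0.58, 0.90, 0.95, 0.99])}

def random_batch(pool, k, rng):
    """Random selection: every document is equally likely to be chosen."""
    return rng.sample(sorted(pool), k)

def uncertainty_batch(pool, k):
    """Active selection: pick the documents the model is least sure about,
    i.e. those with scores closest to the 0.5 decision boundary."""
    return sorted(pool, key=lambda d: abs(pool[d] - 0.5))[:k]

print(uncertainty_batch(docs, 3))  # the borderline 0.50 / 0.46 / 0.58 docs
print(random_batch(docs, 3, random.Random(0)))
```

Uncertainty sampling is only one of several active-selection rules (relevance feedback, as in CAL, is another); it is used here because it makes the contrast with uniform random draws easiest to see.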


The Text Streetlight, Oversampling, and Information Governance

The Grossman-Cormack article, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery,” has kicked off some useful discussions. Here are our comments on two blog posts about the article, one by Ralph Losey, the other by John Tredennick and Mark Noel.


Using Only Random Selection to Find Predictive Coding Training Documents Is Easy, But Foolish

The easiest way to select training documents for predictive coding is simply to use random samples. It may be easy, but, as far as I am concerned, it also defies common sense.


Drunks, DNA and Data Transfer Risk in eDiscovery

Data transfer risk may be minimized by automation and standards or increased by the requirement of human intervention. As automation and standards are still slowly maturing in the realm of electronic discovery technology, it seems important that legal professionals understand and properly consider the impact of potential data transfer risk as they plan, source, and conduct their electronic discovery activities.


A Problem with Text-based Technology-Assisted Review

Since the last century, text analysis has been the primary tool used to classify documents, but its durability as the tool of choice doesn’t mean that it remains the best choice.


An eDiscovery Market Size Mashup: 2013-2018 Worldwide Software and Services Overview

Taken from a combination of public market sizing estimations as shared in leading electronic discovery reports, publications and posts over time, the following eDiscovery Market Size Mashup shares general worldwide market sizing considerations for both the software and service areas of the electronic discovery market for the years between 2013 and 2018.


The Legal Technology Future Horizons Report

International Legal Technology Association: “Released in May 2014, Legal Technology Future Horizons (LTFH) is a report that provides insights and practical ideas to inform the development of future business and IT strategies for law firms, law departments and legal technology vendors. The research, analysis and interpretation of the findings were undertaken by Fast Future Research and led by Rohit Talwar.


Four Years of Magic: The Gartner Magic Quadrant for E-Discovery Software

Published annually by Gartner, the Magic Quadrant for E-Discovery Software is a concise research report that highlights key market developments and dynamics in the field of eDiscovery and provides a comparative evaluation of leading eDiscovery software vendors. An aggregation from public domain sources of eDiscovery vendors who have been selected and evaluated as part of the annual Magic Quadrant for E-Discovery Software since 2011 is provided for your review and consideration.


The Operational Center of Gravity of eDiscovery Providers: A Look at 130+ Vendors

With a desire to increase the understanding of those considering engaging with eDiscovery vendors, what follows is a simple, subjective analysis of the operational center of gravity for approximately 130 providers.


New Study Mired in the TAR Pit?

The hype cycle around Predictive Coding/Technology Assisted Review (PC/TAR) has focused on court acceptance and actual review cost savings. The last couple of weeks have seen a bit of a blogging kerfuffle over the conclusions, methods and implications of the new study by Gordon Cormack and Maura Grossman, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery”. Pioneering analytics guru Herbert L. Roitblat of OrcaTec has published two blogs (first and second links) critical of the study and its conclusions. As much as I love a spirited debate and have my own history of ‘speaking truth’ in the public forum, I can’t help wondering if this tussle over Continuous Active Learning (CAL) vs. Simple Active Learning (SAL) has lost view of the forest while looking for the tallest tree in it.


Considering Proportionality: Electronic Discovery Best Practices

Proportionality. Parties are expected to use reasonable, good faith and proportional efforts to preserve, identify and produce relevant information. This includes identifying appropriate limits to discovery, including limits on custodians, identification of relevant subject matter, time periods for discovery and other parameters to limit and guide preservation and discovery issues.


Electronic Discovery: Are We Competent?

The State Bar of California has issued perhaps the country’s most straightforward and candid directive to litigators to learn the ins and outs of electronic discovery (e-discovery). In a proposed formal opinion, it states, “Not every litigated case ultimately involves e-discovery; however, in today’s technological world, almost every litigation matter potentially does.”


Measuring Text Bias/Tunnel Vision in Content Search and ECM Systems

Calculating MTV Ratio and True Recall

Many tools designed to search or classify documents as part of the enterprise content management and electronic discovery functions in organizations depend on having accurate textual representations of the documents being analyzed or indexed. They have text-tunnel vision – they cannot “see” non-textual objects. If the only documents of interest were text-based, that could be an excusable shortcoming, depending on what tasks were being performed. However, there are some collections where as many as one-half of the documents of interest contain no textual representation.
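The arithmetic behind the text-tunnel-vision problem is simple to sketch. The post’s exact MTV-ratio and true-recall formulas are not reproduced in this excerpt, so the functions below are one plausible reading, stated as an assumption: measured recall divides by only the text-bearing relevant documents, while true recall divides by all relevant documents, including the non-textual ones a text-only tool cannot see.

```python
# Hedged sketch: how non-textual documents inflate a text-only tool's
# reported recall. Names and formulas are illustrative assumptions.

def mtv_ratio(relevant_text, relevant_nontext):
    """Assumed reading of the MTV ratio: the fraction of relevant
    documents with no usable textual representation."""
    return relevant_nontext / (relevant_text + relevant_nontext)

def measured_recall(found, relevant_text):
    """Recall as a text-only tool would report it (text-bearing docs only)."""
    return found / relevant_text

def true_recall(found, relevant_text, relevant_nontext):
    """Recall against the full relevant population, textual or not."""
    return found / (relevant_text + relevant_nontext)

# Example collection: half of the relevant documents (images, CAD files,
# audio) carry no extractable text, as in the collections the post cites.
relevant_text, relevant_nontext = 500, 500
found = 450  # the text-only tool retrieves 90% of what it can "see"

print(mtv_ratio(relevant_text, relevant_nontext))           # 0.5
print(measured_recall(found, relevant_text))                # 0.9
print(true_recall(found, relevant_text, relevant_nontext))  # 0.45
```

Under these assumptions, a tool that looks excellent on its own terms (90% measured recall) has actually found less than half of what matters, which is the gap the post is pointing at.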


The Science of Comparing Learning Protocols – Thoughts on the Cormack & Grossman Article

In this post, I want to focus more on the science in the Cormack and Grossman article. It seems that several flaws in their methodology render their conclusions not just externally invalid (they don’t apply to systems that they did not study) but internally invalid as well (they don’t apply to the systems they did study).


What’s In A Name? Information Governance Finds A Home On The EDRM

When EDRM, the organization that created the Electronic Discovery Reference Model, launched its Information Governance Reference Model (IGRM), I wondered how long it would take for this day to come. The wait is over. Information governance (IG) has taken its place on the EDRM. In this post I will take a look at the changes and consider whether they have gone too far, have got it just right, or maybe have a little room for more tweaks.


A Short Comment on the Technology-Assisted Review Problem

Provided for your review is a short but important comment from a recent LinkedIn Group discussion on the evaluation of machine learning protocols for technology-assisted review (TAR). The comment introduces a problem with TAR: the fact that it is limited to text.


Roitblat Challenges Competitor Claims On Random Sampling Effectiveness in Predictive Coding

I don’t usually comment on competitors’ claims, but I thought that I needed to address some potentially serious misunderstandings that could come out of Cormack and Grossman’s latest article, “Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery.” Although they find in this paper that an active learning process is superior to random sampling, it would be a mistake to think their conclusions would apply to all random sampling predictive coding regimens.