Big data and the “internet of things” — in which everyday objects can send and receive data — promise revolutionary change to management and society. But their success rests on an assumption: that all the data being generated by internet companies and devices scattered across the planet belongs to the organizations collecting it. What if it doesn’t?
In my previous post, I found that relevance and uncertainty selection needed similar numbers of document relevance assessments to achieve a given level of recall. I summarized this by saying the two methods had similar cost. The number of documents assessed, however, is only a very approximate measure of the cost of a review process, and richer cost models might lead to a different conclusion.
One distinction that is sometimes made is between the cost of assessing a document during training and the cost of assessing it during review. It is often assumed that training is performed by a subject-matter expert, whereas review is done by more junior reviewers. The subject-matter expert costs more than the junior reviewers: let’s say, five times as much. Therefore, assessing a document for relevance during training will cost more than doing so during review.
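To make the training-versus-review distinction concrete, here is a minimal sketch of such a cost model. The function name, the unit reviewer rate, and the document counts are illustrative assumptions; only the five-fold expert multiplier comes from the example above.

```python
def review_cost(n_training, n_review, reviewer_rate=1.0, expert_multiplier=5.0):
    """Total assessment cost when training documents are assessed by a
    subject-matter expert and review documents by junior reviewers.

    reviewer_rate is the (assumed) cost of one junior assessment;
    expert_multiplier is how many times more an expert assessment costs.
    """
    if n_training < 0 or n_review < 0:
        raise ValueError("document counts must be non-negative")
    training_cost = n_training * reviewer_rate * expert_multiplier
    reviewing_cost = n_review * reviewer_rate
    return training_cost + reviewing_cost


# Two hypothetical processes that assess the same 10,000 documents in total
# but split them differently between training and review.
cost_heavy_training = review_cost(n_training=2000, n_review=8000)
cost_light_training = review_cost(n_training=500, n_review=9500)
print(cost_heavy_training, cost_light_training)
```

Under this model, two processes with identical assessment counts can differ noticeably in cost, which is why counting assessed documents alone is only an approximate measure.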
Daily we read, see and hear more and more about technology developments that impact the areas of information governance and electronic discovery. This week’s cartoon and clip features a unique look at innovative thinking in these critical areas (cartoon) and quick reference links to six interesting blogs that regularly highlight the need to truly think about these critical areas (clip).
A critical metric in Technology Assisted Review (TAR) is recall, which is the percentage of relevant documents actually found from the collection. One of the most compelling reasons for using TAR is the promise that a review team can achieve a desired level of recall (say 75% of the relevant documents) after reviewing only a small portion of the total document population (say 5%). The savings come from not having to review the remaining 95% of the documents.
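The arithmetic behind those percentages can be sketched as follows. The collection size and the counts of relevant documents are hypothetical numbers chosen to match the 75% recall and 5% review figures above; they are not taken from any actual review.

```python
def recall(relevant_found, relevant_total):
    """Fraction of all relevant documents in the collection that were found."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_found / relevant_total


def fraction_reviewed(documents_reviewed, collection_size):
    """Fraction of the total document population that was reviewed."""
    if collection_size <= 0:
        raise ValueError("collection_size must be positive")
    return documents_reviewed / collection_size


# Hypothetical collection: 100,000 documents, 2,000 of them relevant.
# The team reviews 5,000 documents and finds 1,500 relevant ones.
achieved_recall = recall(relevant_found=1500, relevant_total=2000)
portion_reviewed = fraction_reviewed(documents_reviewed=5000, collection_size=100000)
documents_not_reviewed = 100000 - 5000

print(f"recall: {achieved_recall:.0%}, reviewed: {portion_reviewed:.0%}, "
      f"skipped: {documents_not_reviewed}")
```

In practice the total number of relevant documents is not known in advance and has to be estimated, so a recall figure like this one is itself an estimate.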
Try as I might to make it foolproof, downloading Gmail using IMAP and Outlook is tricky. Happily, since my post, the geniuses at Google introduced a truly simple, no-cost way to collect Gmail and other Google content for preservation and portability.
Daily we read, see and hear more and more about the health, legal and business developments in the pharmaceutical industry. This week’s cartoon and clip features a unique look at the retail sales impact of the pharmaceutical industry (cartoon) and a quick reference link to one of the most informational and timely resources on the pharmaceutical industry, FiercePharma (clip).
Published on September 23, 2014, the new Gartner Market Guide for File Analysis Software (G00262949) provides information technology and business professionals with information and insight into more efficient, less costly and less risky ways to manage what is generally regarded as unstructured data through the use of file analysis software.
Daily we read, see and hear more and more about the latest privacy and data security breaches in consumer, corporate and governmental arenas. This week’s cartoon and clip features a unique approach to ensuring personal data security (cartoon) and a quick reference link to one of the most informational and timely resources on privacy and data security, the LXBN Privacy & Data Security Blog Channel (clip).
By K&L Gates Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, Nos. 2685-11, 8393-12 (T.C. Sept. 17, 2014) In this case, the court approved petitioners’ (Dynamo Holdings Ltd. Partnership et al.) use of predictive coding to identify potentially responsive and privileged data contained on two backup tapes, despite respondent’s (Commissioner of Internal Revenue) objection […]
By Benedict Hur and Matthew Werdegar The Federal Rules of Civil Procedure are supposed to be “construed and administered to secure the just, speedy and inexpensive determination of every action and proceeding.” Yet, as anyone who has ever been tasked with handling discovery in complex litigation knows, the judicial system has struggled to reconcile this overarching goal […]
The IRS seems inherently incapable of finding emails. The most famous incident that everyone has heard about, and many have complained about, is the loss of emails of key witnesses in a Congressional investigation of the IRS tea party targeting scandal.
In Dynamo Holdings v. Comm’r, the IRS Commissioner sought to compel production of the contents of backup tapes containing at least several million documents. It objected to the producing parties’ request to use predictive coding to review them, calling it an “unproven technology.”
Keeping up with the many comments and commentators in the data discovery and governance blogosphere can be quite challenging given the multitude of information, opinion and news blogs. This week’s cartoon and clip features a unique challenge to today’s bloggers (cartoon) and a non-all inclusive running list of approximately 30 recent and relevant eDiscovery and information governance related blog posts (clip).
Bad things tend to happen when lawyers delegate e-discovery responsibility to their clients. As all informed lawyers know, lawyers have a duty to actively supervise their client’s preservation. They cannot just turn a blind eye, or just send out written notices and forget about it. Lawyers have an even higher duty to manage discovery, including search and production of electronic evidence. They cannot just turn e-discovery over to a client and then sign the response to the request for production. The only possible exception proves the rule. If a client has in-house legal counsel, and if they appear of record in the case, and if the in-house counsel signs the discovery response, then, and only then, is outside counsel (somewhat) off the hook. Then they can sit back, a little bit, but, trust me, this almost never happens.
Keeping up with innovation in eDiscovery can be quite challenging given the various approaches, commentators and providers weighing in on each real or perceived innovation. This week’s cartoon and clip features a strategic approach to driving innovation (cartoon) and a non-all inclusive running listing of mergers, acquisitions and investments in the eDiscovery arena (clip).
In the absence of a black swan recently happening to you and your organization, how can you convince the powers that be that they should take some preventive and/or precautionary course of action to stave off a subsequent disaster? These questions have direct relevance to the matter of “selling” information governance to the C-suite in our increasingly Big Data world.
Keeping up with the promises and problems of Technology-Assisted Review (TAR) can be quite a challenge given the number of writers and the volume of writing on the subject. This week’s cartoon and clip features a way to make the challenges of TAR look smaller (cartoon) and a non-all inclusive listing of recent articles on the topic of TAR (clip).
Portrait of Laura Zubulake by Anita Kunz. When Laura Zubulake first brought her employment discrimination lawsuit to attorney James Batson in 2001, neither of them thought the case would make history. Neither did U.S. District Judge Shira Scheindlin, who presided over the case in the Southern District of New York. In fact, Scheindlin has mentioned many times that Zubulake’s lawsuit seemed like a "garden-variety employment discrimination case." Zubulake didn’t get a promotion she thought she had earned at the global financial services firm UBS Warburg, filed a complaint with human resources and suddenly found herself at odds with [...]
Based on a website review of this year’s Inc. 5000, the following list provides a quick, non-all inclusive reference of some of the eDiscovery enablers that have been included in the 2014 list. The sortable list includes the provider’s name, 2014 Inc. 5000 ranking (#), three-year revenue growth (%), 2013 revenue ($) and industry categorization.
Continuous Active Learning for Technology Assisted Review (How it Works and Why it Matters for E-Discovery)
Grossman and Cormack concluded that CAL demonstrated superior performance over SPL and SAL, while avoiding certain other problems associated with these traditional TAR 1.0 protocols. Specifically, in each of the eight case studies, CAL reached higher levels of recall (finding relevant documents) more quickly and with less effort than the TAR 1.0 protocols.
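For readers who want a feel for the general idea behind a continuous active learning loop (review the top-ranked documents, feed the judgments back in, re-rank, repeat), here is a toy sketch. It is not Grossman and Cormack's implementation or their evaluation toolkit: the word-count scoring model, the function names, the batch size, and the six sample documents are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def score(doc, rel_counts, non_counts):
    """Toy relevance score: sum of smoothed log-odds weights of the distinct
    words in the document (an assumed stand-in for a real classifier)."""
    return sum(math.log((rel_counts[w] + 1) / (non_counts[w] + 1))
               for w in set(doc.split()))


def cal_review(docs, is_relevant, seed_ids, batch_size=1, target_found=None):
    """Review the highest-scoring unreviewed documents in batches, updating
    the model after every judgment. seed_ids are assumed to be distinct."""
    reviewed, labels = [], {}
    rel_counts, non_counts = Counter(), Counter()

    def judge(i):
        labels[i] = is_relevant(i)          # the human assessment
        reviewed.append(i)
        counts = rel_counts if labels[i] else non_counts
        counts.update(set(docs[i].split()))  # "retrain" on the new judgment

    for i in seed_ids:
        judge(i)
    while len(reviewed) < len(docs):
        if target_found is not None and sum(labels.values()) >= target_found:
            break
        remaining = [i for i in range(len(docs)) if i not in labels]
        remaining.sort(key=lambda i: score(docs[i], rel_counts, non_counts),
                       reverse=True)
        for i in remaining[:batch_size]:
            judge(i)
    return reviewed, labels


# Hypothetical six-document collection; documents 0, 2 and 4 are relevant.
docs = [
    "merger price fixing memo",
    "lunch menu friday",
    "price fixing call notes",
    "holiday party schedule",
    "fixing price agreement draft",
    "parking lot notice",
]
truth = {0, 2, 4}
reviewed, labels = cal_review(docs, lambda i: i in truth,
                              seed_ids=[0, 1], target_found=3)
print(f"reviewed {len(reviewed)} of {len(docs)} documents, "
      f"found {sum(labels.values())} relevant")
```

Note that nothing in the loop selects documents at random: after the seed documents, every document reviewed is one the current model ranks highest.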
Since its 2007 introduction, kCura’s Relativity product has become one of the world’s leading attorney review platforms. One of the elements of Relativity’s strong growth and marketplace acceptance has been kCura’s focus on and support of partnerships. Provided as a by-product of review platform research and presented in the form of a simple and sortable table is an aggregation of kCura Premium Hosting Partners and Consulting Partners.
A random-only search method for predictive coding training documents is ineffective. The same applies to any other training method if it is applied to the exclusion of all others. Any experienced searcher knows this.
The results presented here do not support the commonly advanced position that seed sets, or entire training sets, must be randomly selected [19, 28] [contra 11]. Our primary implementation of SPL, in which all training documents were randomly selected, yielded dramatically inferior results to our primary implementations of CAL and SAL, in which none of the training documents were randomly selected.
Multimodal Search for Predictive Coding Training Documents and the Folly of Random Search – Part Two
Cormack and Grossman set up an ingenious experiment to test the effectiveness of three machine learning protocols. It is ingenious for several reasons, not the least of which is that they created what they call an “evaluation toolkit” to perform the experiment. They have even made this same toolkit, this same software, freely available for use by any other qualified researchers. They invite other scientists to run the experiment for themselves. They invite open testing of their experiment. They invite vendors to do so too, but so far there have been no takers.