We are asked on a regular basis why organizations should use visual classification to manage their electronic and paper repositories for various information governance initiatives. Here’s a summary of how we respond in the form of a top ten reasons list.
In my previous post, I found that relevance and uncertainty selection needed similar numbers of document relevance assessments to achieve a given level of recall. I summarized this by saying the two methods had similar cost. The number of documents assessed, however, is only a very approximate measure of the cost of a review process, and richer cost models might lead to a different conclusion.
One distinction that is sometimes made is between the cost of assessing a document during training and the cost of assessing it during review. It is often assumed that training is performed by a subject-matter expert, whereas review is done by more junior reviewers. The subject-matter expert costs more than the junior reviewers; let’s say, five times as much. Therefore, assessing a document for relevance during training will cost more than doing so during review.
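As a rough illustration of why the split matters, here is a minimal sketch of a two-rate cost model. The function name, the document counts, and the unit of cost (one junior-reviewer assessment) are assumptions made for the example; only the five-to-one ratio comes from the text above.

```python
def review_cost(n_training, n_review, expert_rate=5.0, junior_rate=1.0):
    """Total cost in junior-reviewer units (hypothetical model).

    n_training: documents assessed by the subject-matter expert
    n_review:   documents assessed by junior reviewers
    expert_rate / junior_rate: assumed cost per assessed document
    """
    if n_training < 0 or n_review < 0:
        raise ValueError("document counts must be non-negative")
    return n_training * expert_rate + n_review * junior_rate


# Two hypothetical protocols that assess the same 3,000 documents in total
# but split them differently between training and review.
cost_light_training = review_cost(n_training=500, n_review=2500)
cost_heavy_training = review_cost(n_training=1500, n_review=1500)

print(cost_light_training, cost_heavy_training)
```

Under these assumed numbers the two protocols assess the same number of documents yet differ in cost, which is why a raw count of assessments is only an approximate measure.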
Daily we read, see and hear more and more about technology developments that impact the areas of information governance and electronic discovery. This week’s cartoon and clip features a unique look at innovative thinking in these critical areas (cartoon) and quick reference links to six interesting blogs that regularly highlight the need to truly think about these critical areas (clip).
A critical metric in Technology Assisted Review (TAR) is recall, which is the percentage of relevant documents actually found from the collection. One of the most compelling reasons for using TAR is the promise that a review team can achieve a desired level of recall (say 75% of the relevant documents) after reviewing only a small portion of the total document population (say 5%). The savings come from not having to review the remaining 95% of the documents.
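The recall arithmetic above can be made concrete with a small sketch. The collection size and the number of relevant documents are invented for illustration; only the 75% recall and 5% reviewed figures mirror the example in the text.

```python
def recall(relevant_found, relevant_total):
    """Fraction of all relevant documents that the review actually found."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_found / relevant_total


def fraction_reviewed(docs_reviewed, collection_size):
    """Fraction of the whole collection that reviewers had to look at."""
    if collection_size <= 0:
        raise ValueError("collection_size must be positive")
    return docs_reviewed / collection_size


# Hypothetical collection: 100,000 documents, 2,000 of them relevant.
collection_size = 100_000
relevant_total = 2_000

# Hypothetical TAR outcome: 5,000 documents reviewed, 1,500 relevant found.
docs_reviewed = 5_000
relevant_found = 1_500

achieved_recall = recall(relevant_found, relevant_total)
reviewed_share = fraction_reviewed(docs_reviewed, collection_size)
docs_not_reviewed = collection_size - docs_reviewed

print(achieved_recall, reviewed_share, docs_not_reviewed)
```

In practice the total number of relevant documents is not known in advance and has to be estimated, typically by sampling, so the recall figure is itself an estimate.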
Try as I might to make it foolproof, downloading Gmail using IMAP and Outlook is tricky. Happily, since my post, the geniuses at Google introduced a truly simple, no-cost way to collect Gmail and other Google content for preservation and portability.
Emails and their attachments represent an increasingly significant portion of ESI (Electronically Stored Information) collections and for good reason, too. The hundreds of billions of emails that are sent daily paint a comprehensive picture of our personal and professional lives, so it is no wonder that litigators must thoroughly and effectively review these collections for relevant case material. All too often, the “smoking gun” is hiding in .msg files and their attachments, but the peculiarities of email format can make this key evidence difficult to find, process for review, search, and produce.
Joint press briefing by Neelie KROES, Vice-President of the EC in charge of Digital Agenda, and Jan SUNDELIN, CEO of TIE Kinetix. See the original brief posting at: LIVE Launch of the Big Data Public-Private Partnership
I thought Microsoft would have gone after somebody else. Pundits claim there were other contenders. The actual acquisition of Equivio by somebody is no surprise. From what the VCs have told me, the Equivio book has been out on the street for a while. It will be interesting to see whether anyone else makes a play now. But as Ralph [Losey] said: is Microsoft really serious about playing in “our” legal sandbox?
On Oct. 7, 2014, the Wall Street Journal reported that Microsoft had signed a letter of intent to buy what they called an Israel-based text analysis startup company named Equivio. The mainstream business press has virtually no understanding of the e-discovery industry, nor anything having to do with litigation support. They also seem to have no real grasp of what kind of software Equivio and others like it in the industry have created.
Daily we read, see and hear more and more about the health, legal and business developments in the pharmaceutical industry. This week’s cartoon and clip features a unique look at the retail sales impact of the pharmaceutical industry (cartoon) and a quick reference link to one of the most informational and timely resources on the pharmaceutical industry, FiercePharma (clip).
Microsoft Corp. (MSFT) has signed a letter of intent to acquire Israeli text-analysis startup Equivio, said a person familiar with the company’s plans, as the software maker bulks up products for analyzing data.
The companies listed below are the subject of an ongoing and unresolved FCPA-related investigation. The names are current through September 30, 2014. The entries are based on disclosures in SEC filings or credible news reports or both.
Near duplicate identification, or ‘NearDup’, is a critically important eDiscovery function that can drastically increase the speed and quality of your review by grouping similar documents, maintaining email threads, retrieving unmarked ‘hot’ documents, and preventing the inadvertent release of critical privileged information. As document collections continue to grow, so does the risk of missing key documents, inconsistently coding productions, and releasing privileged information.
Published on September 23, 2014, the new Gartner Market Guide for File Analysis Software (G00262949) provides information technology and business professionals with information and insight into more efficient, less costly and less risky ways to manage what is generally regarded as unstructured data through the use of file analysis software.
Daily we read, see and hear more and more about the latest privacy and data security breaches in consumer, corporate and governmental arenas. This week’s cartoon and clip features a unique approach to ensuring personal data security (cartoon) and a quick reference link to one of the most informational and timely resources on privacy and data security, the LXBN Privacy & Data Security Blog Channel (clip).
By K&L Gates
Dynamo Holdings Ltd. P’ship v. Comm’r of Internal Revenue, Nos. 2685-11, 8393-12 (T.C. Sept. 17, 2014)
In this case, the court approved petitioners’ (Dynamo Holdings Ltd. Partnership et al.) use of predictive coding to identify potentially responsive and privileged data contained on two backup tapes, despite respondent’s (Commissioner of Internal Revenue) objection […]
By Benedict Hur and Matthew Werdegar
The Federal Rules of Civil Procedure are supposed to be “construed and administered to secure the just, speedy and inexpensive determination of every action and proceeding.” Yet, as anyone who has ever been tasked with handling discovery in complex litigation knows, the judicial system has struggled to reconcile this overarching goal […]
Documents that have ongoing business or regulatory value are deemed “records,” and are retained. Emails that have records as attachments can be viewed as providing context to those records and hence inherit the retention policies associated with them. This automated payload analysis can make classification decisions for a large percentage of those emails that have attachments.
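The inheritance rule described above can be sketched as follows. This is an illustrative model only: the class names, the `retention_years` field, and the choice to apply the longest retention period among the attached records are assumptions, not details given in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attachment:
    name: str
    # Retention period if the attachment is a record; None if it is not.
    retention_years: Optional[int] = None


@dataclass
class Email:
    subject: str
    attachments: List[Attachment] = field(default_factory=list)


def inherited_retention(email: Email) -> Optional[int]:
    """Return the retention period an email inherits from its attachments.

    Assumption: when several attached records carry different periods,
    the email takes the longest one. Returns None when no attachment is
    a record, meaning payload analysis cannot decide for this email.
    """
    periods = [
        a.retention_years
        for a in email.attachments
        if a.retention_years is not None
    ]
    return max(periods) if periods else None


# Hypothetical emails for illustration.
invoice_mail = Email(
    "Q3 invoice",
    [Attachment("invoice.pdf", retention_years=7), Attachment("logo.png")],
)
lunch_mail = Email("Lunch?", [Attachment("menu.jpg")])

print(inherited_retention(invoice_mail), inherited_retention(lunch_mail))
```

Emails for which the function returns `None` would still need some other classification route, since the attachment-based rule only covers emails whose attachments are already classified as records.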
Extract: Choosing an e-discovery solution means addressing several interconnected issues. Product demos can be impressive, but don’t be fooled: a tool’s features can be the least important factor for you to consider. KPMG Canada’s Dominic Jaar, partner and national practice leader, information management services, and David Sharpe, manager of e-discovery, offer some key questions you should endeavour to answer while exploring solutions.
The IRS seems inherently incapable of finding emails. The most famous incident that everyone has heard about, and many have complained about, is the loss of emails of key witnesses in a Congressional investigation of the IRS tea party targeting scandal.
Writers in the information management space often speak of structured vs. unstructured data and then analyze documents as if they were “unstructured.” However, when documents are clustered by visual similarity, they are actually fairly structured within clusters, e.g., invoices, letters, and emails each have recurring attributes or data elements located in generally the same place in the documents in that cluster.
It is actually more accurate to say that documents are heterogeneously structured – once they are clustered into groups of visually-similar documents, there are recurring attributes or data elements in that group or cluster.
As the volume of discoverable data continues to increase, creating functional fact and issue timelines is more important than ever. During the early stages of litigation, timelines can help you develop eDiscovery strategies, identify collection sources, predict disputed facts and issues, and create a preliminary chronological relationship between case events. And as your case progresses, timelines can assist you when preparing for depositions, carrying out motion practice, and conducting trial, as well. Associating key documents with timeline events allows you to track the truly critical data in the increasingly murky sea of irrelevant emails and documents subject to discovery collection. Most importantly, a well-managed fact timeline enables you to present the best case possible.
By William Webber
My previous post described in some detail the conditions of finite population annotation that apply to e-discovery. To summarize, what we care about (or at least should care about) is not maximizing classifier accuracy in itself, but minimizing the total cost of achieving a target level of recall. The predominant cost in […]
Given the increasing prevalence of technology assisted review in e-discovery, it seems hard to believe that it was just 19 months ago that TAR received its first judicial endorsement. That endorsement came, of course, from U.S. Magistrate Judge Andrew J. Peck in his landmark ruling in Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012), adopted sub nom. Moore v. Publicis Groupe SA, No. 11 Civ. 1279 (ALC)(AJP), 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012), in which he stated, “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”