By John Tredennick and Mark Noel
Maura Grossman and Gordon Cormack just released another blockbuster article, “Comments on ‘The Implications of Rule 26(g) on the Use of Technology-Assisted Review,’” 7 Federal Courts Law Review 286 (2014). The article was in part a response to an earlier article in the same journal by Karl Schieneman and Thomas Gricks, in which they asserted that Rule 26(g) imposes “unique obligations” on parties using TAR for document productions, and in which they suggested using techniques we associate with TAR 1.0, including:
Training the TAR system using a random “seed” or “training” set as opposed to one relying on judgmental sampling, which “may not be representative of the entire population of electronic documents within a given collection.”
From the beginning, we have advocated a TAR 2.0 approach that uses judgmental seeds (selected by the trial team using all techniques at their disposal to find relevant documents). Random seeds are a convenient shortcut to approximating topical coverage, especially when one doesn’t have the algorithms and computing resources to model the entire document collection. But they are neither the best way to train a modern TAR system nor the only way to eliminate bias and ensure full topical coverage. We have published several research papers and articles showing that documents selected via continuous active learning and contextual diversity (active modeling of the entire document set) consistently beat training documents selected at random.
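To make the continuous active learning idea concrete, here is a minimal toy simulation. It is not the authors' or any vendor's actual TAR implementation; the collection, the "breach" relevance rule, and the naive word-weight scorer are all invented for illustration. The loop shows the core pattern the article describes: start from judgmental seeds, rank the unreviewed documents with the current model, review the top of the ranking, and feed those judgments back into training.

```python
import random

random.seed(0)

# Hypothetical toy collection: each "document" is a small bag of words.
# By construction, a document is relevant iff it contains "breach"
# (an invented rule, used here so the simulation has ground truth).
VOCAB = ["breach", "contract", "invoice", "memo", "lunch", "travel",
         "budget", "agenda", "hr", "sales", "legal", "audit"]

def make_doc():
    words = set(random.sample(VOCAB, k=3))
    return {"words": words, "relevant": "breach" in words}

docs = [make_doc() for _ in range(500)]

def train(labeled):
    # Naive scorer: weight(word) = P(word | relevant) - P(word | not relevant).
    rel = [d for d, y in labeled if y]
    irr = [d for d, y in labeled if not y]
    weights = {}
    for w in VOCAB:
        p_rel = sum(w in d["words"] for d in rel) / max(len(rel), 1)
        p_irr = sum(w in d["words"] for d in irr) / max(len(irr), 1)
        weights[w] = p_rel - p_irr
    return weights

def score(weights, d):
    return sum(weights.get(w, 0.0) for w in d["words"])

# Judgmental seeds: one known-relevant and one known-irrelevant document,
# standing in for documents the trial team found by its own means.
unreviewed = list(range(len(docs)))
labeled = []
seed_rel = next(i for i in unreviewed if docs[i]["relevant"])
seed_irr = next(i for i in unreviewed if not docs[i]["relevant"])
for i in (seed_rel, seed_irr):
    unreviewed.remove(i)
    labeled.append((docs[i], docs[i]["relevant"]))

# Continuous active learning loop: re-rank, review the top batch,
# fold the new judgments back into training, repeat.
found = sum(1 for d, y in labeled if y)
for _ in range(10):                          # 10 review rounds of 20 docs
    weights = train(labeled)
    unreviewed.sort(key=lambda i: score(weights, docs[i]), reverse=True)
    batch, unreviewed = unreviewed[:20], unreviewed[20:]
    for i in batch:                          # reviewer judges the batch
        labeled.append((docs[i], docs[i]["relevant"]))
        found += docs[i]["relevant"]

total_relevant = sum(d["relevant"] for d in docs)
print(f"reviewed {len(labeled)} docs, found {found}/{total_relevant} relevant")
```

Because each round's judgments sharpen the ranking, the review concentrates on likely-relevant documents and reaches high recall after reviewing only a fraction of the collection, which is the behavior the research comparisons above measure against random training sets.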