ARCHIVED CONTENT
You are viewing ARCHIVED CONTENT released online between 1 April 2010 and 24 August 2018 or content that has been selectively archived and is no longer active. Content in this archive is NOT UPDATED, and links may not function.
Extract from article by Herbert L. Roitblat, Ph.D.
Other factors may also have contributed to the poor performance on this predictive coding task. It is important to keep in mind that even the most powerful predictive coding system is still just a tool used by humans.
The power of a categorization system, such as predictive coding, is its ability to separate the document classes from one another (e.g., the responsive from the non-responsive documents). For a system with any amount of power, specific levels of Recall can be achieved by adjusting the criterion for what one calls responsive, accepting more or fewer true positives and therefore more or fewer false positives. Achieving high levels of Recall does not, by itself, indicate a powerful system, because when high levels of Recall are accompanied by high levels of false positives, there is very little separation at all. A more powerful system is one that increases the proportion of truly responsive documents more quickly than the proportion of false positives as this criterion is lowered. A more powerful system will achieve high Recall at the same time as it achieves few false positives. In this light, the system used by Dynamo Holdings was not very powerful. Rather than separating the responsive from the non-responsive, it simply provided both.
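The trade-off described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the scores, the labels, the threshold values, and the function name recall_and_fpr are hypothetical and do not come from the Dynamo Holdings matter or from any particular predictive coding product. It compares a scorer that separates the two classes well with one that does not, as the criterion is lowered.

```python
def recall_and_fpr(scores, labels, threshold):
    """Return (recall, false-positive rate) when every document scoring
    at or above `threshold` is called responsive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Hypothetical review set: 1 = responsive, 0 = non-responsive.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical scores from a system that separates the classes well ...
strong_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
# ... and from one whose scores barely relate to responsiveness.
weak_scores = [0.9, 0.5, 0.3, 0.1, 0.8, 0.6, 0.4, 0.2, 0.7, 0.05]

# Lower the criterion step by step and watch both rates.
for threshold in (0.75, 0.55, 0.05):
    for name, scores in (("strong", strong_scores), ("weak", weak_scores)):
        recall, fpr = recall_and_fpr(scores, labels, threshold)
        print(f"threshold={threshold:<5} {name:<6} "
              f"recall={recall:.2f} false-positive rate={fpr:.2f}")
```

With these made-up numbers, both scorers reach full Recall once the threshold is low enough, but the weak scorer gets there only by also accepting every non-responsive document, which is the "very little separation" case described above.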
It is important to remember that a system, particularly in eDiscovery, consists not just of the software used to implement the machine learning, but also of the training examples and other methods used. People are a critical part of a predictive coding system and, by some measures, they are the most error-prone part.
Predictive coding is not magic. You don’t get something for nothing. What you do get is a tool that makes the most out of relatively small amounts of effort. Unsupervised, the computer has no way to distinguish what is legally important from what is not; it still requires human judgment to guide it. The computer then amplifies that judgment, but it can amplify poor judgment as well as good judgment.
Effective predictive coding requires good technology, good methods for applying that technology, and good judgment to guide the technology. At least one of those appears to have been missing in this case.
Read the complete article at Understanding Dynamo Holdings Predictive Coding