Provided below for your consideration and use are the in-progress results of the One-Question Provider Implementation Survey launched by ComplexDiscovery on 3/3/13.

The goal of the survey is to provide a specific and detailed look at the use of the technology-assisted review feature of predictive coding among leading eDiscovery providers as represented by those providers. The specific technologies highlighted in the one-question survey include:

- Active Learning
- Language Modeling
- Latent Semantic Analysis
- Linguistic Analysis
- Naive Bayesian Classifier
- Nearest Neighbor Classifier
- Probabilistic Latent Semantic Analysis
- Relevance Feedback
- Support Vector Machine
- Other (Provider Machine Learning Approach Not Listed)

The in-progress results consist of survey answers harvested directly from the online survey form as completed by provider representatives. Additional survey responses from the eDiscovery provider community will be added to this listing as they are completed.

*Additional responders are welcome and encouraged.*

*Note: The running results of a previously presented general survey on eDiscovery provider use of predictive coding are available for review. The initial 120-second survey contained six high-level questions related to technology development, offering integration, machine learning approach, and sampling approach of providers in relation to predictive coding. The following one-question provider implementation survey was designed to build on the machine learning question from the initial general survey by providing additional and important layers of detail.*

Updated 9/16/2013

- @Legal
- Altep
- BIA
- Catalyst Repository Systems
- Content Analyst
- D4
- Daegis
- Driven
- Huron Legal
- kCura
- Kroll Ontrack
- Liquid Litigation Management (LLM)
- Nuix
- Orange Legal Technologies
- OrcaTec
- Prolorem
- Recommind
- Servient
- Symantec/Clearwell
- TCDI
- UBIC
- Valora Technologies
- Xerox Litigation Services

Responses by provider (technologies selected by each provider, in the order listed above):

- @Legal: Nearest Neighbor Classifier, Support Vector Machine
- Altep: Active Learning, Latent Semantic Analysis
- BIA: Active Learning, Language Modeling, Latent Semantic Analysis, Linguistic Analysis, Naïve Bayesian Classifier, Nearest Neighbor Classifier, Probabilistic Latent Semantic Analysis, Relevance Feedback, Support Vector Machine
- Catalyst Repository Systems: Nearest Neighbor Classifier
- Content Analyst: Active Learning, Latent Semantic Analysis
- D4: Active Learning, Support Vector Machine
- Daegis: Active Learning, Probabilistic Latent Semantic Analysis
- Driven: Active Learning, Latent Semantic Analysis, Nearest Neighbor Classifier, Relevance Feedback
- Huron Legal: Nearest Neighbor Classifier
- kCura: Latent Semantic Analysis, Nearest Neighbor Classifier, Relevance Feedback
- Kroll Ontrack: Active Learning; Other: Logistic Regression. Logistic regression is well-accepted by the computer science and information retrieval communities as a sound statistical modeling approach for data analysis and predictive modeling. Logistic regression is a form of supervised learning, in that a logistic regression model is produced by “training” on a set of documents that have been manually categorized. Once trained, the logistic regression model can be used to estimate the probability that a new document belongs to each of the possible categories. The model can use both content features such as words and phrases, and metadata features such as custodian, date, file type, and contextual information. Such models can be applied to data sets with millions of documents and billions of content features, and are one of the most effective approaches in a wide range of text and data mining tasks.
- Liquid Litigation Management (LLM): Active Learning, Latent Semantic Analysis
- Nuix: Naïve Bayesian Classifier
- Orange Legal Technologies: Language Modeling
- OrcaTec: Language Modeling
- Prolorem: Support Vector Machine
- Recommind: Probabilistic Latent Semantic Analysis, Support Vector Machine
- Servient: Active Learning, Language Modeling, Latent Semantic Analysis, Linguistic Analysis, Naïve Bayesian Classifier, Nearest Neighbor Classifier, Relevance Feedback, Support Vector Machine
- Symantec/Clearwell: Active Learning, Relevance Feedback, Support Vector Machine
- TCDI: Latent Semantic Analysis
- UBIC: Active Learning, Language Modeling, Latent Semantic Analysis, Nearest Neighbor Classifier, Probabilistic Latent Semantic Analysis, Relevance Feedback, Support Vector Machine
- Valora Technologies: Other: Probabilistic Hierarchical Context-Free Grammars approach to machine learning.
- Xerox Litigation Services: Active Learning, Language Modeling, Linguistic Analysis, Probabilistic Latent Semantic Analysis, Relevance Feedback, Support Vector Machine

**Active Learning:** An iterative process that presents for reviewer judgment those documents that are most likely to be misclassified. In conjunction with Support Vector Machines, it presents those documents that are closest to the current position of the separating line. The line is moved if any of the presented documents has been misclassified.

- Altep
- BIA
- Content Analyst
- D4
- Daegis
- Driven
- Kroll Ontrack
- Liquid Litigation Management (LLM)
- Servient
- Symantec/Clearwell
- UBIC
- Xerox Litigation Services
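
The Active Learning definition above, in its Support Vector Machine form, can be sketched in a few lines. This is only an illustration under stated assumptions, not any provider's implementation: the sparse word-weight representation, the names `decision_value` and `select_for_review`, and the toy documents are all hypothetical.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse bag-of-words features (hypothetical representation)


def decision_value(weights: Vector, bias: float, doc: Vector) -> float:
    """Signed score of a document relative to the separating line: w.x + b."""
    return sum(weights.get(term, 0.0) * value for term, value in doc.items()) + bias


def select_for_review(
    weights: Vector, bias: float, unlabeled: Dict[str, Vector], batch_size: int
) -> List[Tuple[str, float]]:
    """Return the batch_size documents whose scores lie closest to the line."""
    scored = [(doc_id, decision_value(weights, bias, doc)) for doc_id, doc in unlabeled.items()]
    scored.sort(key=lambda pair: abs(pair[1]))  # smallest distance first
    return scored[:batch_size]


# Hypothetical current model and unlabeled pool (assumptions for illustration).
weights = {"merger": 1.0, "fantasy": -1.0}
bias = 0.0
pool = {
    "doc_a": {"merger": 3.0},                  # far on the responsive side
    "doc_b": {"fantasy": 2.0},                 # far on the non-responsive side
    "doc_c": {"merger": 1.0, "fantasy": 0.9},  # score of about 0.1
    "doc_d": {"lunch": 1.0},                   # score of 0.0, no known words
}
batch = select_for_review(weights, bias, pool, batch_size=2)
```

Here `batch` contains `doc_d` and then `doc_c`, the two documents the current model is least sure about. In an iterative workflow the reviewer's judgments on that batch would be added to the training set and the model retrained.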

**Language Modeling:** A mathematical approach that seeks to summarize the meaning of words by looking at how they are used in the set of documents. Language modeling in predictive coding builds a model for word occurrence in the responsive and in the non-responsive documents and classifies documents according to the model that best accounts for the words in a document being considered.

- BIA
- Orange Legal Technologies
- OrcaTec
- Servient
- UBIC
- Xerox Litigation Services
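
As a rough illustration of the Language Modeling definition above, the sketch below builds one unigram word model per category and assigns a document to whichever model gives its words the higher likelihood. The add-one smoothing, the function names, and the toy documents are assumptions made for the example; the survey does not say which kind of language model any provider uses.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def train_unigram_model(docs: List[List[str]], vocabulary: Set[str]) -> Dict[str, float]:
    """Estimate smoothed log P(word) from the tokens of one category of documents."""
    counts = Counter(token for doc in docs for token in doc)
    total = sum(counts.values())
    # Add-one (Laplace) smoothing so unseen words do not get zero probability.
    return {w: math.log((counts[w] + 1) / (total + len(vocabulary))) for w in vocabulary}


def log_likelihood(model: Dict[str, float], doc: List[str]) -> float:
    """Sum of log-probabilities of the in-vocabulary words in the document."""
    return sum(model[token] for token in doc if token in model)


# Hypothetical training documents, already tokenized.
responsive = [["merger", "price", "deal"], ["deal", "merger", "board"]]
non_responsive = [["lunch", "friday", "pizza"], ["fantasy", "football", "friday"]]
vocabulary = {t for doc in responsive + non_responsive for t in doc}

responsive_model = train_unigram_model(responsive, vocabulary)
non_responsive_model = train_unigram_model(non_responsive, vocabulary)


def classify(doc: List[str]) -> str:
    """Pick the category whose word model best accounts for the document."""
    r = log_likelihood(responsive_model, doc)
    n = log_likelihood(non_responsive_model, doc)
    return "responsive" if r > n else "non-responsive"
```

With these toy models, `classify(["board", "deal", "friday"])` returns `"responsive"` and `classify(["pizza", "friday"])` returns `"non-responsive"`.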

**Latent Semantic Analysis:** A mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words. LSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

- Altep
- BIA
- Content Analyst
- Driven
- kCura
- Liquid Litigation Management (LLM)
- Servient
- TCDI
- UBIC

**Linguistic Analysis:** Linguists examine responsive and non-responsive documents to derive classification rules that maximize the correct classification of documents.

- BIA
- Servient
- Xerox Litigation Services

**Naïve Bayesian Classifier:** A system that examines the probability that each word in a new document came from the word distribution derived from trained responsive documents or from trained non-responsive documents. The system is naïve in the sense that it assumes that, within a category, all words are independent of one another.

- BIA
- Nuix
- Servient

**Nearest Neighbor Classifier:** A classification system that categorizes documents by finding an already classified example that is very similar (near) to the document being considered. It gives the new document the same category as the most similar trained example.

- @Legal
- BIA
- Catalyst Repository Systems
- Driven
- Huron Legal
- kCura
- Servient
- UBIC
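
A minimal sketch of the Nearest Neighbor Classifier definition above follows. It assumes cosine similarity over raw word counts as the measure of "nearness"; the survey does not specify a similarity measure, and the function names and example documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor_label(
    doc: List[str], examples: List[Tuple[List[str], str]]
) -> Tuple[str, float]:
    """Return the category of the most similar trained example and its similarity."""
    query = Counter(doc)
    best_label, best_sim = "", -1.0
    for tokens, label in examples:
        sim = cosine(query, Counter(tokens))
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Hypothetical already-classified examples.
examples = [
    (["merger", "deal", "price"], "responsive"),
    (["lunch", "pizza", "friday"], "non-responsive"),
]
label, similarity = nearest_neighbor_label(["merger", "deal", "vote"], examples)
```

The new document shares two of its three words with the first example and none with the second, so it receives the label `"responsive"`.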

**Probabilistic Latent Semantic Analysis:** A second mathematical approach that seeks to summarize the meaning of words by looking at the documents that share those words. PLSA builds up a mathematical model of how words are related to documents and lets users take advantage of these computed relations to categorize documents.

- BIA
- Daegis
- Recommind
- UBIC
- Xerox Litigation Services

**Relevance Feedback:** A computational model that adjusts the criteria for implicitly identifying responsive documents following feedback by a knowledgeable user as to which documents are relevant and which are not.

- BIA
- Driven
- kCura
- Servient
- Symantec/Clearwell
- UBIC
- Xerox Litigation Services
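
One classic way to realize the Relevance Feedback definition above is the Rocchio update, which moves a query vector toward documents judged relevant and away from those judged not relevant. The survey does not say that any provider uses Rocchio; it is swapped in here purely as an illustration, and the weights `alpha`, `beta`, and `gamma` are conventional but arbitrary choices.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]  # term -> weight (hypothetical representation)


def centroid(docs: List[Vector]) -> Vector:
    """Average term weights over a list of document vectors."""
    if not docs:
        return {}
    total: Dict[str, float] = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio_update(
    query: Vector,
    relevant: List[Vector],
    non_relevant: List[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """new_query = alpha*query + beta*mean(relevant) - gamma*mean(non_relevant)."""
    updated: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for term, weight in centroid(relevant).items():
        updated[term] += beta * weight
    for term, weight in centroid(non_relevant).items():
        updated[term] -= gamma * weight
    # Terms pushed to zero or below are dropped, a common simplification.
    return {term: weight for term, weight in updated.items() if weight > 0}


# Hypothetical query and reviewer feedback.
query = {"merger": 1.0}
relevant = [{"merger": 1.0, "board": 1.0}, {"merger": 1.0, "vote": 1.0}]
non_relevant = [{"merger": 1.0, "football": 1.0}]
new_query = rocchio_update(query, relevant, non_relevant)
```

After one round of feedback the query has picked up "board" and "vote" from the relevant documents, while "football", which appeared only in the non-relevant document, carries no weight.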

**Support Vector Machine:** A mathematical approach that seeks to find a line (with many features, a hyperplane) that separates responsive from non-responsive documents so that, ideally, all of the responsive documents are on one side of the line and all of the non-responsive ones are on the other side.

- @Legal
- BIA
- D4
- Prolorem
- Recommind
- Servient
- Symantec/Clearwell
- UBIC
- Xerox Litigation Services
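
To make the "separating line" in the Support Vector Machine definition above concrete, here is a small sketch of a linear SVM trained by subgradient descent on the regularized hinge loss. It is a teaching sketch under stated assumptions, not a production solver: the two numeric features, the learning rate, the regularization strength, and the epoch count are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],  # +1 responsive, -1 non-responsive
    learning_rate: float = 0.1,
    regularization: float = 0.01,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [regularization * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # inside the margin or on the wrong side
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Side of the line the point falls on: +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature documents (for example, scaled counts of two terms).
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

On this linearly separable toy set the learned line places every responsive point on one side and every non-responsive point on the other. Real document collections are rarely perfectly separable, which is why the hinge loss tolerates some points inside the margin.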

**Other: Logistic Regression**

- Kroll Ontrack

**Other: Probabilistic Hierarchical Context-Free Grammars**

- Valora Technologies

End of Survey Results

This entry was posted on Tuesday, March 5th, 2013 at 11:54 am. It is filed under Blog Slider, chronology, original, Technology-Assisted Review and tagged with electronic discovery, research, vendors.

ComplexDiscovery | Creative Commons Attribution 4.0 International
