Comparing 179 Machine Learning Categorizers on 121 Data Sets

Extract from an article by Herbert L. Roitblat, Ph.D.

It is often argued that the algorithm used for machine learning matters less than the amount of data used to train it (e.g., Domingos, 2012: "More data beats a cleverer algorithm"). In a monumental study, Fernández-Delgado and colleagues (2014) tested 179 machine learning categorizers on 121 data sets. They found that a large majority of them were essentially identical in accuracy. In fact, when accuracy was averaged across all of the data sets, 121 of the categorizers (a coincidence with the number of data sets) were within ±5 percentage points of one another.
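The kind of comparison described above can be sketched in a few lines: average each categorizer's accuracy over every data set, then find the largest group whose averages fall inside a narrow band. This is a minimal illustration, not the study's procedure. The function names, the toy accuracies, and the reading of "±5 percentage points" as a band of a given width are all assumptions (pass `width=10.0` for a ±5 band around a centre, or `width=5.0` if every pair must be within 5 points).

```python
from statistics import mean


def mean_accuracies(results):
    """Average each classifier's accuracy (in percent) over all data sets.

    `results` maps a classifier name to its list of per-data-set accuracies.
    """
    return {name: mean(accs) for name, accs in results.items()}


def largest_band(means, width):
    """Return the largest group of classifiers whose mean accuracies span
    at most `width` percentage points (highest minus lowest <= width)."""
    ranked = sorted(means.items(), key=lambda item: item[1])
    best = []
    lo = 0
    for hi in range(len(ranked)):
        # Slide the left edge forward until the window spans <= width points.
        while ranked[hi][1] - ranked[lo][1] > width:
            lo += 1
        if hi - lo + 1 > len(best):
            best = [name for name, _ in ranked[lo:hi + 1]]
    return best


# Hypothetical toy numbers, NOT from the study: 4 classifiers, 3 data sets.
toy = {
    "clf_a": [80.0, 90.0, 85.0],
    "clf_b": [82.0, 88.0, 86.0],
    "clf_c": [78.0, 84.0, 81.0],
    "clf_d": [60.0, 70.0, 65.0],
}
means = mean_accuracies(toy)
band = largest_band(means, width=10.0)
print(band)  # ['clf_c', 'clf_a', 'clf_b']
```

Sorting the averages first lets a single sliding window find the largest band in linear time after the sort; with real results, `toy` would be replaced by each categorizer's measured accuracies.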


Additional Reading:

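Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87.

Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15, 3133–3181.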