Episode 193: Apache Mahout

Filed in Episodes on April 22, 2013
Recording Venue: Skype

Guest: Grant Ingersoll

Grant Ingersoll, founder of the Mahout project, talks with Robert about machine learning. The conversation begins with an introduction to machine learning and the forces driving its adoption. Grant explains the three main use cases, similarity metrics, supervised versus unsupervised learning, and the use of large data sets. He also provides a brief history of the Mahout project and the connection between Mahout and Hadoop. The remainder of the episode dives into the three main use cases: recommendation, clustering, and classification. Grant and Robert discuss each use case, illustrating it with examples and a typical algorithm. Recommendation is a technique for identifying items that a user would like to buy, use, or otherwise consume, based on the preferences of similar users. Clustering is the partitioning of a data set into a small number of sets of similar items. Classification is the assignment of new items to a small number of existing sets.
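To make the recommendation use case concrete, here is a minimal sketch of the "preferences of similar users" idea in plain Python. It is not Mahout's API and not necessarily the algorithm discussed in the episode. The ratings, the user and item names, and the choice of cosine similarity as the similarity metric are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Not from the episode.
ratings = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 3.0, "book_c": 5.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_e": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    own = ratings[user]
    weighted_sum = {}
    weight_total = {}
    for other, prefs in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(own, prefs)
        if sim <= 0.0:
            continue  # ignore users with nothing in common
        for item, rating in prefs.items():
            if item in own:
                continue  # only suggest items the user has not rated yet
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    ranked = sorted(
        ((weighted_sum[i] / weight_total[i], i) for i in weighted_sum),
        reverse=True,
    )
    return [item for _, item in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("alice", ratings))
```

In this toy data, bob's ratings are closer to alice's than carol's are, so the item only bob has rated is ranked ahead of the item only carol has rated. A production system would run the same kind of computation over far larger data sets, which is where the Hadoop connection mentioned above comes in.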
