Clustering users and items is a useful technique when you work with large data sets. Because the similarity calculations are computationally expensive and consume a lot of RAM, it is much more practical to work with smaller sets. When you cluster users (or items), you get groups of similar users. In addition to giving you a smaller data set, this means the opinions of the users within a group are closer to each other, so the recommendations may improve.
django-recommender implements two methods for clustering:
- def cluster_users(self, users, items, cluster_count=2):
- def cluster_items(self, users, items, cluster_count=2):
Both methods use django-voting data for their calculations, because they must be consistent with the other methods: if we use a given criterion to measure how similar people are, we must respect the same criterion when grouping people into clusters.
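To illustrate what "using the same criterion" means, here is a minimal sketch that turns votes into per-user vectors and compares them. The `(user_id, object_id, vote)` tuples are a hypothetical stand-in for django-voting's vote records (+1 for up, -1 for down), and cosine similarity is an assumption chosen for the example; this page does not say which measure django-recommender actually uses. The point is that whatever function scores similarity for recommendations should also drive the clustering.

```python
import math
from collections import defaultdict

# Hypothetical stand-in for django-voting data: (user_id, object_id, vote),
# where vote is +1 (up) or -1 (down).
VOTES = [
    (1, 10, 1), (1, 11, 1), (1, 12, -1),
    (2, 10, 1), (2, 11, 1),
    (3, 10, -1), (3, 12, 1),
]

def user_vectors(votes):
    """Map each user id to a sparse {object_id: vote} vector."""
    vectors = defaultdict(dict)
    for user_id, object_id, vote in votes:
        vectors[user_id][object_id] = vote
    return dict(vectors)

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vote vectors (0.0 if either is empty).

    Assumed measure for illustration only.
    """
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = user_vectors(VOTES)
print(round(cosine_similarity(vectors[1], vectors[2]), 3))  # 0.816
print(round(cosine_similarity(vectors[1], vectors[3]), 3))  # -0.816
```

Users 1 and 2 agree on the items they both voted for, so they score high; users 1 and 3 disagree, so they score negative. A clustering step built on the same function would tend to put users 1 and 2 together and keep user 3 apart.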
The result of both methods is a list of sublists, each containing the object ids of the elements in one cluster. For instance, if you have 10 users (ids 1 to 10) and want 4 clusters, a possible result is:
user_cluster = [[2,9,1],[3],[5,10,8,6],[4,7]]
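Since the result is a plain list of sublists, a common next step is to look up which cluster an id belongs to and restrict the neighbour search to that cluster, which is where the smaller data set pays off. The helpers below (`cluster_index`, `cluster_neighbours`) are hypothetical, not part of django-recommender, and work on any result in this format; the data is an illustrative four-cluster result for ten users.

```python
def cluster_index(clusters):
    """Map each object id to the position of the cluster that contains it."""
    index = {}
    for position, members in enumerate(clusters):
        for object_id in members:
            if object_id in index:
                raise ValueError(f"id {object_id} appears in more than one cluster")
            index[object_id] = position
    return index

def cluster_neighbours(clusters, object_id):
    """Return the other members of object_id's cluster (candidate neighbours)."""
    members = clusters[cluster_index(clusters)[object_id]]
    return [other for other in members if other != object_id]

# Illustrative result: 10 users (ids 1 to 10) split into 4 clusters.
user_cluster = [[2, 9, 1], [3], [5, 10, 8, 6], [4, 7]]
print(cluster_index(user_cluster)[8])       # 2
print(cluster_neighbours(user_cluster, 8))  # [5, 10, 6]
```

Only three users need to be compared with user 8 instead of nine, and the same lookup works unchanged for item clusters.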
Bear in mind that the result can vary from one run to the next, because the initial centroids are chosen with a random component, and the problem can have more than one valid solution.
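The mention of initial centroids suggests a k-means style algorithm, although this page does not confirm the exact procedure django-recommender uses. The sketch below is a plain k-means over hypothetical per-user vectors, written only to show where the randomness enters: the starting centroids are a random sample of the input, so different seeds can converge to different, equally valid groupings. The function name `kmeans` and the `seed` parameter are assumptions for illustration.

```python
import math
import random

def kmeans(vectors, cluster_count=2, iterations=20, seed=None):
    """Naive k-means over {object_id: vector}; returns a list of id sublists.

    The initial centroids are a random sample of the input vectors
    (requires cluster_count <= len(vectors)).
    """
    rng = random.Random(seed)
    ids = sorted(vectors)
    centroids = [list(vectors[i]) for i in rng.sample(ids, cluster_count)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each id joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(cluster_count)]
        for object_id in ids:
            nearest = min(range(cluster_count),
                          key=lambda c: math.dist(vectors[object_id], centroids[c]))
            clusters[nearest].append(object_id)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if not members:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
                continue
            dims = len(centroids[c])
            new_centroids.append([sum(vectors[m][d] for m in members) / len(members)
                                  for d in range(dims)])
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return clusters

# Hypothetical 2-D vote vectors for six users.
users = {1: (1, 1), 2: (1, 0.9), 3: (-1, -1), 4: (-0.9, -1), 5: (0, 1), 6: (0.1, 1)}
print(kmeans(users, cluster_count=3, seed=1))
print(kmeans(users, cluster_count=3, seed=2))
```

Two runs with different seeds may print different partitions; passing the same seed reproduces the same clustering, which is handy in tests.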