{"id":342,"date":"2017-01-31T23:21:34","date_gmt":"2017-01-31T15:21:34","guid":{"rendered":"http:\/\/vinta.ws\/code\/?p=342"},"modified":"2026-02-18T01:20:36","modified_gmt":"2026-02-17T17:20:36","slug":"recommender-system-memory-based-collaborative-filtering","status":"publish","type":"post","link":"https:\/\/vinta.ws\/code\/recommender-system-memory-based-collaborative-filtering.html","title":{"rendered":"Recommender System: Collaborative Filtering Recommendation Algorithms"},"content":{"rendered":"<p>The dataset is a utility matrix of the ratings that m users give to n items.<br \/>\nUsually only some of the users have rated only some of the items,<br \/>\nso it is a sparse matrix.<br \/>\nThe goal is to use this sparse data to predict the rating a user would give to items they have not rated yet.<br \/>\nBesides ratings, the data can also be likes (and dislikes), purchases, page views, and so on,<br \/>\nand it is further divided into explicit (actively given) ratings and implicit (passively collected) ratings.<\/p>\n<p>Drawbacks of CF:<\/p>\n<ul>\n<li>Without a user's historical data, no recommendation can be made at all (the cold-start problem)<\/li>\n<li>Both user-based and item-based CF consume a lot of computing resources<\/li>\n<li>Most users have rating records for only a very small fraction of all the data, so the matrix 
is quite sparse and it is hard to find similar data<\/li>\n<li>There is a Matthew effect: the more popular an item is, the more easily it gets recommended, so the weight of popular items is usually reduced<\/li>\n<\/ul>\n<p>CF is mainly divided into two categories: memory-based and model-based.<br \/>\nuser-based and item-based collaborative filtering are memory-based.<br \/>\nmemory-based CF is basically pure computation, with hardly any Machine Learning involved.<br \/>\nmodel-based CF is where Machine Learning comes in.<\/p>\n<h2>User-based Collaborative Filtering<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-txt\">        item_a  item_b  item_c\nuser_1  2       -       3\nuser_2  5       2       -\nuser_3  3       3       1\nuser_4  -       2       2<\/code><\/pre>\n<pre class=\"line-numbers\"><code class=\"language-py\"># the algorithm from \"Mahout in Action\"\nfor every other user w\n  compute a similarity s between u and w\n  retain the top users, ranked by similarity, as a neighborhood n\n\nfor every item i that some user in n has a preference for,\n      but that u has no preference for yet\n  for every other user v in n that has a preference for i\n    compute a similarity s between u and v\n    incorporate v's preference for i, weighted by s, into a running average<\/code><\/pre>\n<p>user-based CF considers the similarity between users.<\/p>\n<p>Given a user A:<br \/>\ncompute the similarity between user A and every other user,<br \/>\nfind the m most similar users,<br \/>\nthen find the items those users have rated but user A 
has not (optionally requiring that at least a certain number of those users have rated the item),<br \/>\npredict user A's rating for each of these unrated items as the average of the similar users' ratings for that item, weighted by each similar user's similarity to A,<br \/>\nand finally recommend to A the n items with the highest predicted ratings.<\/p>\n<p>Predicted rating of user_4 for item_a, using user_1 and user_3 as the neighbors =<br \/>\n(user_4_user_1_sim x user_1_item_a_rating + user_4_user_3_sim x user_3_item_a_rating) \/ (user_4_user_1_sim + user_4_user_3_sim)<\/p>\n<p>Characteristics of user-based CF:<\/p>\n<ul>\n<li>Suited to systems with far fewer users than items, since fewer similarities need to be computed<\/li>\n<li>Systems whose items are time-sensitive and more diverse, such as news and social networking sites, are a good fit for user-based CF<\/li>\n<li>It is hard to give a reason for a recommendation<\/li>\n<li>Recommendations have higher serendipity<\/li>\n<\/ul>\n<p>Commonly used similarity measures:<\/p>\n<ul>\n<li>Pearson Correlation Coefficient<\/li>\n<li>Cosine Similarity<\/li>\n<li>Adjusted Cosine Similarity (some users tend to rate every item high or low; this measure removes that effect)<\/li>\n<\/ul>\n<p>ref:<br \/>\n<a href=\"https:\/\/www.safaribooksonline.com\/library\/view\/mahout-in-action\/9781935182689\/kindle_split_013.html\">https:\/\/www.safaribooksonline.com\/library\/view\/mahout-in-action\/9781935182689\/kindle_split_013.html<\/a><\/p>\n<h2>Item-based Collaborative Filtering<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-txt\">        user_1 
 user_2  user_3  user_4\nitem_a  2       5       3       -\nitem_b  -       2       3       2\nitem_c  3       -       1       2<\/code><\/pre>\n<pre class=\"line-numbers\"><code class=\"language-py\"># the algorithm from \"Mahout in Action\"\nfor every item i that u has no preference for yet\n  for every item j that u has a preference for\n    compute a similarity s between i and j\n    add u's preference for j, weighted by s, to a running average\nreturn the top items, ranked by weighted average<\/code><\/pre>\n<p>item-based CF considers the similarity between items.<br \/>\nIt still uses exactly the same data as user-based CF,<br \/>\nnot the features of the items themselves (that would be content-based).<\/p>\n<p>If there are far fewer items than users,<br \/>\nthe similarity between every pair of items can be computed in advance.<br \/>\nGiven a user A:<br \/>\nfind all the items user A has not rated,<br \/>\npredict user A's rating for each unrated item as the average of user A's ratings for the items A has already rated, weighted by each rated item's similarity to the unrated item,<br \/>\nand finally recommend to user A the n items with the highest predicted ratings.<\/p>\n<p>Predicted rating of user_4 for item_a =<br \/>\n(item_b_item_a_sim x user_4_item_b_rating + item_c_item_a_sim x user_4_item_c_rating) \/ (item_b_item_a_sim + item_c_item_a_sim)<\/p>\n<p>It is also possible to ignore user A's rating history (or user A may have no 
history at all)<br \/>\nand directly recommend the n items most similar to a given item.<\/p>\n<p>Characteristics of item-based CF:<\/p>\n<ul>\n<li>Suited to systems with far fewer items than users, since fewer similarities need to be computed<\/li>\n<li>In systems such as shopping, movies, music, and books, users' interests are relatively stable, which makes item-based CF a good fit<\/li>\n<li>It only recommends similar things, so serendipity and diversity are lower<\/li>\n<li>Item similarities usually need frequent recomputation only while the user base is small; as the user base grows, item similarities tend to stabilize<\/li>\n<\/ul>\n<p>ref:<br \/>\n<a href=\"https:\/\/ashokharnal.wordpress.com\/2014\/12\/18\/worked-out-example-item-based-collaborative-filtering-for-recommenmder-engine\/\">https:\/\/ashokharnal.wordpress.com\/2014\/12\/18\/worked-out-example-item-based-collaborative-filtering-for-recommenmder-engine\/<\/a><br \/>\n<a href=\"http:\/\/blog.csdn.net\/huagong_adu\/article\/details\/7362908\">http:\/\/blog.csdn.net\/huagong_adu\/article\/details\/7362908<\/a><\/p>\n<h2>Slope One Recommender<\/h2>\n<pre class=\"line-numbers\"><code class=\"language-txt\">        item_a  item_b  item_c\nuser_1  5       3       2\nuser_2  3       4       -\nuser_3  -       2       5<\/code><\/pre>\n<pre class=\"line-numbers\"><code class=\"language-py\"># the algorithm from \"Mahout in Action\"\nfor every item i the user u expresses no preference for\n  for every item j that user u expresses a preference for\n    find the average preference difference between j and i\n    add this diff to u's preference value for j\n    add this 
to a running average\nreturn the top items, ranked by these averages<\/code><\/pre>\n<p>One problem with memory-based collaborative filtering is that the amount of computation becomes considerable when the dataset is large,<br \/>\nso Slope One was proposed as a simple and crude alternative,<br \/>\nalthough Slope One still has to compute the average rating difference between every pair of items.<\/p>\n<p>Slope One assumes that the ratings of any two items follow a linear relationship y = mx + b with m = 1 (a slope of 1).<br \/>\nOn average, item_a is rated (2 + (-1)) \/ 2 = 0.5 higher than item_b (user_1: 5 - 3 = 2, user_2: 3 - 4 = -1).<br \/>\nOn average, item_a is rated (5 - 2) \/ 1 = 3 higher than item_c (only user_1 rated both).<br \/>\nPredicting user_3's rating for item_a from their rating for item_b gives 2 + 0.5 = 2.5.<br \/>\nPredicting user_3's rating for item_a from their rating for item_c gives 5 + 3 = 8.<br \/>\nThe individual predictions are usually combined with weights equal to the number of users who rated both items.<\/p>\n<p>Predicted rating of user_3 for item_a =<br \/>\n((number of users who rated both item_a and item_b x user_3's predicted item_a rating from item_b) + (number of users who rated both item_a and item_c x user_3's predicted item_a rating from item_c)) \/ (number of users who rated both item_a and item_b + number of users who rated both item_a and item_c)<br \/>\n((2 x 2.5) + (1 x 8)) \/ (2 + 1) = 4.33<\/p>\n<p>ref:<br \/>\n<a 
href=\"https:\/\/en.wikipedia.org\/wiki\/Slope_One\">https:\/\/en.wikipedia.org\/wiki\/Slope_One<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The dataset is a utility matrix of the ratings that m users give to n items. Usually only some of the users have rated only some of the items, so it is a sparse matrix. The goal is to use this sparse data to predict the rating a user would give to items they have not rated yet.<\/p>\n","protected":false},"author":1,"featured_media":343,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[97],"tags":[98,104],"class_list":["post-342","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-ai","tag-machine-learning","tag-recommender-system"],"_links":{"self":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/342","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/comments?post=342"}],"version-history":[{"count":0,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/342\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media\/343"}],"wp:attachment":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media?parent=342"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/categories?post=342"},{"taxonomy":"post_tag","embeddable":true,"href":"
https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/tags?post=342"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}