{"id":392,"date":"2017-05-17T15:58:08","date_gmt":"2017-05-17T07:58:08","guid":{"rendered":"http:\/\/vinta.ws\/code\/?p=392"},"modified":"2026-02-18T01:20:35","modified_gmt":"2026-02-17T17:20:35","slug":"generate-negative-samples-for-recommender-system","status":"publish","type":"post","link":"https:\/\/vinta.ws\/code\/generate-negative-samples-for-recommender-system.html","title":{"rendered":"Generate negative samples for recommender system?"},"content":{"rendered":"<p>According to <em>Recommender System Practice<\/em> (\u63a8\u8350\u7cfb\u7edf\u5b9e\u8df5), negative samples should be selected following these principles:<\/p>\n<ul>\n<li>For each user, keep positive and negative samples balanced (roughly equal in number).<\/li>\n<li>When sampling negatives for a user, pick items that are popular but that the user has not interacted with.<\/li>\n<li>It is generally held that a user's lack of interaction with a popular item is a stronger sign that they are not interested in it. With an unpopular item, the user may simply never have come across it on the site, so the missing interaction says nothing about whether they are interested.<\/li>\n<\/ul>\n<p>ref:<br \/>\n<a href=\"http:\/\/www.duokan.com\/reader\/www\/app.html?id=ed873c9e323511e28a9300163e0123ac\">http:\/\/www.duokan.com\/reader\/www\/app.html?id=ed873c9e323511e28a9300163e0123ac<\/a><\/p>\n<p>However, if you use Spark ML's <code>ALS(implicitPrefs=True)<\/code>, you don't need to add negative samples by hand. For implicit-feedback ALS, manually adding negative samples with Rui = 0 is pointless: the algorithm already treats every missing (non-observed) entry as 0, meaning the user did not interact with the item. That is Pui = 0 (no preference) with confidence Cui = 1 + alpha &times; 0 = 1, which is lower than the confidence of any positive sample. Spark ML's ALS only treats Rui &gt; 0 as a positive preference, and an explicit Rui = 0 row adds zero extra confidence, so such rows have no effect on the model. Negative values are a different case: Spark computes the extra confidence from |Rui| (an extension to the original paper, commented in the ALS source), so an Rui = -1 row becomes Pui = 0 with confidence 1 + alpha, a weighted negative signal that does change the learned factors.<\/p>\n<p>Going by that logic, <code>df0<\/code> (unobserved pairs written as 0) and <code>df2<\/code> (unobserved pairs left out) below should train the same model, while <code>df1<\/code> (the same pairs written as -1) should not:<\/p>\n<pre class=\"line-numbers\"><code class=\"language-py\">from pyspark.ml.recommendation import ALS\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.getOrCreate()\n\n# df0: unobserved (user, item) pairs written out explicitly with rating 0\nmatrix = [\n    (1, 1, 0),\n    (1, 2, 1),\n    (1, 3, 0),\n    (1, 4, 1),\n    (1, 5, 1),\n    (2, 1, 1),\n    (2, 2, 1),\n    (2, 3, 0),\n    (2, 4, 1),\n    (2, 5, 1),\n    (3, 1, 1),\n    (3, 2, 1),\n    (3, 3, 1),\n    (3, 4, 1),\n    (3, 5, 0),\n]\ndf0 = spark.createDataFrame(matrix, ['user', 'item', 'rating'])\n\n# df1: the same pairs, but with rating -1 instead of 0\nmatrix = [\n    (1, 1, -1),\n    (1, 2, 1),\n    (1, 3, -1),\n    (1, 4, 1),\n    (1, 5, 1),\n    (2, 1, 1),\n    (2, 2, 1),\n    (2, 3, -1),\n    (2, 4, 1),\n    (2, 5, 1),\n    (3, 1, 1),\n    (3, 2, 1),\n    (3, 3, 1),\n    (3, 4, 1),\n    (3, 5, -1),\n]\ndf1 = spark.createDataFrame(matrix, ['user', 'item', 'rating'])\n\n# df2: unobserved pairs simply left out\nmatrix = [\n    (1, 2, 1),\n    (1, 4, 1),\n    (1, 5, 1),\n    (2, 1, 1),\n    (2, 2, 1),\n    (2, 4, 1),\n    (2, 5, 1),\n    (3, 1, 1),\n    (3, 2, 1),\n    (3, 3, 1),\n    (3, 4, 1),\n]\ndf2 = spark.createDataFrame(matrix, ['user', 'item', 'rating'])\n\n# Implicit-feedback ALS; the fixed seed makes the random factor initialization reproducible\nals = ALS(implicitPrefs=True, seed=42, 
nonnegative=False).setRank(7).setMaxIter(15).setRegParam(0.01).setAlpha(40)\n\n# Fit the same estimator on each dataset; factors are sorted by id so the outputs can be compared row by row\nfor name, df in [('df0', df0), ('df1', df1), ('df2', df2)]:\n    alsModel = als.fit(df)\n    print(name)\n    alsModel.userFactors.orderBy('id').select('features').show(truncate=False)\n    alsModel.itemFactors.orderBy('id').select('features').show(truncate=False)<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/github.com\/apache\/spark\/blob\/master\/mllib\/src\/main\/scala\/org\/apache\/spark\/ml\/recommendation\/ALS.scala#L1626\">https:\/\/github.com\/apache\/spark\/blob\/master\/mllib\/src\/main\/scala\/org\/apache\/spark\/ml\/recommendation\/ALS.scala#L1626<\/a><br \/>\n<a href=\"https:\/\/github.com\/apache\/spark\/commit\/b05b3fd4bacff1d8b1edf4c710e7965abd2017a7\">https:\/\/github.com\/apache\/spark\/commit\/b05b3fd4bacff1d8b1edf4c710e7965abd2017a7<\/a><br \/>\n<a href=\"https:\/\/www.mail-archive.com\/user@spark.apache.org\/msg60240.html\">https:\/\/www.mail-archive.com\/user@spark.apache.org\/msg60240.html<\/a><br \/>\n<a href=\"http:\/\/apache-spark-user-list.1001560.n3.nabble.com\/implicit-ALS-dataSet-td7067.html\">http:\/\/apache-spark-user-list.1001560.n3.nabble.com\/implicit-ALS-dataSet-td7067.html<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>For implicit-feedback ALS, manually adding negative samples (samples with Rui = 0) is pointless, because the algorithm already treats missing (non-observed) values as 0, meaning the user did not interact with the item, i.e. Pui = 0 (no preference).<\/p>\n","protected":false},"author":1,"featured_media":393,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[97],"tags":[108,98,2,104],"class_list":["post-392","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-ai","tag-apache-spark","tag-machine-learning","tag-python","tag-recommender-system"],"_links":{"self":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/392","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/comments?post=392"}],"version-history":[{"count":0,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/392\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media\/393"}],"wp:attachment":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media?parent=392"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/categories?post=392"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/tags?post=392"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}