{"id":347,"date":"2017-02-01T03:41:40","date_gmt":"2017-01-31T19:41:40","guid":{"rendered":"http:\/\/vinta.ws\/code\/?p=347"},"modified":"2026-03-17T00:17:30","modified_gmt":"2026-03-16T16:17:30","slug":"calculate-the-similarity-of-two-vectors","status":"publish","type":"post","link":"https:\/\/vinta.ws\/code\/calculate-the-similarity-of-two-vectors.html","title":{"rendered":"Calculate the similarity of two vectors"},"content":{"rendered":"<p><code>scipy.spatial.distance<\/code><br \/>\n<a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/spatial.distance.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/spatial.distance.html<\/a><\/p>\n<p><code>sklearn.metrics<\/code><br \/>\n<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/classes.html#module-sklearn.metrics\">http:\/\/scikit-learn.org\/stable\/modules\/classes.html#module-sklearn.metrics<\/a><\/p>\n<h2>Distance<\/h2>\n<h3>Euclidean distance<\/h3>\n<pre class=\"line-numbers\"><code class=\"language-py\">from sklearn.metrics.pairwise import euclidean_distances\n\n# Inputs must be 2D, shape (n_samples, n_features), so each vector is wrapped in a list\neuclidean_distances([[0, 0, 0, 0]], [[0, 0, 0, 0]])\n# array([[0.]])\n\neuclidean_distances([[1, 0, 1, 0]], [[1, 0, 1, 0]])\n# array([[0.]])\n\neuclidean_distances([[0, 1, 0, 1]], [[1, 0, 1, 0]])\n# array([[2.]])<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.euclidean_distances.html\">http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.euclidean_distances.html<\/a><\/p>\n<h3>Manhattan distance<\/h3>\n<pre class=\"line-numbers\"><code class=\"language-py\">from sklearn.metrics.pairwise import manhattan_distances\n\n# Inputs must be 2D, shape (n_samples, n_features), so each vector is wrapped in a list\nmanhattan_distances([[0, 0, 0, 0]], [[0, 0, 0, 0]])\n# array([[0.]])\n\nmanhattan_distances([[1, 1, 1, 0]], [[1, 0, 0, 0]])\n# array([[2.]])\n\nmanhattan_distances([[0, 1, 0, 1]], [[1, 0, 1, 0]])\n# array([[4.]])<\/code><\/pre>\n<p>ref:<br \/>\n<a 
href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.manhattan_distances.html\">http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.manhattan_distances.html<\/a><\/p>\n<h2>Similarity<\/h2>\n<h3>Cosine similarity<\/h3>\n<pre class=\"line-numbers\"><code class=\"language-py\">from sklearn.metrics.pairwise import cosine_similarity\nfrom sklearn.metrics.pairwise import cosine_distances\nfrom sklearn.metrics.pairwise import pairwise_distances\nfrom scipy.spatial.distance import pdist, squareform\n\n# For a 2D matrix whose rows are all non-zero, these are equivalent\n# (up to floating-point error):\n#   cosine_similarity(matrix)\n#   1 - cosine_distances(matrix)\n#   1 - pairwise_distances(matrix, metric='cosine')\n#   1 - squareform(pdist(matrix, 'cosine'))\n\n# Inputs must be 2D, shape (n_samples, n_features), so each vector is wrapped in a list\ncosine_similarity([[0, 0, 0, 0]], [[0, 0, 0, 0]])\n# array([[0.]])\n\ncosine_similarity([[1, 0, 0, 0]], [[1, 0, 0, 0]])\n# array([[1.]])\n\ncosine_similarity([[1, 0, 1, 0]], [[0, 1, 0, 1]])\n# array([[0.]])\n\ncosine_similarity([[1, 0, 0, 1]], [[1, 0, 0, 0]])\n# array([[0.70710678]])\n\ncosine_similarity([[1, 0, 0, 1]], [[1, 0, 1, 0]])\n# array([[0.5]])<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.cosine_similarity.html\">http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.cosine_similarity.html<\/a><br \/>\n<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.cosine_distances.html\">http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.cosine_distances.html<\/a><br \/>\n<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.pairwise_distances.html\">http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.pairwise.pairwise_distances.html<\/a><br \/>\n<a 
href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.pdist.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.pdist.html<\/a><\/p>\n<h3>Jaccard similarity coefficient score<\/h3>\n<pre class=\"line-numbers\"><code class=\"language-py\"># jaccard_similarity_score has been removed from scikit-learn (for binary\n# input it returned plain accuracy). jaccard_score replaces it and computes\n# the Jaccard index of the positive class (label 1).\nfrom sklearn.metrics import jaccard_score\n\njaccard_score([0, 0, 0, 0], [0, 0, 0, 0])\n# 0.0 (undefined: neither vector has a positive label, so a warning is raised)\n\njaccard_score([0, 0, 0, 0], [1, 0, 0, 0])\n# 0.0\n\njaccard_score([1, 0, 0, 0], [1, 0, 0, 0])\n# 1.0\n\njaccard_score([1, 0, 1, 0], [0, 1, 0, 1])\n# 0.0<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.jaccard_similarity_score.html\">http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.metrics.jaccard_similarity_score.html<\/a><\/p>\n<p><a href=\"http:\/\/datascience.stackexchange.com\/questions\/5121\/applications-and-differences-for-jaccard-similarity-and-cosine-similarity\">http:\/\/datascience.stackexchange.com\/questions\/5121\/applications-and-differences-for-jaccard-similarity-and-cosine-similarity<\/a><\/p>\n<h3>Log-Likelihood similarity<\/h3>\n<p>TODO<\/p>\n<h3>Pearson correlation coefficient<\/h3>\n<p>The Pearson correlation coefficient takes a value between +1 and \u22121 inclusive, where +1 is total positive linear correlation, 0 is no linear correlation, and \u22121 is total negative linear correlation. In a user-based recommender, only calculate the Pearson correlation for two users over the items both have rated, and only when they have more than 1 item in common, preferably more than 5 to 10.<\/p>\n<p>For high-dimensional binary attributes, the Pearson correlation coefficient and cosine similarity perform better than the Jaccard similarity coefficient score.<\/p>\n<pre class=\"line-numbers\"><code class=\"language-py\">from scipy.stats import pearsonr\n\n# pearsonr returns (correlation coefficient r, two-sided p-value)\npearsonr([1, 0, 1, 1], [0, 0, 0, 0])\n# (nan, 1.0) on older SciPy; r is nan because the second vector is constant\n\npearsonr([1, 0, 1, 1], [1, 0, 0, 0])\n# (0.33333333333333331, 0.66666666666666607)\n\npearsonr([1, 0, 1, 0], [0, 1, 0, 1])\n# (-1.0, 0.0)<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.pearsonr.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.stats.pearsonr.html<\/a><br \/>\n<a href=\"http:\/\/stackoverflow.com\/questions\/11429604\/how-is-nan-handled-in-pearson-correlation-user-user-similarity-matrix-in-a-recom\">http:\/\/stackoverflow.com\/questions\/11429604\/how-is-nan-handled-in-pearson-correlation-user-user-similarity-matrix-in-a-recom<\/a><\/p>\n<h2>Dissimilarity<\/h2>\n<h3>Dice dissimilarity<\/h3>\n<pre class=\"line-numbers\"><code class=\"language-py\">from scipy.spatial.distance import dice\nimport numpy as np\n\nv1 = np.array([0, 0, 0, 0])\nv2 = np.array([0, 0, 0, 0])\n\n# Dice dissimilarity is undefined when both vectors are all False.\n# Depending on the SciPy version this either raises ZeroDivisionError\n# or returns nan, so both cases are mapped to a similarity of 0.\ntry:\n    sim = 1.0 - dice(v1.astype(bool), v2.astype(bool))\nexcept ZeroDivisionError:\n    sim = 0.0\nif np.isnan(sim):\n    sim = 0.0<\/code><\/pre>\n<p>ref:<br \/>\n<a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.dice.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.dice.html<\/a><br \/>\n<a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.kulsinski.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.kulsinski.html<\/a><br \/>\n<a 
href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.sokalsneath.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.sokalsneath.html<\/a><br \/>\n<a href=\"https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.yule.html\">https:\/\/docs.scipy.org\/doc\/scipy\/reference\/generated\/scipy.spatial.distance.yule.html<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>You should only calculate Pearson Correlations when the number of items in common between two users is > 1, preferably greater than 5\/10. Only calculate the Pearson Correlation for two users where they have commonly rated items.<\/p>\n","protected":false},"author":1,"featured_media":348,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[97,4],"tags":[98,2],"class_list":["post-347","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-ai","category-about-python","tag-machine-learning","tag-python"],"_links":{"self":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/347","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/comments?post=347"}],"version-history":[{"count":0,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/posts\/347\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media\/348"}],"wp:attachment":[{"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/media?parent=347"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/categories?post=347"},{"
taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vinta.ws\/code\/wp-json\/wp\/v2\/tags?post=347"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}