scipy.spatial.distance
https://docs.scipy.org/doc/scipy/reference/spatial.distance.html
sklearn.metrics
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
Distance
Euclidean distance
from sklearn.metrics.pairwise import euclidean_distances
# Inputs must be 2-D (one sample per row); 1-D lists raise a ValueError in current scikit-learn
euclidean_distances([[0, 0, 0, 0]], [[0, 0, 0, 0]])
# array([[0.]])
euclidean_distances([[1, 0, 1, 0]], [[1, 0, 1, 0]])
# array([[0.]])
euclidean_distances([[0, 1, 0, 1]], [[1, 0, 1, 0]])
# array([[2.]])
ref:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html
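A minimal NumPy sketch of the formula d(x, y) = sqrt(sum_i (x_i - y_i)^2), checked against euclidean_distances. The helper `euclidean` and the sample matrix `X` are hypothetical. The sketch also shows that euclidean_distances, given a single 2-D array, returns the full matrix of row-to-row distances.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances


def euclidean(x, y):
    """Hypothetical helper: Euclidean distance between two 1-D vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))


a = [0, 1, 0, 1]
b = [1, 0, 1, 0]
print(euclidean(a, b))  # 2.0

# scikit-learn expects 2-D input: each row is one sample.
X = np.array([[0, 1, 0, 1], [1, 0, 1, 0]])
D = euclidean_distances(X)  # 2 x 2 matrix of pairwise row distances
print(D)
# [[0. 2.]
#  [2. 0.]]
```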
Manhattan distance
from sklearn.metrics.pairwise import manhattan_distances
# Inputs must be 2-D (one sample per row)
manhattan_distances([[0, 0, 0, 0]], [[0, 0, 0, 0]])
# array([[0.]])
manhattan_distances([[1, 1, 1, 0]], [[1, 0, 0, 0]])
# array([[2.]])
manhattan_distances([[0, 1, 0, 1]], [[1, 0, 1, 0]])
# array([[4.]])
ref:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.manhattan_distances.html
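Manhattan (L1) distance is the sum of absolute coordinate differences. A small sketch computing it by hand with NumPy and comparing it with SciPy's `cityblock` (SciPy's name for the same metric) and with manhattan_distances; the helper `manhattan` is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cityblock
from sklearn.metrics.pairwise import manhattan_distances


def manhattan(x, y):
    """Hypothetical helper: sum of absolute coordinate differences (L1 distance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y))


a = [1, 1, 1, 0]
b = [1, 0, 0, 0]
print(manhattan(a, b))                # 2.0
print(cityblock(a, b))                # same value from SciPy
print(manhattan_distances([a], [b]))  # [[2.]]
```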
Similarity
Cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import cosine_distances
from sklearn.metrics.pairwise import pairwise_distances
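from scipy.spatial.distance import pdist, squareform
import numpy as np
# Example input; rows are kept non-zero because cosine with an all-zero vector is undefined
matrix = np.array([[1, 0, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]])
# Four equivalent ways to get the pairwise cosine similarity of the rows of `matrix`.
# Chained == on arrays raises an error, so compare with np.allclose instead.
S = cosine_similarity(matrix)
np.allclose(S, 1 - cosine_distances(matrix))                     # True
np.allclose(S, 1 - pairwise_distances(matrix, metric='cosine'))  # True
np.allclose(S, 1 - squareform(pdist(matrix, 'cosine')))          # True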
# Inputs must be 2-D (one sample per row); an all-zero vector gets similarity 0
cosine_similarity([[0, 0, 0, 0]], [[0, 0, 0, 0]])
# array([[0.]])
cosine_similarity([[1, 0, 0, 0]], [[1, 0, 0, 0]])
# array([[1.]])
cosine_similarity([[1, 0, 1, 0]], [[0, 1, 0, 1]])
# array([[0.]])
cosine_similarity([[1, 0, 0, 1]], [[1, 0, 0, 0]])
# array([[0.70710678]])
cosine_similarity([[1, 0, 0, 1]], [[1, 0, 1, 0]])
# array([[0.5]])
ref:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_distances.html
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
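Cosine similarity is x · y / (||x|| ||y||). A minimal sketch of that formula with NumPy, checked against cosine_similarity on the vectors used above. The helper `cosine` is hypothetical; it returns 0.0 when either vector is all zeros, which matches the value cosine_similarity gives for [0, 0, 0, 0] above.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def cosine(x, y):
    """Hypothetical helper: cosine of the angle between x and y.

    Returns 0.0 when either vector is all zeros (the formula would be 0 / 0).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    norm = np.linalg.norm(x) * np.linalg.norm(y)
    if norm == 0:
        return 0.0
    return float(np.dot(x, y) / norm)


pairs = [
    ([0, 0, 0, 0], [0, 0, 0, 0]),
    ([1, 0, 0, 1], [1, 0, 0, 0]),
    ([1, 0, 0, 1], [1, 0, 1, 0]),
]
for x, y in pairs:
    manual = cosine(x, y)
    library = cosine_similarity([x], [y])[0, 0]
    # Print the hand-computed value and whether it agrees with scikit-learn
    print(x, y, round(manual, 8), np.isclose(manual, library))
```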
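Jaccard similarity coefficient score
Note: jaccard_similarity_score was deprecated in scikit-learn 0.21 and removed in 0.23, so the snippet below only runs on older versions. For 1-D binary input it returned the fraction of positions where the two vectors agree (the same value as accuracy_score), not the set-based Jaccard index |A ∩ B| / |A ∪ B|; that is why [0, 0, 0, 0] vs [1, 0, 0, 0] scores 0.75. Its replacement, jaccard_score, computes the set-based index for binary input.
from sklearn.metrics import jaccard_similarity_score  # scikit-learn < 0.23 only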
jaccard_similarity_score([0, 0, 0, 0], [0, 0, 0, 0])
# 1.0
jaccard_similarity_score([0, 0, 0, 0], [1, 0, 0, 0])
# 0.75
jaccard_similarity_score([1, 0, 0, 0], [1, 0, 0, 0])
# 1.0
jaccard_similarity_score([1, 0, 1, 0], [0, 1, 0, 1])
# 0.0
ref:
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.jaccard_similarity_score.html
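For the set-based Jaccard index, a minimal sketch that counts only positions set to 1, compared with SciPy's `jaccard` (which returns the Jaccard dissimilarity, so it is subtracted from 1) and scikit-learn's `jaccard_score` (available from 0.21). The helper `jaccard_sim` is hypothetical, and treating two all-zero vectors as identical is a convention chosen here, not part of the formula.

```python
import numpy as np
from scipy.spatial.distance import jaccard
from sklearn.metrics import jaccard_score


def jaccard_sim(x, y):
    """Hypothetical helper: |A & B| / |A | B| over the positions that are 1.

    Two all-zero vectors are treated as identical (similarity 1.0); this is
    a convention chosen here, not something the formula defines.
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    union = np.sum(x | y)
    if union == 0:
        return 1.0
    return float(np.sum(x & y) / union)


x, y = [1, 1, 0, 0], [1, 0, 1, 0]
print(jaccard_sim(x, y))    # 1 shared position out of 3 set positions, about 0.333
print(1 - jaccard(x, y))    # SciPy returns the Jaccard dissimilarity
print(jaccard_score(x, y))  # scikit-learn, binary case

# Unlike the old jaccard_similarity_score, matching zeros do not count:
print(jaccard_sim([0, 0, 0, 0], [1, 0, 0, 0]))  # 0.0, not 0.75
```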
Log-Likelihood similarity
TODO
Pearson correlation coefficient
The coefficient ranges from −1 to +1 inclusive: +1 is a total positive linear correlation, 0 is no linear correlation, and −1 is a total negative linear correlation. In user-user recommendation, compute the Pearson correlation between two users only over the items both of them have rated, and only when they have more than one such item in common (preferably more than 5 to 10).
For high-dimensional binary attributes, the Pearson correlation coefficient and cosine similarity perform better than the Jaccard similarity coefficient score.
from scipy.stats import pearsonr
pearsonr([1, 0, 1, 1], [0, 0, 0, 0])
# (nan, 1.0) in older SciPy; newer versions warn that an input is constant and return nan for both values
pearsonr([1, 0, 1, 1], [1, 0, 0, 0])
# r = 0.3333..., p = 0.6666...
pearsonr([1, 0, 1, 0], [0, 1, 0, 1])
# r = -1.0, p = 0.0
ref:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html
http://stackoverflow.com/questions/11429604/how-is-nan-handled-in-pearson-correlation-user-user-similarity-matrix-in-a-recom
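The advice above (correlate two users only over commonly rated items, with a minimum overlap) can be sketched as follows. This assumes ratings are stored as one array per user with np.nan marking unrated items; the helper `user_pearson`, its `min_common` parameter, and the sample ratings are hypothetical. Returning None when an input is constant sidesteps the nan case shown above.

```python
import numpy as np
from scipy.stats import pearsonr


def user_pearson(ratings_a, ratings_b, min_common=2):
    """Hypothetical helper: Pearson r between two users over co-rated items.

    np.nan marks an unrated item. Returns None when the users share fewer
    than min_common rated items, or when either user's co-rated ratings are
    constant (r is undefined in that case).
    """
    a = np.asarray(ratings_a, dtype=float)
    b = np.asarray(ratings_b, dtype=float)
    common = ~np.isnan(a) & ~np.isnan(b)  # items rated by both users
    if common.sum() < min_common:
        return None
    a, b = a[common], b[common]
    if np.all(a == a[0]) or np.all(b == b[0]):
        return None
    r, _p = pearsonr(a, b)
    return float(r)


nan = np.nan
alice = [5, 3, nan, 1, 4]
bob = [4, nan, 2, 1, 5]
print(user_pearson(alice, bob))                # uses items 0, 3 and 4 only
print(user_pearson(alice, bob, min_common=5))  # None: only 3 items in common
```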
Dissimilarity
Dice dissimilarity
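from scipy.spatial.distance import dice
import numpy as np
v1 = np.array([0, 0, 0, 0])
v2 = np.array([0, 0, 0, 0])
# Dice similarity = 1 - Dice dissimilarity; the vectors are compared as booleans
try:
    sim = 1.0 - dice(v1.astype(bool), v2.astype(bool))
except ZeroDivisionError:
    # Older SciPy versions raise on two all-zero vectors (0 / 0)
    sim = 0
if np.isnan(sim):
    # Newer SciPy versions return nan (with a RuntimeWarning) instead of raising
    sim = 0
# sim == 0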
ref:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.dice.html
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.kulsinski.html
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.sokalsneath.html
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.yule.html
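Dice similarity on binary vectors is 2|A ∩ B| / (|A| + |B|), and SciPy's `dice` returns one minus that value. A minimal sketch computing the similarity by hand and checking it against SciPy; the helper `dice_sim` and the sample vectors are hypothetical, and the 0.0 result for two all-zero vectors mirrors the fallback in the try/except snippet above.

```python
import numpy as np
from scipy.spatial.distance import dice


def dice_sim(x, y):
    """Hypothetical helper: Dice similarity 2|A & B| / (|A| + |B|).

    Returns 0.0 when both vectors are all zeros (the formula would be 0 / 0).
    """
    x = np.asarray(x, dtype=bool)
    y = np.asarray(y, dtype=bool)
    total = x.sum() + y.sum()
    if total == 0:
        return 0.0
    return float(2 * np.sum(x & y) / total)


x = np.array([1, 1, 0, 1])
y = np.array([1, 0, 0, 1])
print(dice_sim(x, y))                              # 2 * 2 / (3 + 2) = 0.8
print(1.0 - dice(x.astype(bool), y.astype(bool)))  # same value from SciPy
```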