
How to calculate Cosine Similarity (With code)

Cosine similarity is a common method for measuring text similarity. The basic idea is simple: represent the two texts as vectors and measure the angle between them (more precisely, the cosine of that angle).

The larger the angle, the less similar the two vectors are.
The smaller the angle, the more similar the two vectors are.

Given three vectors A, B, and C, where the angle between B and C is the smallest, we say that B and C are the most similar.
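To make the angle idea concrete, here is a minimal sketch using three 2-D vectors. The coordinates are hypothetical, picked only so that B and C point in nearly the same direction, and the helper names `direction_degrees` and `angle_between_degrees` are made up for this illustration. It measures the angle between each pair and then takes the cosine of that angle.

```python
import math

# Hypothetical 2-D coordinates, chosen only for illustration
A = [1, 4]
B = [3, 2]
C = [4, 2]

def direction_degrees(v):
    """Direction of a 2-D vector, in degrees measured from the x-axis."""
    return math.degrees(math.atan2(v[1], v[0]))

def angle_between_degrees(u, v):
    """Angle between two 2-D vectors, in the range 0 to 180 degrees."""
    diff = abs(direction_degrees(u) - direction_degrees(v))
    return min(diff, 360.0 - diff)

for name, (u, v) in {"A-B": (A, B), "A-C": (A, C), "B-C": (B, C)}.items():
    angle = angle_between_degrees(u, v)
    # The cosine of the angle is the similarity score: closer to 1 means more similar
    print(f"{name}: angle = {angle:.1f} deg, cosine = {math.cos(math.radians(angle)):.4f}")
```

With these coordinates, B and C are only about 7 degrees apart, so their cosine is the closest to 1 of the three pairs.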

So how do we calculate cosine similarity? The formula is short, so we can implement it directly in code.


Code

To calculate the cosine similarity, we need the dot product of A and B and the lengths (norms) of A and B.
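Written out, the calculation looks like the following. This is a worked sketch using the same example vectors as the script, A = (1, 2, 3, 4, 5) and B = (1, 3, 5, 7, 9); the symbols a_i and b_i for the individual components are notation introduced here.

```latex
\begin{aligned}
\cos\theta &= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
            = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2}\,\sqrt{\sum_i b_i^2}} \\[4pt]
A \cdot B &= 1\cdot1 + 2\cdot3 + 3\cdot5 + 4\cdot7 + 5\cdot9 = 95 \\
\lVert A \rVert &= \sqrt{1 + 4 + 9 + 16 + 25} = \sqrt{55} \\
\lVert B \rVert &= \sqrt{1 + 9 + 25 + 49 + 81} = \sqrt{165} \\[4pt]
\cos\theta &= \frac{95}{\sqrt{55 \cdot 165}} = \frac{95}{\sqrt{9075}} \approx 0.9972
\end{aligned}
```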

Python Script:

from sklearn.metrics.pairwise import cosine_similarity


# Vectors
vec_a = [1, 2, 3, 4, 5]
vec_b = [1, 3, 5, 7, 9]

# Dot and norm
dot = sum(a*b for a, b in zip(vec_a, vec_b))
norm_a = sum(a*a for a in vec_a) ** 0.5
norm_b = sum(b*b for b in vec_b) ** 0.5

# Cosine similarity
cos_sim = dot / (norm_a*norm_b)

# Results
print('My version:', cos_sim)
print('Scikit-Learn:', cosine_similarity([vec_a], [vec_b]))



Output:

My version: 0.9972413740548081
Scikit-Learn: [[0.99724137]]

The first part of the code implements the cosine similarity formula by hand, and the last line calls the cosine_similarity function from Scikit-Learn directly. As you can see, the two scores agree up to the precision displayed.
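The hand-written version works for the example, but it has two edge cases worth knowing about: `zip` silently truncates to the shorter input when the vectors differ in length, and an all-zero vector leads to a division by zero. Below is one possible sketch of a reusable function that guards against both. The name `cosine_similarity_checked` is hypothetical, and raising `ValueError` is just one reasonable choice, not the way any particular library handles it.

```python
import math

def cosine_similarity_checked(a, b):
    """Cosine similarity of two equal-length, non-zero numeric sequences."""
    if len(a) != len(b):
        # zip() would otherwise silently drop the extra components
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle to a zero vector is undefined
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same vectors as in the script above; the result is approximately 0.9972
print(cosine_similarity_checked([1, 2, 3, 4, 5], [1, 3, 5, 7, 9]))
```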

