Cosine Similarity

From the Wikipedia article on Cosine similarity:

In data analysis, cosine similarity is a measure of similarity between two non-zero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided by the product of their lengths. It follows that the cosine similarity does not depend on the magnitudes of the vectors, but only on their angle. The cosine similarity always belongs to the interval [-1,1].

For example, two proportional vectors have a cosine similarity of 1, two orthogonal vectors have a similarity of 0, and two opposite vectors have a similarity of -1. In some contexts, the component values of the vectors cannot be negative, in which case the cosine similarity is bounded in [0,1].
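The three special cases quoted above can be checked numerically. The following is a minimal sketch using only the standard library; the helper name `cos_sim` and the sample vectors are illustrative assumptions, not part of the exercise, and "proportional" is taken to mean proportional with a positive factor.

```python
from math import sqrt, isclose

def cos_sim(a, b):
    # Hypothetical helper: dot product divided by the product of the lengths.
    # Assumes both vectors are non-zero and have the same number of components.
    dot = sum(x * y for x, y in zip(a, b))
    len_a = sqrt(sum(x * x for x in a))
    len_b = sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

proportional = cos_sim([1, 2], [3, 6])    # same direction, different magnitude
orthogonal = cos_sim([1, 0], [0, 1])      # perpendicular
opposite = cos_sim([1, 2], [-1, -2])      # opposite direction

# Compare with a tolerance, since floating-point rounding can creep in.
print(isclose(proportional, 1.0), isclose(orthogonal, 0.0, abs_tol=1e-12), isclose(opposite, -1.0))
```

Scaling either vector by a positive constant leaves these values unchanged, which is the magnitude independence mentioned in the quote.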

So for two vectors A and B, each with n components, the cosine similarity \cos (\theta) is defined as:

\cos (\theta ) = \dfrac {A \cdot B} {\left\| A\right\| \left\| B\right\|} = \dfrac {\sum \limits_{i=1}^{n}{A_i B_i}} {\sqrt{\sum \limits_{i=1}^{n}{(A_i)^2}} \sqrt{\sum \limits_{i=1}^{n}{(B_i)^2}}}

For example, if A=(1,2,3) and B=(4,5,6), then the cosine similarity is:

\cos (\theta ) = \dfrac {1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6} {\sqrt{1^2 + 2^2 + 3^2} \sqrt{4^2 + 5^2 + 6^2}} = \dfrac {32} {\sqrt{14} \sqrt{77}} \approx 0.9746
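The arithmetic in this worked example can be reproduced step by step. This is a short check rather than a solution to the exercise; the variable names are chosen here for illustration.

```python
from math import sqrt

A = (1, 2, 3)
B = (4, 5, 6)

# Numerator: the dot product 1*4 + 2*5 + 3*6.
dot = sum(a * b for a, b in zip(A, B))

# Denominator: the two sums of squares under the square roots.
sum_sq_a = sum(a * a for a in A)
sum_sq_b = sum(b * b for b in B)

cos_theta = dot / (sqrt(sum_sq_a) * sqrt(sum_sq_b))
print(dot, sum_sq_a, sum_sq_b)  # 32 14 77
print(round(cos_theta, 4))      # 0.9746
```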

Write a function called cosine_similarity that takes two vectors as input and returns the cosine similarity between them.

Bonus challenge

Implement the following versions of your function, and use %timeit (e.g. in an IPython terminal) to compare their performance on vectors of length 10,000:

  1. A naive Python version that stores the vectors as plain lists and uses for loops to compute the dot product and the lengths of the vectors. You can get sqrt using from math import sqrt.
  2. A version that stores the vectors as numpy arrays and uses the numpy.dot function to compute the dot product and the numpy.linalg.norm function to compute the lengths of the vectors.
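One possible way to approach both versions is sketched below. It is not a reference solution: the function names `cosine_similarity_naive` and `cosine_similarity_numpy`, the random test data, and the repeat count are assumptions made for illustration, and neither function guards against zero vectors. The standard-library `timeit` module stands in for the `%timeit` magic so the sketch also runs outside IPython.

```python
import timeit
from math import sqrt

import numpy as np

def cosine_similarity_naive(a, b):
    # Version 1: plain lists and a for loop (assumes equal-length, non-zero vectors).
    dot = 0.0
    sum_sq_a = 0.0
    sum_sq_b = 0.0
    for x, y in zip(a, b):
        dot += x * y
        sum_sq_a += x * x
        sum_sq_b += y * y
    return dot / (sqrt(sum_sq_a) * sqrt(sum_sq_b))

def cosine_similarity_numpy(a, b):
    # Version 2: numpy.dot for the numerator, numpy.linalg.norm for the lengths.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative test data: two random vectors with 10,000 components each.
rng = np.random.default_rng(0)
a_arr = rng.random(10_000)
b_arr = rng.random(10_000)
a_list = a_arr.tolist()
b_list = b_arr.tolist()

# Time 100 calls of each version; absolute numbers depend on the machine.
t_naive = timeit.timeit(lambda: cosine_similarity_naive(a_list, b_list), number=100)
t_numpy = timeit.timeit(lambda: cosine_similarity_numpy(a_arr, b_arr), number=100)
print(f"naive: {t_naive:.4f} s, numpy: {t_numpy:.4f} s for 100 calls")
```

In an IPython session, the equivalent comparison would be `%timeit cosine_similarity_naive(a_list, b_list)` followed by `%timeit cosine_similarity_numpy(a_arr, b_arr)`. Passing lists to the numpy version also works but adds a conversion cost on every call, so each version is timed here on its own native data type.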