I have two vectors stored in a non-standard (sparse) format, and I would like to calculate the Euclidean distance between them. The vectors look like this:
```python
line1 = '2:20 3:20 5:10 6:10 10:20'
line2 = '1:18 2:20 4:10 6:10 8:20 9:10 10:10'
```
In each element, the first number is the position in the vector (1-based) and the second is the value. For example, 2:20 means that element 2 of the vector has the value 20. Thus, the vector for line1 is (0, 20, 20, 0, 10, 10, 0, 0, 0, 20), and the vector for line2 is (18, 20, 0, 10, 0, 10, 0, 20, 10, 10).
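One way to keep this format sparse is to parse each line into a dictionary that stores only the positions that actually appear, so missing positions are implicitly zero and take no memory. The sketch below assumes a helper named `parse_sparse` (a hypothetical name, not part of the original code) and keeps the 1-based indices exactly as they appear in the string.

```python
def parse_sparse(line):
    """Parse 'index:value' tokens into a dict mapping index -> value.

    Indices stay 1-based, as in the original format. Positions that do not
    appear in the string are implicitly zero and are simply not stored.
    """
    vec = {}
    for token in line.split():
        index, value = token.split(':')
        vec[int(index)] = float(value)
    return vec


line1 = '2:20 3:20 5:10 6:10 10:20'
print(parse_sparse(line1))
# {2: 20.0, 3: 20.0, 5: 10.0, 6: 10.0, 10: 20.0}
```

The memory used by each dictionary grows with the number of stored entries, not with the largest index.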
I wrote the following program, which works well. The problem is that my vectors are HUGE, and I want to compare each of them with thousands of other vectors; when I run the program this way on my real data, my computer starts throwing memory errors. Is there a way to calculate the Euclidean distance between two vectors stored in this format without building long dense vectors (with many zero elements)?
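A short derivation suggests why dense vectors should not be necessary. Write $S_a$ and $S_b$ for the sets of positions actually stored for vectors $a$ and $b$ (this notation is introduced here for illustration); every other entry is zero in both vectors, so it contributes nothing to the distance. Expanding the square then needs only each vector's own squared norm plus a dot product over the positions the two vectors share:

$$
\lVert a - b \rVert^2 = \sum_{i \in S_a \cup S_b} (a_i - b_i)^2 = \lVert a \rVert^2 + \lVert b \rVert^2 - 2 \sum_{i \in S_a \cap S_b} a_i b_i
$$

For line1 and line2 the shared positions are $\{2, 6, 10\}$:

$$
\lVert a \rVert^2 = 20^2 + 20^2 + 10^2 + 10^2 + 20^2 = 1400, \qquad
\lVert b \rVert^2 = 18^2 + 20^2 + 10^2 + 10^2 + 20^2 + 10^2 + 10^2 = 1524
$$

$$
\sum_{i \in \{2, 6, 10\}} a_i b_i = 20 \cdot 20 + 10 \cdot 10 + 20 \cdot 10 = 700, \qquad
\lVert a - b \rVert^2 = 1400 + 1524 - 2 \cdot 700 = 1524, \qquad
\lVert a - b \rVert = \sqrt{1524} \approx 39.0384
$$

When one vector is compared with thousands of others, each squared norm could be computed once and reused, leaving only the sparse dot product per pair.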
```python
import numpy as np

def vec_line(line):
    # Dense representation: assumes positions run from 1 to 10
    vector = [0.0] * 10
    for datapoint in line.split():
        element, value = datapoint.split(':')
        # Positions in the string are 1-based; Python lists are 0-based
        vector[int(element) - 1] = float(value)
    return np.array(vector)

vector1 = vec_line(line1)
vector2 = vec_line(line2)
dist = np.linalg.norm(vector1 - vector2)
print(dist)
# --> 39.0384425919
```
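A minimal sketch of a sparse alternative, assuming the vectors can be held as dictionaries instead of dense arrays: only positions stored in at least one of the two vectors can contribute to the distance, so it is enough to loop over the union of their keys. The names `parse_sparse` and `sparse_distance` are hypothetical helpers introduced for illustration, not an existing library API.

```python
import math


def parse_sparse(line):
    """Parse 'index:value' tokens into {index: value}; absent positions are zero."""
    vec = {}
    for token in line.split():
        index, value = token.split(':')
        vec[int(index)] = float(value)
    return vec


def sparse_distance(a, b):
    """Euclidean distance between two vectors stored as {index: value} dicts.

    Positions missing from both dicts are zero in both vectors, so their
    difference is zero and they can be skipped entirely.
    """
    total = 0.0
    for i in a.keys() | b.keys():  # union of stored positions
        diff = a.get(i, 0.0) - b.get(i, 0.0)
        total += diff * diff
    return math.sqrt(total)


line1 = '2:20 3:20 5:10 6:10 10:20'
line2 = '1:18 2:20 4:10 6:10 8:20 9:10 10:10'
print(sparse_distance(parse_sparse(line1), parse_sparse(line2)))  # about 39.0384
```

Because nothing is allocated for the zero entries, memory per vector is proportional to the number of stored pairs, and the largest index no longer needs to be known in advance. For thousands of comparisons, the vectors could also be read and parsed one line at a time rather than all at once.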