Interestingly, torch.norm is slower on the CPU but faster on the GPU than the direct approach.
import torch

x = torch.randn(1024,100)
y = torch.randn(1024,100)

%timeit torch.sqrt((x - y).pow(2).sum(1))
%timeit torch.norm(x - y, 2, 1)
Out:
1000 loops, best of 3: 910 µs per loop
1000 loops, best of 3: 1.76 ms per loop
On the other hand:
import torch

x = torch.randn(1024,100).cuda()
y = torch.randn(1024,100).cuda()

%timeit torch.sqrt((x - y).pow(2).sum(1))
%timeit torch.norm(x - y, 2, 1)
Out:
10000 loops, best of 3: 50 µs per loop
10000 loops, best of 3: 26 µs per loop
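As a quick sanity check, the two expressions really do compute the same per-row Euclidean distance, and for GPU timings it helps to synchronize so %timeit measures the kernel rather than just the asynchronous launch. A minimal sketch (torch.allclose and torch.cuda.synchronize are standard PyTorch calls; the tolerance is an arbitrary choice):

import torch

x = torch.randn(1024, 100)
y = torch.randn(1024, 100)

# Both expressions give the same per-row L2 distance.
direct = torch.sqrt((x - y).pow(2).sum(1))
normed = torch.norm(x - y, 2, 1)
print(torch.allclose(direct, normed, atol=1e-6))  # True

# On the GPU, CUDA kernels launch asynchronously, so synchronize
# before/after timing to avoid measuring only the launch overhead.
if torch.cuda.is_available():
    xg, yg = x.cuda(), y.cuda()
    torch.cuda.synchronize()
    # %timeit torch.norm(xg - yg, 2, 1); torch.cuda.synchronize()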