There is a C++ comparison for getting the union of lists from a list of lists: The fastest way to find a union of sets
There are also several other Python-related questions, but none of them suggest the fastest way to get the union of the lists:
- Finding a union of lists of lists in Python
- Flattening a shallow list in Python
From the answers, I gathered that there are at least two ways to do this:
>>> from itertools import chain
>>> x = [[1,2,3], [3,4,5], [1,7,8]]
>>> list(set().union(*x))
[1, 2, 3, 4, 5, 7, 8]
>>> list(set(chain(*x)))
[1, 2, 3, 4, 5, 7, 8]
Please note that I cast the set to a list afterwards, because I need the order of the list to be fixed for further processing.
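On the ordering point: casting a set to a list freezes whatever iteration order the set happens to have, and that order is not guaranteed to be sorted. If a reproducible order matters, a minimal sketch (assuming the elements are mutually comparable; the name ordered_union is only illustrative) would be to sort instead of calling list():

```python
x = [[1, 2, 3], [3, 4, 5], [1, 7, 8]]

# sorted() accepts any iterable, so the set does not need to be
# converted to a list first; the result is a list in ascending order.
ordered_union = sorted(set().union(*x))
print(ordered_union)  # [1, 2, 3, 4, 5, 7, 8]
```

Sorting adds its own cost, so it should be kept out of any timing that is meant to compare only the two union approaches.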
After some comparison, it seems that list(set(chain(*x))) is more stable and takes less time:
from itertools import chain
import time
import random
[out]:
1.39586925507e-05
1.09834671021e-05
Taking out the variable of casting the sets to lists:
y_time = 0
z_time = 0
for _ in range(1000):
    # Ten random lists of ten integers each
    x = [[random.choice(range(10000)) for i in range(10)] for j in range(10)]
    # Time set().union(*x)
    start = time.time()
    y = set().union(*x)
    y_time += time.time() - start
    # Time set(chain(*x))
    start = time.time()
    z = set(chain(*x))
    z_time += time.time() - start
    assert sorted(y) == sorted(z)
# Average time per iteration, in seconds
print(y_time / 1000.)
print(z_time / 1000.)
[out]:
1.22241973877e-05
1.02684497833e-05
Here's the full output when I print the intermediate timings (without casting to a list): http://pastebin.com/raw/y3i6dXZ8
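Since single calls here take on the order of microseconds, timings taken with time.time() can be dominated by clock resolution and noise. A hedged sketch of the same comparison using the standard timeit module is below. The helper names (union_via_set_union, union_via_chain, make_input, best_time) and the fixed seed are my own assumptions for illustration, and the input is generated once rather than on every iteration:

```python
import random
import timeit
from itertools import chain


def union_via_set_union(lists):
    return set().union(*lists)


def union_via_chain(lists):
    return set(chain(*lists))


def make_input(n_lists=10, n_items=10, upper=10000, seed=0):
    # Seeded generator so both approaches see identical data on every run.
    rng = random.Random(seed)
    return [[rng.randrange(upper) for _ in range(n_items)]
            for _ in range(n_lists)]


def best_time(func, lists, number=1000, repeat=5):
    # Take the minimum over several repeats, then divide by the number of
    # calls per repeat to get an approximate per-call time in seconds.
    timings = timeit.repeat(lambda: func(lists), number=number, repeat=repeat)
    return min(timings) / number


x = make_input()
assert union_via_set_union(x) == union_via_chain(x)
for func in (union_via_set_union, union_via_chain):
    print(func.__name__, best_time(func, x))
```

No expected timings are shown because they depend on the machine and the Python version.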
Why does list(set(chain(*x))) take less time than list(set().union(*x))?
Is there another way to get the same union of lists, using numpy, pandas, sframe, or something else? Is the alternative faster?
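For reference, the numpy variant I have in mind would be something like the sketch below. It assumes numpy is installed and that all sublists hold values of one numeric type. I have not measured whether it is faster for inputs this small, where the cost of converting the lists to arrays may well dominate:

```python
import numpy as np

x = [[1, 2, 3], [3, 4, 5], [1, 7, 8]]

# np.concatenate joins the sublists into a single 1-D array;
# np.unique returns its distinct values in sorted order.
union = np.unique(np.concatenate(x)).tolist()
print(union)  # [1, 2, 3, 4, 5, 7, 8]
```

Unlike the set-based versions, this returns the values already sorted.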