Does Python's list(set(a)) change its order each time?

I have a list of 5 million string elements that are stored as a pickle object.

a = ['https://en.wikipedia.org/wiki/Data_structure','https://en.wikipedia.org/wiki/Data_mining','https://en.wikipedia.org/wiki/Statistical_learning_theory','https://en.wikipedia.org/wiki/Machine_learning','https://en.wikipedia.org/wiki/Computer_science','https://en.wikipedia.org/wiki/Information_theory','https://en.wikipedia.org/wiki/Statistics','https://en.wikipedia.org/wiki/Mathematics','https://en.wikipedia.org/wiki/Signal_processing','https://en.wikipedia.org/wiki/Sorting_algorithm','https://en.wikipedia.org/wiki/Data_structure','https://en.wikipedia.org/wiki/Quicksort','https://en.wikipedia.org/wiki/Merge_sort','https://en.wikipedia.org/wiki/Heapsort','https://en.wikipedia.org/wiki/Insertion_sort','https://en.wikipedia.org/wiki/Introsort','https://en.wikipedia.org/wiki/Selection_sort','https://en.wikipedia.org/wiki/Timsort','https://en.wikipedia.org/wiki/Cubesort','https://en.wikipedia.org/wiki/Shellsort'] 

To remove duplicates, I use set(a), then I build the list again using list(set(a)).

My question is:

Even if I restart python and read the list from the pickle file, will the list(set(a)) order be the same every time?

I really want to know how the hash determines the order of the resulting list.


I tested with a small data set, and the order seemed to stay the same each time.

 In [50]: a = ['x','y','z','k']

 In [51]: a
 ['x', 'y', 'z', 'k']

 In [52]: list(set(a))
 ['y', 'x', 'k', 'z']

 In [53]: b=list(set(a))

 In [54]: list(set(b))
 ['y', 'x', 'k', 'z']

 In [55]: del b

 In [56]: b=list(set(a))

 In [57]: b
 ['y', 'x', 'k', 'z']
1 answer

I would suggest using an auxiliary set() to ensure uniqueness when adding items to the list. That way the order comes from your list(), and you do not rely on the order of the set() at all.

First, load your list and create a set with its contents.
Before adding an item to your list, make sure it is not already in the set (a membership test with "in" is much faster on a set than on a list, especially if there are many elements).
Sort your list if needed; the order will then be exactly the one you want.
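The steps above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name dedupe_keep_order and the short sample list are made up for illustration, and the sketch assumes you want to keep the first occurrence of each element.

```python
def dedupe_keep_order(items):
    """Return items without duplicates, keeping first-occurrence order."""
    seen = set()    # auxiliary set: fast membership tests with "in"
    result = []     # the list that carries the order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Hypothetical sample data with two duplicates
a = ['x', 'y', 'z', 'k', 'x', 'z']
unique = dedupe_keep_order(a)
print(unique)  # ['x', 'y', 'z', 'k']
```

The result depends only on the order of the input list, so it should be the same after restarting Python and reloading the pickle file.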

Disadvantage: this takes up roughly twice as much memory as working with only a set().
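As for whether list(set(a)) stays the same across restarts: since Python 3.3, hashes of str objects are randomized per process by default, so the iteration order of a set of strings can differ from one interpreter run to the next. Within a single session it stays stable, which would explain the test in the question. The sketch below starts fresh interpreters with a fixed PYTHONHASHSEED to illustrate this; the helper name order_with_seed and the sample list are assumptions made for the example.

```python
import ast
import os
import subprocess
import sys

# Code executed in each fresh interpreter
SNIPPET = "print(list(set(['x', 'y', 'z', 'k'])))"


def order_with_seed(seed):
    """Run SNIPPET in a new Python process with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return ast.literal_eval(completed.stdout.strip())


# Same seed: the order is reproducible between processes
same_seed_runs = [order_with_seed(0) for _ in range(2)]
# Different seed: the order may (but need not) be different
other_seed_run = order_with_seed(1)

print(same_seed_runs)
print(other_seed_run)
```

If you need a reproducible order without fixing the seed, sorted(set(a)) or list(dict.fromkeys(a)) (insertion-ordered since Python 3.7) are common alternatives.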
