I have a list of 5 million string elements that are stored as a pickle object.
a = ['https://en.wikipedia.org/wiki/Data_structure','https://en.wikipedia.org/wiki/Data_mining','https://en.wikipedia.org/wiki/Statistical_learning_theory','https://en.wikipedia.org/wiki/Machine_learning','https://en.wikipedia.org/wiki/Computer_science','https://en.wikipedia.org/wiki/Information_theory','https://en.wikipedia.org/wiki/Statistics','https://en.wikipedia.org/wiki/Mathematics','https://en.wikipedia.org/wiki/Signal_processing','https://en.wikipedia.org/wiki/Sorting_algorithm','https://en.wikipedia.org/wiki/Data_structure','https://en.wikipedia.org/wiki/Quicksort','https://en.wikipedia.org/wiki/Merge_sort','https://en.wikipedia.org/wiki/Heapsort','https://en.wikipedia.org/wiki/Insertion_sort','https://en.wikipedia.org/wiki/Introsort','https://en.wikipedia.org/wiki/Selection_sort','https://en.wikipedia.org/wiki/Timsort','https://en.wikipedia.org/wiki/Cubesort','https://en.wikipedia.org/wiki/Shellsort']
To remove duplicates, I use set(a), then rebuild the list with list(set(a)).
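For comparison, here is a minimal sketch of two de-duplication approaches whose output order does not depend on string hashing. It assumes Python 3.7+, where dicts are guaranteed to keep insertion order. The helper names `dedupe_keep_order` and `dedupe_sorted` are hypothetical, used only for illustration.

```python
def dedupe_keep_order(items):
    """Remove duplicates, keeping the first occurrence of each item.

    dict.fromkeys keeps insertion order (guaranteed since Python 3.7),
    so the result is the same on every run regardless of hash values.
    """
    return list(dict.fromkeys(items))


def dedupe_sorted(items):
    """Remove duplicates and return the unique items in sorted order."""
    return sorted(set(items))


if __name__ == "__main__":
    sample = ['x', 'y', 'z', 'k', 'x']
    print(dedupe_keep_order(sample))  # first-occurrence order
    print(dedupe_sorted(sample))      # lexicographic order
```

Both run in roughly linear time (plus the sort in the second case), so they should remain practical for a list of a few million strings.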
My question is:
Even if I restart Python and reload the list from the pickle file, will list(set(a)) give the same order every time?
I'd really like to understand how hashing determines the order of the resulting list.
I tested with a small data set, and the order seemed to stay the same every time:
In [50]: a = ['x','y','z','k']
In [51]: a
['x', 'y', 'z', 'k']
In [52]: list(set(a))
['y', 'x', 'k', 'z']
In [53]: b=list(set(a))
In [54]: list(set(b))
['y', 'x', 'k', 'z']
In [55]: del b
In [56]: b=list(set(a))
In [57]: b
['y', 'x', 'k', 'z']
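The session above runs inside a single interpreter, so it cannot show what happens after a restart. Since Python 3.3, `str` hashes are randomized per process unless the `PYTHONHASHSEED` environment variable is set, so the iteration order of a set of strings may differ between runs. The sketch below runs the same snippet in fresh interpreters with different seeds so the orders can be compared. `SNIPPET` and `set_order_in_fresh_process` are hypothetical names, and no particular output order is implied.

```python
import os
import subprocess
import sys

# Hypothetical test snippet: build a set from a small list and print its order.
SNIPPET = "print(list(set(['x', 'y', 'z', 'k'])))"


def set_order_in_fresh_process(hash_seed: str) -> str:
    """Run SNIPPET in a new interpreter with the given PYTHONHASHSEED.

    Returns the printed list as a string. With the same seed the order
    should repeat; with different seeds (or no seed) it may change.
    """
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for seed in ["0", "1", "2", "3"]:
        print(seed, set_order_in_fresh_process(seed))
```

Setting `PYTHONHASHSEED` makes the order reproducible for a given interpreter version, but relying on set order is still fragile; sorting or an insertion-ordered dict gives an order that is defined by the language rather than by hash values.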