List that a large one will take up a huge amount of space, since each item in the list will occupy at least 4 bytes, so only one list with the most valid items will take at least 2 GB of RAM 1 . And this is even without taking into account 64-bit systems 2 .
4 * 5.4e+8 = 2.1e+9 , 2.1 GB8 * 1.2e+18 = 9.2+18 , 9.2 EB (yes, exabytes)
However, for the sake of the question, let's say that you have a lot of RAM.
One of the easiest options is to create your own object for storing and processing a massive list. This would basically divide the massive list into smaller lists, and then, if necessary, would access them accordingly. Because the there is no real advantage to the list subclass , because you still have to rewrite all the methods, you'd better just subclass the object and then from there. The only thing important here is to never combine or copy sub-lists (because they are massive), so use itertools.chain to itertools.chain over lists if necessary.
Here is an example of the simplest list methods append , extend , get/setitem working:
import sys from itertools import chain class largeList(object): def __init__(self, mylist=[]): self.maxSize = sys.maxsize/4 self.src = [[]] self.extend(mylist) def __iter__(self): return chain(*self.src) def __getitem__(self, idx): return self.src[int(idx/self.maxSize)][idx%self.maxSize] def __setitem__(self, idx, item): self.src[int(idx/self.maxSize)][idx%self.maxSize] = item # expand set/getitem to support negative indexing. def append(self, item): if len(self.src[-1]) < self.maxSize: self.src[-1].append(item) else: self.src.append([item]) def extend(self, items): remainder = self.maxSize - len(self.src[-1]) self.src[-1].extend(items[:remainder]) for i in xrange(0, len(items[remainder:]), self.maxSize): self.src.append(items[remainder:][i:i+self.maxSize]) def __len__(self): return sum(len(l) for l in self.src) def __str__(self): size = self.__len__() if size >= 8: first, last = [], [] for i, ele in enumerate(self.__iter__()): if i < 3: first.append(ele) if i >= size - 3: last.append(ele) return str(first)[:-1] + ', ..., ' + str(last)[1:] return str(list(self.__iter__()))
Usage example ( if you have less than 4 GB of free memory or you are on a 64-bit system, change sys.maxsize before trying to do this! ):
#sys.maxsize = 1000 list1 = largeList(xrange(sys.maxsize/4 + 2)) print len(list1)
Result: internally, the lists are divided into several lists, and when working with them, they seem to be only one. More list functionality can be easily added by adding additional methods. For instance. pop , remove , insert and index :
(...) def pop(self, idx): listidx = int(idx/self.maxSize) itempopped = self.src[listidx].pop(idx%self.maxSize) for i in xrange(listidx, len(self.src)-1): self.src[i].append(self.src[i+1].pop(0)) if not self.src[-1]: del self.src[-1] return itempopped def remove(self, item): for i, ele in enumerate(self.__iter__()): if ele == item: self.pop(i) break def insert(self, idx, item): listidx = int(idx/self.maxSize) itemtoshift = self.src[listidx].pop(-1) self.src[listidx].insert(idx%self.maxSize, item) for i in xrange(listidx+1, len(self.src)-1): itemremoved = self.src[i].pop(-1) self.src[i].insert(0, itemtoshift) itemtoshift = itemremoved if len(self.src[-1]) < self.maxSize: self.src[-1].insert(0, itemtoshift) else: self.src.append([self.src[-1].pop(-1)]) self.src[-2].insert(0, itemtoshift) def index(self, item): for i, ele in enumerate(self.__iter__()): if ele == item: return i return -1
Usage example, continued:
#sys.maxsize = 1000 list1 = largeList(xrange(sys.maxsize/4 + 2)) list1.insert(0, 'a') print list1