Creating a list from a file in Python

The file contains:

1 19 15 36 23 18 39 2 36 23 4 18 26 9 3 35 6 16 11 

From this, I would like to extract the list as follows:

    L = [1, 19, 15, 36, 23, 18, 39, 2, 36, ... etc.]

What is the most efficient way to do this?

+7
5 answers

You can use itertools.chain, splitting each line and mapping the pieces to ints:

    from itertools import chain

    with open("in.txt") as f:
        print(list(map(int, chain.from_iterable(line.split() for line in f))))

    # Output:
    # [1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

For Python 2, use itertools.imap instead of map. Combining map with itertools.chain avoids reading the entire file into memory at once, which .read() would do.

Some timings for Python 3, on a file containing your input * 1000:

    In [5]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(map(int, re.split(r"\s+", f.read())))
       ...:
    100 loops, best of 3: 8.55 ms per loop

    In [6]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(map(int, chain.from_iterable(line.split() for line in f)))
       ...:
    100 loops, best of 3: 5.76 ms per loop

    In [7]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     [int(i) for i in f.read().split()]
       ...:
    100 loops, best of 3: 5.82 ms per loop

So the itertools version matches the list comprehension for speed, but uses a lot less memory.
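The memory difference is stated above but not measured. One way to check it on your own data is the standard tracemalloc module. The following is only a sketch: the sample file, the function names and the helper peak_bytes are assumptions made for illustration, and the numbers it prints will vary by Python version and platform.

```python
import os
import tempfile
import tracemalloc
from itertools import chain

# Hypothetical sample data: the question's line repeated 1000 times.
LINE = "1 19 15 36 23 18 39 2 36 23 4 18 26 9 3 35 6 16 11\n"


def read_all(path):
    # Reads the whole file into one string, then splits that string.
    with open(path) as f:
        return [int(tok) for tok in f.read().split()]


def read_chained(path):
    # Splits one line at a time instead of loading the full text first.
    with open(path) as f:
        return list(map(int, chain.from_iterable(line.split() for line in f)))


def peak_bytes(func, path):
    # Returns (result, peak traced allocation in bytes) for a single call.
    tracemalloc.start()
    try:
        result = func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(LINE * 1000)
    sample_path = tmp.name

try:
    all_result, all_peak = peak_bytes(read_all, sample_path)
    chained_result, chained_peak = peak_bytes(read_chained, sample_path)
finally:
    os.remove(sample_path)

print("read().split():", all_peak, "bytes peak")
print("chain:         ", chained_peak, "bytes peak")
```

Both functions still build the final list of ints, so the measurement only captures the extra cost of holding the file text and its tokens.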

For Python 2:

    In [3]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     [int(i) for i in f.read().split()]
       ...:
    100 loops, best of 3: 7.79 ms per loop

    In [4]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(imap(int, chain.from_iterable(line.split() for line in f)))
       ...:
    100 loops, best of 3: 8.03 ms per loop

    In [5]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(imap(int, re.split(r"\s+", f.read())))
       ...:
    100 loops, best of 3: 10.6 ms per loop

The list comprehension is a little faster, but again uses more memory. If you are going to read everything into memory anyway, combining read().split() with imap is the fastest:

    In [6]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(imap(int, f.read().split()))
       ...:
    100 loops, best of 3: 6.85 ms per loop

The same holds for Python 3 with map:

    In [4]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(map(int, f.read().split()))
       ...:
    100 loops, best of 3: 4.41 ms per loop

So, if speed is all you need, use list(map(int, f.read().split())) on Python 3 or list(imap(int, f.read().split())) on Python 2.
If memory is also a concern, combine map with chain instead. The chain approach has another advantage when memory matters: if you only pass the ints to a function or iterate over them, you can pass the chain object directly, so you never need to store all the data in memory at all.
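As a rough illustration of passing the lazy object straight to a consumer, the sketch below sums and takes the maximum of the values without ever building a list. It assumes Python 3 (where map is lazy); the helper name iter_ints and the temporary sample file are made up for the example.

```python
import os
import tempfile
from itertools import chain


def iter_ints(f):
    # Lazily yields ints from an open text file, one line's tokens at a time.
    return map(int, chain.from_iterable(map(str.split, f)))


# Hypothetical sample file, written only to make the sketch runnable.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1 19 15 36\n23 18 39 2\n")
    sample_path = tmp.name

try:
    with open(sample_path) as f:
        total = sum(iter_ints(f))    # consumed while the file is still open
    with open(sample_path) as f:
        largest = max(iter_ints(f))
finally:
    os.remove(sample_path)

print(total, largest)
```

The iterator has to be consumed inside the with block, because it reads from the file on demand.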

One last small optimization is to map str.split over the file object:

    In [5]: %%timeit
       ...: with open("ints.txt", "r") as f:
       ...:     list(map(int, chain.from_iterable(map(str.split, f))))
       ...:
    100 loops, best of 3: 5.32 ms per loop
+5
    with open('yourfile.txt') as f:
        your_list = f.read().split()

That gives you a list of strings. To get integers, you can use a list comprehension:

 your_list = [int(i) for i in f.read().split()] 

This may raise an exception (ValueError) when a value cannot be converted to an integer.
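If the file might contain stray tokens, one option is to catch the ValueError per token. This is just a sketch; the function name parse_ints and its skip_invalid flag are hypothetical, not part of any library.

```python
def parse_ints(text, skip_invalid=False):
    # Converts whitespace-separated tokens to ints.
    # With skip_invalid=True, tokens that int() rejects are silently dropped;
    # otherwise the ValueError propagates to the caller.
    values = []
    for token in text.split():
        try:
            values.append(int(token))
        except ValueError:
            if not skip_invalid:
                raise
    return values


clean = parse_ints("1 19 15")
tolerant = parse_ints("1 x 15 3.5", skip_invalid=True)
print(clean, tolerant)
```

Whether dropping bad tokens is acceptable depends on your data; failing loudly is often the safer default.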

+3
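    import re

    # A context manager closes the file even if parsing fails.
    with open("output.txt", "r") as f:
        # strip() avoids an empty trailing token, which int() would reject.
        print(list(map(int, re.split(r"\s+", f.read().strip()))))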

You can use re.split, which returns a list, and then map the items to int.
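One thing to watch with re.split on whitespace: if the text ends (or starts) with whitespace, the result contains an empty string, and int('') raises ValueError. The short comparison below uses a made-up sample string to show the difference from str.split().

```python
import re

text = "1 19 15 \n"  # sample input with trailing whitespace

tokens_re = re.split(r"\s+", text)                # keeps an empty last item
tokens_str = text.split()                         # drops empty items
tokens_stripped = re.split(r"\s+", text.strip())  # strip first, then split

print(tokens_re)        # ['1', '19', '15', '']
print(tokens_str)       # ['1', '19', '15']
print(tokens_stripped)  # ['1', '19', '15']
```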

+2

If you use the numpy library, another method is np.fromstring(), passing it the result of the file's .read(). Example -

    import numpy as np

    with open('file.txt', 'r') as f:
        lst = np.fromstring(f.read(), sep=' ', dtype=int)

At the end, lst will be a numpy array; if you want a Python list, use list(lst).

numpy.fromstring() always returns a 1D array, and when you give a space as the separator, it ignores extra whitespace, including newlines.


Example / Demo -

    In [39]: import numpy as np

    In [40]: with open('a.txt','r') as f:
       ....:     lst = np.fromstring(f.read(),sep=' ',dtype=int)
       ....:

    In [41]: lst
    Out[41]:
    array([ 1, 19, 15, 36, 23, 18, 39,  2, 36, 23,  4, 18, 26,  9,  3, 35,  6,
           16, 11])

    In [42]: list(lst)
    Out[42]: [1, 19, 15, 36, 23, 18, 39, 2, 36, 23, 4, 18, 26, 9, 3, 35, 6, 16, 11]

Performance Testing -

    In [47]: def func1():
       ....:     with open('a.txt','r') as f:
       ....:         lst = np.fromstring(f.read(),sep=' ',dtype=int)
       ....:         return list(lst)
       ....:

    In [37]: def func2():
       ....:     with open('a.txt','r') as f:
       ....:         return list((map(int,chain.from_iterable(line.split() for line in f))))
       ....:

    In [54]: def func3():
       ....:     with open('a.txt','r') as f:
       ....:         return np.fromstring(f.read(),sep=' ',dtype=int)
       ....:

    In [55]: %timeit func3()
    10000 loops, best of 3: 183 µs per loop

    In [56]: %timeit func1()
    10000 loops, best of 3: 194 µs per loop

    In [57]: %timeit func2()
    10000 loops, best of 3: 212 µs per loop

If you are OK with getting a numpy.ndarray (which is not that different from a list), it will be faster.

+1

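You can use re.findall.

    import re

    file = "in.txt"  # placeholder path; point this at your input file

    with open(file) as f:
        # \d+ matches each run of digits, so the whitespace layout does not matter.
        print(list(map(int, re.findall(r'\d+', f.read()))))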
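Note that \d+ only matches digits, so a minus sign in front of a number is dropped. If the file can contain negative integers, a pattern with an optional leading minus is one possible adjustment. The sample string below is made up for illustration.

```python
import re

text = "3 -4 15"  # sample input containing a negative number

digits_only = [int(m) for m in re.findall(r"\d+", text)]
with_sign = [int(m) for m in re.findall(r"-?\d+", text)]

print(digits_only)  # [3, 4, 15]
print(with_sign)    # [3, -4, 15]
```

This pattern treats any hyphen directly before a digit as a sign, so it assumes the numbers are separated by whitespace rather than by hyphens.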
0
