I have a large CSV file and I open it with pd.read_csv, as follows:
import pandas as pd
df = pd.read_csv("path/fileName.csv", sep=" ", header=None)
Since the file is really large, I would like to be able to read it in blocks of lines:
from 0 to 511, from 512 to 1023, from 1024 to 1535, ..., that is, from 512*n to 512*(n+1) - 1,
where n = 0, 1, 2, ...
If I add chunksize=512 to the read_csv arguments,
df = pd.read_csv("path/fileName.csv", sep=" ", header=None, chunksize=512)
and then print
print(df.get_chunk(5))
I can read the first 5 lines. Alternatively, I can split the whole file into parts of 512 lines using the for loop:
data = []
for chunk in df:
    data.append(chunk)
But this is of no use to me, since it still reads through the whole file and takes time. How can I read only the lines from 512*n to 512*(n+1) - 1?
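For what it's worth, the following is a minimal sketch of what I am after, assuming I understand the skiprows and nrows arguments correctly (the path and n are placeholders):

import pandas as pd

n = 3  # placeholder: index of the 512-line block I want
block = pd.read_csv(
    "path/fileName.csv",  # placeholder path
    sep=" ",
    header=None,
    skiprows=512 * n,     # skip all lines before the block
    nrows=512,            # then read exactly 512 lines
)

But I do not know whether this is actually faster, since I assume skiprows still has to scan past all the skipped lines on every call.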
Looking around, I often saw that chunksize is used together with iterator, as in
df = pd.read_csv("path/fileName.csv", sep=" ", header=None, iterator=True, chunksize=512)
But after many attempts, I still do not understand what advantage this boolean flag gives me. Could you please explain it to me?
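For context, this is a minimal sketch of the pattern I keep seeing, assuming I read the examples correctly; with iterator=True alone, read_csv seems to return a reader whose get_chunk accepts a size per call:

import pandas as pd

reader = pd.read_csv("path/fileName.csv", sep=" ", header=None, iterator=True)
first = reader.get_chunk(512)   # lines 0 to 511
second = reader.get_chunk(512)  # lines 512 to 1023
reader.close()

As far as I can tell, this behaves just like passing chunksize=512 and iterating, which is exactly why I do not see what iterator=True adds.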