I have a large CSV file, about 600 MB with 11 million lines, and I want to create statistics such as pivots, histograms, charts, etc. Obviously, just trying to read it the usual way:
df = pd.read_csv('Check400_900.csv', sep='\t')
does not work, so I found iterator and chunksize in a similar post and used
df = pd.read_csv('Check1_900.csv', sep='\t', iterator=True, chunksize=1000)
That works: I can, for example, print df.get_chunk(5), and I can go through the entire file with
for chunk in df:
    print(chunk)
My problem: I don't know how to use things like the ones below on the whole df, and not just on a single chunk
plt.plot()
print(df.head())
print(df.describe())
print(df.dtypes)
customer_group3 = df.groupby('UserID')
y3 = customer_group3.size()
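To make the goal concrete, here is a minimal sketch of how the groupby case could be computed over the whole file by combining per-chunk results. The helper name count_rows_per_user, the Amount column, and the in-memory sample data are made up for illustration; only the UserID column and the tab separator come from the code above.

```python
import io

import pandas as pd


def count_rows_per_user(source, chunksize=1000, key="UserID"):
    """Count rows per key over the whole file, one chunk at a time.

    Hypothetical helper: only `chunksize` rows are held in memory at once.
    """
    total = None
    for chunk in pd.read_csv(source, sep="\t", chunksize=chunksize):
        # Same operation as df.groupby('UserID').size(), but on one chunk.
        partial = chunk.groupby(key).size()
        if total is None:
            total = partial
        else:
            # Align on the key; a key missing on one side counts as 0.
            total = total.add(partial, fill_value=0)
    if total is None:
        # No data rows were read at all.
        return pd.Series(dtype="int64")
    # The aligned addition produces floats, so convert back to integers.
    return total.astype("int64").sort_index()


# Tiny in-memory stand-in for the real file (made-up data).
sample = io.StringIO(
    "UserID\tAmount\n1\t10\n2\t20\n1\t30\n3\t40\n2\t50\n1\t60\n"
)
counts = count_rows_per_user(sample, chunksize=2)
print(counts)
```

Counts and sums can be combined like this because they simply add up across chunks. Some of the values reported by describe(), such as the median and the other percentiles, cannot be merged this way and would need a different approach.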
I hope my question is not too confusing.
python pandas csv dataframe bigdata
Thodoris p