Pandas table read error

I am trying to read a tab delimited text file as data.

Here's what the file looks like in Excel:

CALENDAR_DATE ORDER_NUMBER INVOICE_NUMBER TRANSACTION_TYPE CUSTOMER_NUMBER CUSTOMER_NAME 5/13/2016 0:00 13867666 6892372 S 2026 CUSTOMER 1 

Import to df:

 df = p.read_table("E:/FileLoc/ThisIsAFile.txt", encoding = "iso-8859-1") 

Now he does not see the first three columns as part of the column index (df [0] = transaction type), and all the headers are shifted to reflect this.

  CALENDAR_DATE ORDER_NUMBER INVOICE_NUMBER 5/13/2016 0:00 13867666 6892372 S 2026 CUSTOMER 1 

I am trying to manipulate a text file and then import it into the mysql database as the final result.

+5
source share
1 answer

You can use read_csv with a separator of 2 or more spaces:

 import pandas as pd import io temp=u"""CALENDAR_DATE ORDER_NUMBER INVOICE_NUMBER TRANSACTION_TYPE CUSTOMER_NUMBER CUSTOMER_NAME 5/13/2016 0:00 13867666 6892372 S 2026 CUSTOMER 1""" 
 #after testing replace io.StringIO(temp) to filename df =pd.read_csv(io.StringIO(temp), sep=r'\s{2,}', engine='python', encoding = "iso-8859-1") print (df) CALENDAR_DATE ORDER_NUMBER INVOICE_NUMBER TRANSACTION_TYPE \ 0 5/13/2016 0:00 13867666 6892372 S CUSTOMER_NUMBER CUSTOMER_NAME 0 2026 CUSTOMER 1 

If the tabulator separator, use sep='\t' .

EDIT:

I am testing it with your data and it works:

 import pandas as pd df = pd.read_csv('test/AnonymizedData.txt', sep='\t') 
 print (df) CUSTOMER_NUMBER CUSTOMER_NAME CUSTOMER_BRANCH_CODE CUSTOMER_BRANCH_NAME \ 0 2026 CUSTOMER 1 83 SALES BRANCH 1 1 2359 CUSTOMER 2 76 SALES BRANCH 2 2 100662 CUSTOMER 3 28 SALES BRANCH 3 3 3245 CUSTOMER 4 84 SALES BRANCH 4 4 3179 CUSTOMER 5 28 SALES BRANCH 5 5 39881 CUSTOMER 6 67 SALES BRANCH 6 6 37020 CUSTOMER 7 58 SALES BRANCH 7 7 1239 CUSTOMER 8 50 SALES BRANCH 8 8 2379 CUSTOMER 9 76 SALES BRANCH 9 CUSTOMER_CITY CUSTOMER_STATE ... PRICING_PRODUCT_TYPE_CODE \ 0 TOWN 1 CO ... 11 1 TOWN 2 OH ... 11 2 TOWN 3 ME ... 11 3 TOWN 4 IL ... 11 4 TOWN 5 NH ... 11 5 TOWN 6 TX ... 11 6 TOWN 7 NC ... 11 7 TOWN 8 NY ... 11 8 TOWN 9 OH ... 11 PRICING_PRODUCT_TYPE ORGANIZATION_ID ORGANIZATION_NAME PRODUCT_LINE_CODE \ 0 DISPOSABLES 83 ORGANIZATIONNAME 891 1 DISPOSABLES 83 ORGANIZATIONNAME 891 2 DISPOSABLES 83 ORGANIZATIONNAME 891 3 DISPOSABLES 83 ORGANIZATIONNAME 891 4 DISPOSABLES 83 ORGANIZATIONNAME 891 5 DISPOSABLES 83 ORGANIZATIONNAME 891 6 DISPOSABLES 83 ORGANIZATIONNAME 891 7 DISPOSABLES 83 ORGANIZATIONNAME 891 8 DISPOSABLES 83 ORGANIZATIONNAME 891 PRODUCT_LINE ROBOTIC_FLAG Unnamed: 52 Unnamed: 53 Unnamed: 54 0 PRODUCTNAME NN NaN 3 1 PRODUCTNAME NN NaN 3 2 PRODUCTNAME NN NaN 2 3 PRODUCTNAME NN NaN 7 4 PRODUCTNAME NN NaN 1 5 PRODUCTNAME NN NaN 4 6 PRODUCTNAME NN NaN 3 7 PRODUCTNAME NN NaN 5 8 PRODUCTNAME NN NaN 3 [9 rows x 55 columns] 
+4
source

All Articles