How to split into columns

I have a file with two datasets that I would like to read in Python as two columns.

The data is in the form of:

xxx yyy    xxx yyy   xxx yyy

etc., so I understand that I need to somehow separate it. I am new to Python (and relatively new to programming), so until now I have struggled a bit. At the moment, I tried to use:

def read(file):

    column1=[]
    column2=[]
    readfile = open(file, 'r')
    a = (readfile.read())
    readfile.close()

How would I like to split the reading in the file into columns1 and columns2?

+4
source share
3 answers

This is a simple example of getting xxx values ​​in column1 and yyy values ​​in column2.

Attention! Your file data should look something like this:

xxx yyy   xxx yyy   xxx yyy
4 (xxx yyy   xxx yyy) 1 (xxx yyy)


, , :

, /, /,  
data_separator=',' column_separator='/'

-/-/-  
data_separator='-' column_separator='/'

def read(file):
    column1=[]
    column2= []
    readfile = open(file, 'r')
    data_separator = ' '  # one space to separate xxx and yyy
    column_separator = '    '  # 4 spaces to separate groups xxx,yyy    xxx,yyy

    for line in readfile.readlines():  # In case you have more than 1 line
         line = line.rstrip('\n')  # Remove EOF from line
         print line

         columns = line.split(column_separator)  # Get the data groups 
         # columns now is an array like ['xxx yyy', 'xxx yyy', 'xxx yyy']

         for column in columns:
             if not column: continue  # If column is empty, ignore it
             column1.append(column.split(data_separator)[0])
             column2.append(column.split(data_separator)[1])
    readfile.close()

, xxx yyy aaa bbb ttt hhh , :

column1 = ['xxx', 'aaa', 'ttt']
column2 = ['yyy', 'bbb', 'hhh']
0

Python Pandas. , :

>cat data.txt
xxx  yyy  xxx  yyy  xxx yyy
xxx yyy    xxx yyy   xxx yyy
xxx yyy  xxx yyy   xxx yyy
xxx yyy    xxx yyy  xxx yyy
xxx yyy    xxx  yyy   xxx yyy

>from pandas import DataFrame
>from pandas import read_csv
>from pandas import concat
>dfin = read_csv("data.txt", header=None, prefix='X', delimiter=r"\s+")
> dfin
X0   X1   X2   X3   X4   X5
0  xxx  yyy  xxx  yyy  xxx  yyy
1  xxx  yyy  xxx  yyy  xxx  yyy
2  xxx  yyy  xxx  yyy  xxx  yyy
3  xxx  yyy  xxx  yyy  xxx  yyy
4  xxx  yyy  xxx  yyy  xxx  yyy
>dfout = DataFrame()
>dfout['X0'] = concat([dfin['X0'], dfin['X2'], dfin['X4']], axis=0, ignore_index=True)
>dfout['X1'] = concat([dfin['X1'], dfin['X3'], dfin['X5']], axis=0, ignore_index=True)
> dfout
 X0   X1
 0   xxx  yyy
 1   xxx  yyy
 2   xxx  yyy
 3   xxx  yyy
 4   xxx  yyy
 5   xxx  yyy
 6   xxx  yyy
 7   xxx  yyy
 8   xxx  yyy
 9   xxx  yyy
 10  xxx  yyy
 11  xxx  yyy
 12  xxx  yyy
 13  xxx  yyy
 14  xxx  yyy

, . .

0

... , ...

#reading a file seems not to be your problem ;)
#works also with more than 3/4/n spaces...
data = 'xxx yyy    xxx yyy             xxx yyy'

#reduce more than two spaces
while '   ' in data:
    data = data.replace('   ', '  ')

#split data-sets who are now separated trough two spaces
data = data.split('  ')

#split into cols for each data-set
data = [x.split(' ') for x in data]

#reshape for better (requested?) access
column1, column2 = zip(*data)

print column1
print column2

:

('xxx', 'xxx', 'xxx')
('yyy', 'yyy', 'yyy')

, :)

-1