If you have plain text, you can split the text on \n to get a list of lines, then split each line on a comma to get separate fields:
>>> s = """1, First Street, City, X13
... 1, First Street  First Street, City, X13
... 1  1, First Street, City, X13  X13"""
>>>
>>> lines = s.split('\n')
>>>
>>> splitted_lines = [line.split(',') for line in lines]
A more Pythonic approach is to read the text with the csv module, specifying a comma as the delimiter:
import csv

with open('file_name', newline='') as f:
    # Materialize the rows before the file is closed; csv.reader is lazy.
    splitted_lines = list(csv.reader(f, delimiter=','))
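As a minimal sketch of the same idea on in-memory text: the sample `text` and the use of `io.StringIO` are assumptions made for illustration, and `skipinitialspace=True` is an optional setting that drops the space following each comma.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of a file.
text = "1, First Street, City, X13\n1, First Street, City, X13\n"

# csv.reader yields one list of fields per line; list() keeps the rows
# available after the reader has been consumed.
rows = list(csv.reader(io.StringIO(text), delimiter=',', skipinitialspace=True))

print(rows)
```

Each row comes back as a list of strings, the same shape that the manual split produces.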
You can then use the following list comprehension to get unique fields in each column:
>>> import re
>>> ' '.join([set([set(re.split(r'\s{2,}', i.strip())).pop() for i in column]).pop() for column in zip(*splitted_lines)])
'1 First Street City X13'
Here zip() transposes the rows into columns, re.split() with the regex r'\s{2,}' splits each field on runs of two or more whitespace characters, and set() keeps only the unique items.
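The same steps can be unrolled into a small sketch that is easier to follow than the one-liner. The sample `rows` and the helper name `unique_parts` are hypothetical, and the sketch assumes that duplicates inside a field are separated by at least two spaces.

```python
import re

# Sample rows, assumed to look like the split lines,
# with duplicates inside a field separated by two spaces.
rows = [
    ['1', 'First Street', 'City', 'X13'],
    ['1', 'First Street  First Street', 'City', 'X13'],
    ['1  1', 'First Street', 'City', 'X13  X13'],
]

def unique_parts(column):
    """Return the set of distinct pieces found in one column."""
    parts = set()
    for field in column:
        # Split each field on runs of two or more whitespace characters.
        parts.update(re.split(r'\s{2,}', field.strip()))
    return parts

# zip(*rows) transposes the rows into columns.
columns = list(zip(*rows))
uniques = [unique_parts(column) for column in columns]
print(uniques)
```

Unlike the one-liner, this keeps every distinct value per column rather than popping a single arbitrary one, so it also shows when a column contains more than one value.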
Note: if you care about ordering, you can use collections.OrderedDict instead of set:
>>> from collections import OrderedDict
>>>
>>> ' '.join([list(OrderedDict.fromkeys([set(re.split(r'\s{2,}', i.strip())).pop() for i in column]))[0] for column in zip(*splitted_lines)])
'1 First Street City X13'
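To make the ordering behaviour visible, here is a minimal sketch that collects the distinct pieces of one column in first-seen order. The helper name `ordered_unique` and the sample `column` are hypothetical and are not part of the data above.

```python
import re
from collections import OrderedDict

def ordered_unique(column):
    """Return the distinct pieces of a column in first-seen order."""
    seen = OrderedDict()
    for field in column:
        for part in re.split(r'\s{2,}', field.strip()):
            seen[part] = None  # keys keep their insertion order
    return list(seen)

# Hypothetical column whose fields contain several different values.
column = ('City  Town', 'Town', 'City  Village')
print(ordered_unique(column))  # ['City', 'Town', 'Village']
```

A plain set would return the same three values but with no guarantee about their order.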