I'm still pretty new to using python for programs from scratch, so I would take a file that I process with SQL to duplicate functions with Python. It seems that I want to take the (compressed, zip) csv file and create a Dict from it (or maybe a dict of dicts?). When I use a dict reader, do I get the first row as a key, and not each column as my own key? For instance.
import csv, sys, zipfile sys.argv[0] = "/home/tom/Documents/REdata/AllListing1RES.zip" zip_file = zipfile.ZipFile(sys.argv[0]) items_file = zip_file.open('AllListing1RES.txt', 'rU') for row in csv.DictReader(items_file,dialect='excel'): pass
Productivity:
>>> for key in row: print 'key=%s, value=%s' % (key, row[key]) key=MLS_ACCT PARCEL_ID AREA COUNTY STREET_NUM STREET_NAME CITY ZIP STATUS PROP_TYPE LIST_PRICE LIST_DATE DOM DATE_MODIFIED BATHS_HALF BATHS_FULL BEDROOMS ACREAGE YEAR_BUILT YEAR_BUILT_DESC OWNER_NAME SOLD_DATE WITHDRAWN_DATE STATUS_DATE SUBDIVISION PENDING_DATE SOLD_PRICE, value=492859 28-15-3-009-001.0000 200 JEFF 3828 ORLEANS RD MOUNTAIN BROOK 35243 A SFR 324900 3/3/2011 2 3/4/2011 12:04:11 AM 0 2 3 0 1968 EXIST SPARKS 3/3/2011 11:54:56 PM KNOLLWOOD CITY ZIP STATUS PROP_TYPE LIST_PRICE LIST_DATE DOM DATE_MODIFIED BATHS_HALF BATHS_FULL BEDROOMS ACREAGE YEAR_BUILT YEAR_BUILT_DESC OWNER_NAME SOLD_DATE WITHDRAWN_DATE STATUS_DATE SUBDIVISION PENDING_DATE SOLD_PRICE, >>> for key in row: print 'key=%s, value=%s' % (key, row[key]) key=MLS_ACCT PARCEL_ID AREA COUNTY STREET_NUM STREET_NAME CITY ZIP STATUS PROP_TYPE LIST_PRICE LIST_DATE DOM DATE_MODIFIED BATHS_HALF BATHS_FULL BEDROOMS ACREAGE YEAR_BUILT YEAR_BUILT_DESC OWNER_NAME SOLD_DATE WITHDRAWN_DATE STATUS_DATE SUBDIVISION PENDING_DATE SOLD_PRICE, value=492859 28-15-3-009-001.0000 200 JEFF 3828 ORLEANS RD MOUNTAIN BROOK 35243 A SFR 324900 3/3/2011 2 3/4/2011 12:04:11 AM 0 2 3 0 1968 EXIST SPARKS 3/3/2011 11:54:56 PM KNOLLWOOD
So, I'm looking for a column for MLS_ACCT and a separate one for PARCEL_ID , etc., so I can do things like average prices for all elements that contain KNOLLWOOD in the SUBDIVISION field with an additional subsection by date, sale date, etc.
I know well how to do this with SQL, but as I said, I am contacting to get some Python skills here. I read the last few days, but have not yet found very simple illustrations in this case. Pointers to the mentioned documents would be appreciated. I understand that I can use resident SQL-Lite, but again, my desire is to get an approach to Python. I read some of them about Numpy and Scipy and downloaded the sage, but still can not find useful illustrations, as these tools focus on arrays with numbers only as elements, and I have a lot of string mappings that I need to do as well date calculations and comparisons.
In the end, I will need to replace the values โโin the table (since I have dirty data), I am doing it now, having a "translation table" that contains all the dirty options, and provides a "clean" answer for final use.