Using type() to convert values stored as strings

In my application I generate a number of values (three columns, of type int, str and datetime; see the example below), and these values are stored in a flat file as comma-separated lines. In addition, I store a file containing the type of each value (see below). Now, how can I use this information to convert the values from the flat file to the correct data types in Python? Is it possible, or do I need to do something else?

Data file:

    #id,value,date
    1,a,2011-09-13 15:00:00
    2,b,2011-09-13 15:10:00
    3,c,2011-09-13 15:20:00
    4,d,2011-09-13 15:30:00

File type:

    id,<type 'int'>
    value,<type 'str'>
    date,<type 'datetime.datetime'>
7 answers

As I understand it, you have already parsed the file and now just need to get the right type. So let's say id_, type_ and value are three strings containing the values from the file. (Note: type_ must contain 'int', for example, and not "<type 'int'>".)

    def convert(value, type_):
        import importlib
        try:
            # Check if it is a builtin type.
            # '__builtin__' is the Python 2 name; Python 3 calls it 'builtins'.
            module = importlib.import_module('__builtin__')
            cls = getattr(module, type_)
        except AttributeError:
            # If not, separate module and class
            module, type_ = type_.rsplit(".", 1)
            module = importlib.import_module(module)
            cls = getattr(module, type_)
        return cls(value)

Then you can use it like this:

 value = convert("5", "int") 

Unfortunately, for datetime, this does not work, since it cannot be simply initialized with its string representation.
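One way around the datetime limitation is to special-case it. The following is only a sketch, written for Python 3 (where `__builtin__` is named `builtins`). The helper `type_name` and the constant `DATETIME_FORMAT` are hypothetical names introduced here, and the format string is an assumption based on the sample data in the question.

```python
import builtins
import importlib
import re
from datetime import datetime

# Assumed timestamp layout, taken from the sample data file in the question.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def type_name(text):
    """Return 'int' for "<type 'int'>" or "<class 'int'>"; pass plain names through."""
    match = re.match(r"<(?:type|class) '([\w.]+)'>$", text.strip())
    return match.group(1) if match else text.strip()


def convert(value, type_):
    """Convert the string `value` to the type described by the string `type_`."""
    name = type_name(type_)
    if name == "datetime.datetime":
        # datetime cannot be built from its string form, so parse it explicitly.
        return datetime.strptime(value, DATETIME_FORMAT)
    if "." in name:
        # Dotted name: import the module, then look up the class in it.
        module_name, class_name = name.rsplit(".", 1)
        cls = getattr(importlib.import_module(module_name), class_name)
    else:
        # Plain name: look it up among the builtins.
        cls = getattr(builtins, name)
    return cls(value)
```

Because any builtin name is looked up and called, this should only be used with a type file you wrote yourself.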


Follow these steps:

  • Read the file line by line; for each line, do the following.
  • Split the line using split() with , as the separator.
  • Convert the first element of the resulting list to int. Keep the second element as a string. Split the third value into its parts (e.g. using slices) and create a datetime object from them.
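The steps above can be sketched as follows. The function names `parse_line` and `parse_lines` are made up for illustration, and the slice positions assume the fixed-width `YYYY-MM-DD HH:MM:SS` layout shown in the question.

```python
from datetime import datetime


def parse_line(line):
    """Split one 'id,value,date' line and convert each field."""
    id_text, value, date_text = line.strip().split(",")
    # Slice the timestamp into its fixed-width parts: YYYY-MM-DD HH:MM:SS
    date = datetime(
        int(date_text[0:4]), int(date_text[5:7]), int(date_text[8:10]),
        int(date_text[11:13]), int(date_text[14:16]), int(date_text[17:19]),
    )
    return int(id_text), value, date


def parse_lines(lines):
    """Parse every line except blank lines and the '#' header."""
    return [
        parse_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


# A list of strings stands in for an open file here.
sample = [
    "#id,value,date\n",
    "1,a,2011-09-13 15:00:00\n",
    "2,b,2011-09-13 15:10:00\n",
]
rows = parse_lines(sample)
```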

I had to deal with a similar situation in a recent program that had to convert many fields. I used a list of tuples, where one element of each tuple was the conversion function. Sometimes it was int or float; sometimes it was just a lambda; and sometimes it was the name of a function defined elsewhere.
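A minimal sketch of that idea, applied to the columns from the question. The names `FIELDS`, `parse_date` and `convert_row` are hypothetical, not taken from the program described above.

```python
from datetime import datetime


def parse_date(text):
    """A conversion function defined elsewhere and referenced by name."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Each tuple pairs a column name with the function that converts its text.
FIELDS = [
    ("id", int),                           # a builtin
    ("value", lambda text: text.strip()),  # a lambda
    ("date", parse_date),                  # a named function
]


def convert_row(cells):
    """Apply each column's converter to the matching cell."""
    return {name: func(cell) for (name, func), cell in zip(FIELDS, cells)}


row = convert_row("2,b,2011-09-13 15:10:00".split(","))
```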


Your file type might be simpler:

    id=int
    value=str
    date=datetime.datetime

Then in your main program you can do this:

    import datetime

    def convert_datetime(text):
        return datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

    data_types = {'int': int, 'str': str, 'datetime.datetime': convert_datetime}

    # Read the type file: one name=type pair per line
    fields = {}
    for line in open('example_types.txt').readlines():
        key, val = line.strip().split('=')
        fields[key] = val

    data_file = open('actual_data.txt')
    # The first line is the '#'-prefixed header with the column names
    field_info = data_file.readline().strip('#\n ').split(',')

    values = []  # store it all here for now
    for line in data_file.readlines():
        row = []
        for i, element in enumerate(line.strip().split(',')):
            element_type = fields[field_info[i]]  # will get 'int', 'str', or 'datetime.datetime'
            convert = data_types[element_type]
            row.append(convert(element))
        values.append(row)

    # to show it working...
    for row in values:
        print(row)

Instead of having a separate type file, take your list of tuples (id, value, date) and just pickle it.

Otherwise you will have to solve the problem of storing your string-to-type converters as text (in your type file), which may be an interesting problem to solve, but if you are just trying to get something done, go with pickle or cPickle.
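A small sketch of the pickle route, assuming Python 3 (where the plain `pickle` module already uses the C implementation when available, so `cPickle` is not needed). The sample rows are copied from the question, and an in-memory buffer stands in for a real file.

```python
import io
import pickle
from datetime import datetime

rows = [
    (1, "a", datetime(2011, 9, 13, 15, 0, 0)),
    (2, "b", datetime(2011, 9, 13, 15, 10, 0)),
]

# An in-memory buffer stands in for a file opened in binary mode ("wb"/"rb").
buffer = io.BytesIO()
pickle.dump(rows, buffer)

# Reading it back restores the original types; no type file is needed.
buffer.seek(0)
restored = pickle.load(buffer)
```

Only unpickle data you trust, since loading a pickle can execute arbitrary code.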


First, you cannot write a “universal” or “smart” transformation that magically handles everything.

Secondly, trying to generalize string-to-data conversions into anything other than code does not seem to work well. So instead of writing a string that names the conversion, just write the conversion.

Finally, trying to write a configuration file in a domain-specific language is stupid. Just write Python code. This is not much more complicated than trying to parse any configuration file.

Is it possible or do I need to do some other things?

Do not waste time trying to create a "type file" that is not just Python. It does not help. It is easier to write the conversion as a Python function. You can import that function as if it were your "type file".

    import datetime

    def convert(row):
        # row is a dict of strings, e.g. as produced by csv.DictReader
        return dict(
            id=int(row['id']),
            value=str(row['value']),
            date=datetime.datetime.strptime(row['date'], "%Y-%m-%d %H:%M:%S"),
        )

That is all you need in your "type file".

Now you can read (and process) your input this way.

    from type_file import convert
    import csv

    # Python 2 idiom: csv wants binary mode; on Python 3 use open("date", newline="")
    with open("date", "rb") as source:
        rdr = csv.DictReader(source)
        for row in rdr:
            useful_row = convert(row)

in many cases I don't know the number of columns or data type until runtime

This means that you are doomed.

You must have an actual definition of the contents of the file or you cannot perform any processing.

    "id","value","other value"
    1,23507,3

You do not know whether "23507" should be an integer, a string, a zip code, a floating-point number (which is missing its period), a duration (in days or seconds), or some other, more complicated thing. You cannot hope and you cannot guess.

After receiving the definition, you need to write an explicit conversion function based on the actual definition.

After writing the conversion, you need to (a) test the conversion with a simple unit test and (b) check the data to make sure that it is actually being converted.
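A simple unit test for step (a) might look like the sketch below. It repeats a `convert` function of the same shape as the one shown earlier so that it runs on its own. The test class, the sample row and the `run_tests` helper are illustrative assumptions.

```python
import datetime
import unittest


def convert(row):
    """Explicit conversion for one row of strings."""
    return dict(
        id=int(row["id"]),
        value=str(row["value"]),
        date=datetime.datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S"),
    )


class ConvertTest(unittest.TestCase):
    def test_sample_row(self):
        row = {"id": "1", "value": "a", "date": "2011-09-13 15:00:00"}
        expected = {
            "id": 1,
            "value": "a",
            "date": datetime.datetime(2011, 9, 13, 15, 0, 0),
        }
        self.assertEqual(convert(row), expected)

    def test_bad_id_is_rejected(self):
        # A non-numeric id should fail loudly instead of passing through.
        with self.assertRaises(ValueError):
            convert({"id": "x", "value": "a", "date": "2011-09-13 15:00:00"})


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ConvertTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` (or running the module with `python -m unittest`) covers step (a); step (b) still means looking at real converted rows.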

Then you can process the file.


You can look at the xlrd module. If you can load your data into Excel and it knows what type is associated with each column, xlrd will give you the type when reading the Excel file. Of course, if the data is given to you as CSV, then someone will have to open the Excel file and manually change the column types.

Not sure if this gets you where you want to go, but it may help.
