Create a pandas DataFrame from multiple dicts

Question


I am new to pandas, and this is my first question on Stack Overflow. I am trying to do some analytics with pandas.

I have text files with data records that I want to process. Each line of a file is one record whose fields sit at fixed positions and are a fixed number of characters long. A single file contains different kinds of records. All records share the first field, two characters that identify the record type. As an example:

Some file:

    01Jhon Smith 555-1234
    03Cow Bos primigenius taurus 00401
    01Jannette Jhonson 00100000000
    ...

Field layout:

    field     start  length
    type        1      2     *common to all records, e.g. 01 = person, 03 = animal
    name        3     10
    surname    13     10
    phone      23      8
    credit     31     11
    (rest of the line filled with spaces)

I am writing code that converts each single record into a dictionary:

    person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
    person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
    animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}

If a field is empty (filled with spaces), it is left out of the dictionary.
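A minimal sketch of that record-to-dict step for the person layout only (the animal layout is not given in the question). `PERSON_FIELDS` and `parse_person` are hypothetical names; the slices are the table's 1-based start/length converted to 0-based Python slices. Treating `credit` as having two implied decimal places is an assumption inferred from `00100000000` in the file versus `1000000.00` in `person2`.

```python
# Person layout from the question, converted to 0-based Python slices:
# (field, start, stop), e.g. "name" starts at column 3 with length 10 -> [2:12].
PERSON_FIELDS = [("type", 0, 2), ("name", 2, 12), ("surname", 12, 22),
                 ("phone", 22, 30), ("credit", 30, 41)]


def parse_person(line):
    """Turn one fixed-width person line into a dict, omitting blank fields."""
    record = {}
    for field, start, stop in PERSON_FIELDS:
        value = line[start:stop].strip()
        if not value:  # field was all spaces (or lies past the end of the line)
            continue
        if field == "type":
            value = int(value)  # "01" -> 1, matching the dicts in the question
        elif field == "credit":
            value = int(value) / 100  # assumption: two implied decimal places
        record[field] = value
    return record


# Hypothetical sample line padded to the question's field widths.
line = "01" + "Jannette".ljust(10) + "Jhonson".ljust(10) + " " * 8 + "00100000000"
print(parse_person(line))
# {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.0}
```

Because blank fields are skipped, the empty phone number never becomes a key, which is exactly the "missing key" situation pandas later turns into `NaN`.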

For all the records of the same kind, I want to create a pandas DataFrame that uses the dict keys as column names. I tried pandas.DataFrame.from_dict() without success.

And so my question is: is there a way to do this with pandas, so that the dict keys become columns? Is there any other standard method for working with such files?


pandas

tinproject Jul 19 '13 at 17:00


1 answer

DSM · Accepted Answer · 2013-07-19T17:35:43+0000

To make a DataFrame from dictionaries, you can pass a list of dictionaries:

    >>> import pandas as pd
    >>> person1 = {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'}
    >>> person2 = {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00}
    >>> animal1 = {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1}
    >>> pd.DataFrame([person1])
       name     phone surname  type
    0  Jhon  555-1234   Smith     1
    >>> pd.DataFrame([person1, person2])
        credit      name     phone  surname  type
    0      NaN      Jhon  555-1234    Smith     1
    1  1000000  Jannette       NaN  Jhonson     1
    >>> pd.DataFrame.from_dict([person1, person2])
        credit      name     phone  surname  type
    0      NaN      Jhon  555-1234    Smith     1
    1  1000000  Jannette       NaN  Jhonson     1
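The same idea extends to a mixed list of records: bucket the dicts by their `'type'` value and build one DataFrame per bucket, so each frame only holds one kind of record. This is a sketch reusing the question's dicts; `records`, `by_type`, `frames` and `people` are illustrative names, not pandas API. Column order can differ between pandas versions (the session above shows the keys sorted alphabetically), so passing `columns=` pins it down explicitly.

```python
from collections import defaultdict

import pandas as pd

records = [
    {'type': 1, 'name': 'Jhon', 'surname': 'Smith', 'phone': '555-1234'},
    {'type': 1, 'name': 'Jannette', 'surname': 'Jhonson', 'credit': 1000000.00},
    {'type': 3, 'cname': 'cow', 'sciname': 'Bos....', 'legs': 4, 'tails': 1},
]

# Bucket the dicts by record type so that each DataFrame holds one kind of record.
by_type = defaultdict(list)
for rec in records:
    by_type[rec['type']].append(rec)

# One DataFrame per type: each dict becomes a row, the union of the keys becomes
# the columns, and a key missing from a dict shows up as NaN in that row.
frames = {t: pd.DataFrame(recs) for t, recs in by_type.items()}

# columns= selects and orders the columns explicitly.
people = pd.DataFrame(by_type[1], columns=['type', 'name', 'surname', 'phone', 'credit'])
print(people)
```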

For the more fundamental problem of two different record formats mixed in one file, and assuming the files are not so large that we cannot read them into memory, I would use StringIO to build a file-like object that contains only the lines we want, and then pass it to read_fwf (read a fixed-width-formatted file). For example:

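    from io import StringIO  # Python 2 used: from StringIO import StringIO

    def get_filelike_object(filename, line_prefix):
        # Copy only the lines that start with line_prefix (e.g. "01") into memory.
        s = StringIO()
        with open(filename, "r") as fp:
            for line in fp:
                if line.startswith(line_prefix):
                    s.write(line)
        s.seek(0)  # rewind so the reader starts at the first copied line
        return s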

and then

    >>> type01 = get_filelike_object("animal.dat", "01")
    >>> df = pd.read_fwf(type01, names="type name surname phone credit".split(),
    ...                  widths=[2, 10, 10, 8, 11], header=None)
    >>> df
       type      name  surname     phone     credit
    0     1      Jhon    Smith  555-1234        NaN
    1     1  Jannette  Jhonson       NaN  100000000

should work. Of course, you could also split the files by record type before pandas ever sees them, which might be the simplest approach.
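A variant sketch that reads the input once and routes every line into a per-type in-memory buffer, instead of rescanning the file for each prefix, then parses the `"01"` buffer with `read_fwf` using explicit `colspecs` derived from the question's layout table. `split_by_record_type`, `read_person_records` and `PERSON_LAYOUT` are hypothetical helpers. `dtype=str` is used so that codes like `"01"` keep their leading zero, and dividing `credit` by 100 assumes two implied decimal places, inferred from the question's `00100000000` vs `1000000.00`.

```python
from collections import defaultdict
from io import StringIO

import pandas as pd

# Person layout from the question: (column, 1-based start, length).
PERSON_LAYOUT = [("type", 1, 2), ("name", 3, 10), ("surname", 13, 10),
                 ("phone", 23, 8), ("credit", 31, 11)]


def split_by_record_type(lines, prefix_len=2):
    """Route every non-blank line into an in-memory buffer keyed by its prefix."""
    buffers = defaultdict(StringIO)
    for line in lines:
        if line.strip():
            buffers[line[:prefix_len]].write(line)
    for buf in buffers.values():
        buf.seek(0)  # rewind so pd.read_fwf starts at the first line
    return dict(buffers)


def read_person_records(buf):
    # read_fwf's colspecs are 0-based, half-open (from, to) intervals,
    # so a 1-based start s with length n becomes (s - 1, s - 1 + n).
    colspecs = [(start - 1, start - 1 + length) for _, start, length in PERSON_LAYOUT]
    names = [name for name, _, _ in PERSON_LAYOUT]
    # dtype=str keeps "01" as text instead of parsing it to the integer 1.
    df = pd.read_fwf(buf, colspecs=colspecs, names=names, header=None, dtype=str)
    # Assumption: credit carries two implied decimals (00100000000 -> 1000000.00).
    df["credit"] = pd.to_numeric(df["credit"]) / 100
    return df


# Hypothetical sample lines padded to the question's field widths.
lines = [
    "01" + "Jhon".ljust(10) + "Smith".ljust(10) + "555-1234" + " " * 11 + "\n",
    "03Cow Bos primigenius taurus 00401\n",
    "01" + "Jannette".ljust(10) + "Jhonson".ljust(10) + " " * 8 + "00100000000\n",
]
groups = split_by_record_type(lines)
people = read_person_records(groups["01"])
print(people)
```

With a real file, an open file object can be passed directly, e.g. `with open("animal.dat") as fp: groups = split_by_record_type(fp)`, since iterating over it yields lines.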
