How to create a JSON file with nested records from a flat data table?

Question

I am looking for a Python technique to build a nested JSON file from a flat table in a pandas DataFrame. For example, how could a pandas DataFrame such as:

       teamname  member firstname lastname  orgname         phone        mobile
    0         1       0      John      Doe     Anon  916-555-1234
    1         1       1      Jane      Doe     Anon  916-555-4321  916-555-7890
    2         2       0    Mickey    Moose  Moosers  916-555-0000  916-555-1111
    3         2       1     Minny    Moose  Moosers  916-555-2222

be taken and exported to JSON that looks like this:

    {
        "teams": [
            {
                "teamname": "1",
                "members": [
                    {
                        "firstname": "John",
                        "lastname": "Doe",
                        "orgname": "Anon",
                        "phone": "916-555-1234",
                        "mobile": ""
                    },
                    {
                        "firstname": "Jane",
                        "lastname": "Doe",
                        "orgname": "Anon",
                        "phone": "916-555-4321",
                        "mobile": "916-555-7890"
                    }
                ]
            },
            {
                "teamname": "2",
                "members": [
                    {
                        "firstname": "Mickey",
                        "lastname": "Moose",
                        "orgname": "Moosers",
                        "phone": "916-555-0000",
                        "mobile": "916-555-1111"
                    },
                    {
                        "firstname": "Minny",
                        "lastname": "Moose",
                        "orgname": "Moosers",
                        "phone": "916-555-2222",
                        "mobile": ""
                    }
                ]
            }
        ]
    }

I tried to do this by creating a dict of dicts and dumping it to JSON. This is my current code:

    data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')
    memberDictTuple = []

    for index, row in data.iterrows():
        dataRow = row
        rowDict = dict(zip(columnList[2:], dataRow[2:]))

        teamRowDict = {columnList[0]:int(dataRow[0])}

        memberId = tuple(row[1:2])
        memberId = memberId[0]

        teamName = tuple(row[0:1])
        teamName = teamName[0]

        memberDict1 = {int(memberId):rowDict}
        memberDict2 = {int(teamName):memberDict1}

        memberDictTuple.append(memberDict2)

    memberDictTuple = tuple(memberDictTuple)
    formattedJson = json.dumps(memberDictTuple, indent = 4, sort_keys = True)
    print(formattedJson)

It produces the following result. Each item is nested at the correct level under "teamname" 1 or 2, but records that share a team name should be nested together under that team. How can I fix this so that teamname 1 and teamname 2 each have 2 records nested inside?

    [
        {
            "1": {
                "0": {
                    "email": " john.doe@wildlife.net ",
                    "firstname": "John",
                    "lastname": "Doe",
                    "mobile": "none",
                    "orgname": "Anon",
                    "phone": "916-555-1234"
                }
            }
        },
        {
            "1": {
                "1": {
                    "email": " jane.doe@wildlife.net ",
                    "firstname": "Jane",
                    "lastname": "Doe",
                    "mobile": "916-555-7890",
                    "orgname": "Anon",
                    "phone": "916-555-4321"
                }
            }
        },
        {
            "2": {
                "0": {
                    "email": " mickey.moose@wildlife.net ",
                    "firstname": "Mickey",
                    "lastname": "Moose",
                    "mobile": "916-555-1111",
                    "orgname": "Moosers",
                    "phone": "916-555-0000"
                }
            }
        },
        {
            "2": {
                "1": {
                    "email": " minny.moose@wildlife.net ",
                    "firstname": "Minny",
                    "lastname": "Moose",
                    "mobile": "none",
                    "orgname": "Moosers",
                    "phone": "916-555-2222"
                }
            }
        }
    ]

json python pandas nested

spaine Jun 08 '16 at 21:34

2 answers

Using some input from @root, I took a different tack and came up with the following code, which seems to get most of the way there:

    import pandas
    import json
    from collections import defaultdict

    inputExcel = 'E:\\teamsMM.xlsx'
    exportJson = 'E:\\teamsMM.json'

    data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')

    grouped = data.groupby(['teamname', 'members']).first()

    results = defaultdict(lambda: defaultdict(dict))

    for t in grouped.itertuples():
        for i, key in enumerate(t.Index):
            if i ==0:
                nested = results[key]
            elif i == len(t.Index) -1:
                nested[key] = t
            else:
                nested = nested[key]

    formattedJson = json.dumps(results, indent = 4)

    formattedJson = '{\n"teams": [\n' + formattedJson +'\n]\n }'

    parsed = open(exportJson, "w")
    parsed.write(formattedJson)

The result is a JSON file:

    {
    "teams": [
    {
        "1": {
            "0": [
                [
                    1,
                    0
                ],
                "John",
                "Doe",
                "Anon",
                "916-555-1234",
                "none",
                " john.doe@wildlife.net "
            ],
            "1": [
                [
                    1,
                    1
                ],
                "Jane",
                "Doe",
                "Anon",
                "916-555-4321",
                "916-555-7890",
                " jane.doe@wildlife.net "
            ]
        },
        "2": {
            "0": [
                [
                    2,
                    0
                ],
                "Mickey",
                "Moose",
                "Moosers",
                "916-555-0000",
                "916-555-1111",
                " mickey.moose@wildlife.net "
            ],
            "1": [
                [
                    2,
                    1
                ],
                "Minny",
                "Moose",
                "Moosers",
                "916-555-2222",
                "none",
                " minny.moose@wildlife.net "
            ]
        }
    }
    ]
     }

This format is very close to the desired end product. Remaining problems: removing the redundant array [1, 0] that appears just above each first name, and getting the keys "teamname": "1", "members": at each level instead of "1": "0":

In addition, I do not know why each entry loses its key during conversion. For example, why is the dictionary entry "firstname": "John" exported as just "John"?
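A likely explanation, sketched here with the standard library only: `itertuples()` yields namedtuples, and `json.dumps` treats any tuple (named or not) as a JSON array, so the field names are dropped. The `Row` type below is a hypothetical stand-in for one such row, not something pandas exposes under that name. Converting the row with `_asdict()` before dumping keeps the keys, and popping `Index` removes the redundant [1, 0] pair.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for one row yielded by grouped.itertuples().
Row = namedtuple("Row", ["Index", "firstname", "lastname"])
row = Row(Index=(1, 0), firstname="John", lastname="Doe")

# A namedtuple is still a tuple, so it is serialized as a JSON array.
as_array = json.dumps(row)

# Converting to a dict first keeps the field names as JSON keys.
record = row._asdict()
record.pop("Index")  # drop the (teamname, members) index pair
as_object = json.dumps(record)

print(as_array)
print(as_object)
```

In the loop above, this would amount to storing a dict built from `t` rather than `t` itself.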

spaine Jun 14 '16 at 9:38

spaine · Accepted Answer · 2016-06-20T16:17:24+0000

This is a solution that works and creates the desired JSON format. First, I grouped my DataFrame by the appropriate columns. Then, instead of creating a dictionary (and losing data order) for each column heading/record pair, I created them as lists of tuples and converted each list into an Ordered Dict. Another Ordered Dict was created for the two columns that everything else was grouped by. Precise layering of the lists and the Ordered Dicts was necessary for the JSON conversion to produce the correct format. Also note that when dumping to JSON, sort_keys must be set to False, or all your Ordered Dicts will be rearranged into alphabetical order.

    import pandas
    import json
    from collections import OrderedDict

    inputExcel = 'E:\\teams.xlsx'
    exportJson = 'E:\\teams.json'

    data = pandas.read_excel(inputExcel, sheetname = 'SCAT Teams', encoding = 'utf8')

    # This creates a tuple of column headings for later use matching them with column data
    cols = []
    columnList = list(data[0:])
    for col in columnList:
        cols.append(str(col))
    columnList = tuple(cols)

    # This groups the dataframe by the 'teamname' and 'members' columns
    grouped = data.groupby(['teamname', 'members']).first()

    # This creates a reference to the index level of the groups
    groupnames = data.groupby(["teamname", "members"]).grouper.levels
    tm = (groupnames[0])

    # Create a list to add team records to at the end of the first 'for' loop
    teamsList = []

    for teamN in tm:
        teamN = int(teamN)  # added this in to prevent TypeError: 1 is not JSON serializable
        tempList = []  # Create a temporary list to add each record to
        for index, row in grouped.iterrows():
            dataRow = row
            # Select the record in each row of the grouped dataframe if its index matches the team number
            if index[0] == teamN:
                # In order to have the JSON records come out in the same order,
                # I had to first create a list of tuples, then convert to an Ordered Dict
                rowDict = ([(columnList[2], dataRow[0]),
                            (columnList[3], dataRow[1]),
                            (columnList[4], dataRow[2]),
                            (columnList[5], dataRow[3]),
                            (columnList[6], dataRow[4]),
                            (columnList[7], dataRow[5])])
                rowDict = OrderedDict(rowDict)
                tempList.append(rowDict)

        # Create another Ordered Dict to keep 'teamname' and the list of members from the temporary list sorted
        t = ([('teamname', str(teamN)), ('members', tempList)])
        t = OrderedDict(t)

        # Append the Ordered Dict to the empty list of teams created earlier
        ListX = t
        teamsList.append(ListX)

    # Create a final dictionary with a single item: the list of teams
    teams = {"teams":teamsList}

    # Dump to JSON format
    # sort_keys MUST be set to False, or all dictionaries will be alphabetized
    formattedJson = json.dumps(teams, indent = 1, sort_keys = False)
    # "NaN" is the NULL format in pandas dataframes - must be replaced with "NULL" to be a valid JSON file
    formattedJson = formattedJson.replace("NaN", '"NULL"')
    print(formattedJson)

    # Export to JSON file
    parsed = open(exportJson, "w")
    parsed.write(formattedJson)

    print("\n\nExport to JSON Complete")
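For readers on Python 3.7+ and a recent pandas, the same layering can probably be written more compactly. The following is only a sketch under stated assumptions, not the method used above: the DataFrame is built inline instead of being read from the Excel file, `build_teams` is a hypothetical helper name, and missing values are written as empty strings (as in the question's desired output) rather than "NULL". Plain dicts keep insertion order on these Python versions, so no OrderedDict is needed.

```python
import json
import pandas as pd

# Assumed sample data mirroring the table in the question.
df = pd.DataFrame({
    "teamname": [1, 1, 2, 2],
    "members": [0, 1, 0, 1],
    "firstname": ["John", "Jane", "Mickey", "Minny"],
    "lastname": ["Doe", "Doe", "Moose", "Moose"],
    "orgname": ["Anon", "Anon", "Moosers", "Moosers"],
    "phone": ["916-555-1234", "916-555-4321", "916-555-0000", "916-555-2222"],
    "mobile": [None, "916-555-7890", "916-555-1111", None],
})

def build_teams(frame):
    """Nest member records under their team (hypothetical helper)."""
    # Fill missing values up front so json.dumps never has to emit NaN.
    clean = frame.fillna("")
    # Every column except the two grouping columns belongs to the member record.
    member_cols = [c for c in clean.columns if c not in ("teamname", "members")]
    teams = []
    for teamname, group in clean.groupby("teamname"):
        teams.append({
            "teamname": str(teamname),
            # One dict per row, with keys in column order.
            "members": group[member_cols].to_dict(orient="records"),
        })
    return {"teams": teams}

nested = build_teams(df)
# sort_keys stays False so the keys keep their column order.
print(json.dumps(nested, indent=1, sort_keys=False))
```

Filling the missing values before serializing avoids the string replacement on the finished JSON text, which would also rewrite any legitimate value that happens to contain "NaN".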
