Parsing nested JSON data into a pandas DataFrame

I have now added the current problem to GitHub; the URL for the repo is below. It includes a Jupyter notebook, which also explains the problem. Thanks, guys.

https://github.com/simongraham/dataExplore.git


I am currently working with nutrition data for a project where the data is in raw JSON format, and I want to use Python and pandas to get a clean data frame. I understand that this is an easy task when the JSON is not nested. In that case I would use:

nutrition = pd.read_json('data') 
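For reference, a minimal sketch of the flat case. The two records below are made up for illustration, and the JSON text is wrapped in io.StringIO only so that the example is self-contained; with a real file, the path would be passed instead.

```python
import io

import pandas as pd

# Hypothetical flat records: every value is a scalar, so nothing is nested.
flat_json = """[
  {"vcPortionName": "1 average pepper", "vcPortionUnit": "g", "ftEnergyKcal": 5.2},
  {"vcPortionName": "1 slice of bread", "vcPortionUnit": "g", "ftEnergyKcal": 79.0}
]"""

# read_json accepts a path or a file-like object; StringIO wraps the literal here.
flat = pd.read_json(io.StringIO(flat_json))
print(flat)
```

Each JSON key becomes a column and each record becomes a row, which is why no extra work is needed when nothing is nested.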

However, my data contains nested information, and I am finding it very difficult to get it into a reasonable data frame. The JSON format is as follows, where the "nutritionNutrients" array is itself a nested element. This nested element describes the nutritional content for a wide variety of nutrients, such as alcohol and bcfa, as shown. I have only included a sample because the data file is large.

    [
      {
        "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
        "vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
        "vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
        "vcPortionId": "1",
        "vcPortionName": "1 average pepper",
        "vcPortionSize": "20",
        "ftEnergyKcal": 5.2,
        "vcPortionUnit": "g",
        "dtConsumedDate": "2016-05-04T00:00:00",
        "nutritionNutrients": [
          {
            "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
            "vcNutrient": "alcohol",
            "ftValue": 0,
            "vcUnit": "g",
            "nPercentRI": 0,
            "vcTrafficLight": ""
          },
          {
            "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
            "vcNutrient": "bcfa",
            "ftValue": 0,
            "vcUnit": "g",
            "nPercentRI": 0,
            "vcTrafficLight": ""
          },
          {
            "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
            "vcNutrient": "biotin",
            "ftValue": 0,
            "vcUnit": "µg",
            "nPercentRI": 0,
            "vcTrafficLight": ""
          },
          ...
        ]
      }
    ]

Any help would be appreciated.

Thanks.

.... .... ....

Now that I have figured out how to solve this problem with json_normalize, I am back with the same problem, but this time my data is nested twice, i.e.:

    [
      {
        ...,
        "nutritionPortions": [
          {
            "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
            "vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
            "vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
            "vcPortionId": "1",
            "vcPortionName": "1 average pepper",
            "vcPortionSize": "20",
            "ftEnergyKcal": 5.2,
            "vcPortionUnit": "g",
            "dtConsumedDate": "2016-05-04T00:00:00",
            "nutritionNutrients": [
              {
                "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                "vcNutrient": "alcohol",
                "ftValue": 0,
                "vcUnit": "g",
                "nPercentRI": 0,
                "vcTrafficLight": ""
              },
              {
                "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                "vcNutrient": "bcfa",
                "ftValue": 0,
                "vcUnit": "g",
                "nPercentRI": 0,
                "vcTrafficLight": ""
              },
              {
                "vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
                "vcNutrient": "biotin",
                "ftValue": 0,
                "vcUnit": "µg",
                "nPercentRI": 0,
                "vcTrafficLight": ""
              },
              ...
            ]
          }
        ]
      }
    ]

When I have a JSON consisting of only nutrition data, I can use:

    nutrition = (pd.io
                   .json
                   .json_normalize(data, 'nutritionNutrients',
                                   ['vcNutritionId', 'vcUserId', 'vcPortionId', 'vcPortionName', 'vcPortionSize',
                                    'ftEnergyKcal', 'vcPortionUnit', 'dtConsumedDate'])
                )

However, my data does not contain only nutrition information. For example, it also contains activity information, so the nutrition information is first nested under "nutritionPortions". Suppose all other columns are not nested and are represented by "Activity" and "Wellbeing".

If I use the code:

    nutrition = (pd.io
                   .json
                   .json_normalize(data, ['nutritionPortions'])
                )

I am back to the original problem: "nutritionNutrients" is still nested, and I have not managed to get the corresponding data frame.

thanks

1 answer

UPDATE: this should work for your kaidoData.json file:

    df = (pd.io
            .json
            .json_normalize(data[0]['nutritionPortions'], 'nutritionNutrients',
                            ['vcNutritionId', 'vcUserId', 'vcPortionId', 'vcPortionName', 'vcPortionSize',
                             'dtCreatedDate', 'dtUpdatedDate', 'nProcessingStatus',
                             'vcPortionUnit', 'dtConsumedDate'])
         )

PS: I don't know what is wrong with "ftEnergyKcal". Including it in the list raises:

KeyError: 'ftEnergyKcal'

Perhaps it is missing from some records.
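One possible way to deal with both points, offered as a sketch under assumptions rather than a tested fix for kaidoData.json: json_normalize accepts a list as record_path, so it can walk "nutritionPortions" and then "nutritionNutrients" for every top-level record instead of only data[0], and errors='ignore' fills a meta key that is absent with NaN instead of raising KeyError. The sample data, the "Activity" value and the second portion (which deliberately lacks "ftEnergyKcal") are hypothetical. pd.json_normalize is the top-level name of the same function in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical stand-in for the parsed file: one top-level record holding a
# non-nested field plus the doubly nested nutrition data.
data = [
    {
        "Activity": "walking",
        "nutritionPortions": [
            {
                "vcPortionName": "1 average pepper",
                "ftEnergyKcal": 5.2,
                "nutritionNutrients": [
                    {"vcNutrient": "alcohol", "ftValue": 0, "vcUnit": "g"},
                    {"vcNutrient": "bcfa", "ftValue": 0, "vcUnit": "g"},
                ],
            },
            {
                # This portion has no "ftEnergyKcal" key at all.
                "vcPortionName": "portion without energy",
                "nutritionNutrients": [
                    {"vcNutrient": "biotin", "ftValue": 0, "vcUnit": "µg"},
                ],
            },
        ],
    }
]

df = pd.json_normalize(
    data,
    # Walk into each portion, then into each nutrient of that portion.
    record_path=["nutritionPortions", "nutritionNutrients"],
    meta=[
        "Activity",                              # top-level field
        ["nutritionPortions", "vcPortionName"],  # portion-level fields
        ["nutritionPortions", "ftEnergyKcal"],
    ],
    # Missing meta keys become NaN instead of raising KeyError.
    errors="ignore",
)
print(df)
```

The portion-level meta columns come out with a "nutritionPortions." prefix, for example "nutritionPortions.vcPortionName", which also keeps them from clashing with identically named fields inside the nutrient records.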

OLD answer:

Use json_normalize():

    (pd.io
       .json
       .json_normalize(l, 'nutritionNutrients',
                       ['vcNutritionId', 'vcUserId', 'vcPortionId', 'vcPortionName', 'vcPortionSize',
                        'ftEnergyKcal', 'vcPortionUnit', 'dtConsumedDate'])
    )

demo:

    In [107]: (pd.io
       .....:    .json
       .....:    .json_normalize(l, 'nutritionNutrients',
       .....:                    ['vcNutritionId','vcUserId','vcPortionId','vcPortionName','vcPortionSize',
       .....:                     'ftEnergyKcal','vcPortionUnit','dtConsumedDate'])
       .....: )
    Out[107]:
       ftValue  nPercentRI vcNutrient vcNutritionPortionId vcTrafficLight  ...  vcPortionSize  \
    0        0           0    alcohol    478d1905-f264-4d...                 ...             20
    1        0           0       bcfa    478d1905-f264-4d...                 ...             20
    2        0           0     biotin    478d1905-f264-4d...                 ...             20

             vcNutritionId vcPortionId  ftEnergyKcal     vcPortionName
    0  2476378b-79ee-48...           1           5.2  1 average pepper
    1  2476378b-79ee-48...           1           5.2  1 average pepper
    2  2476378b-79ee-48...           1           5.2  1 average pepper

    [3 rows x 14 columns]

where l is your list (the parsed JSON).
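For anyone who wants to reproduce this, here is a self-contained sketch of the same call. It assumes pandas 1.0 or later, where the function is also exposed as pd.json_normalize, and it uses only the three nutrients shown in the question (the elided ones are left out).

```python
import json

import pandas as pd

# Sample record from the question, without the elided nutrients.
raw = """
[{"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
  "vcNutritionId": "2476378b-79ee-4857-a81d-489661a039a1",
  "vcUserId": "cc51145b-5a70-4344-9b55-1a4455f0a9d2",
  "vcPortionId": "1", "vcPortionName": "1 average pepper",
  "vcPortionSize": "20", "ftEnergyKcal": 5.2, "vcPortionUnit": "g",
  "dtConsumedDate": "2016-05-04T00:00:00",
  "nutritionNutrients": [
    {"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
     "vcNutrient": "alcohol", "ftValue": 0, "vcUnit": "g",
     "nPercentRI": 0, "vcTrafficLight": ""},
    {"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
     "vcNutrient": "bcfa", "ftValue": 0, "vcUnit": "g",
     "nPercentRI": 0, "vcTrafficLight": ""},
    {"vcNutritionPortionId": "478d1905-f264-4d9b-ab76-0ed4252193fd",
     "vcNutrient": "biotin", "ftValue": 0, "vcUnit": "µg",
     "nPercentRI": 0, "vcTrafficLight": ""}
  ]}]
"""

l = json.loads(raw)  # a list of dicts, one per portion

# One row per nutrient; the portion-level fields are repeated on every row.
df = pd.json_normalize(
    l,
    "nutritionNutrients",
    ["vcNutritionId", "vcUserId", "vcPortionId", "vcPortionName", "vcPortionSize",
     "ftEnergyKcal", "vcPortionUnit", "dtConsumedDate"],
)
print(df.shape)
```

Note that "vcNutritionPortionId" is left out of the meta list because the nutrient records already carry a field with that name; listing it as meta as well would produce a name conflict.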
