How to convert complex JSON data into a single data frame?

I do not know how to properly convert JSON data into a useful data frame. Here is some example data that shows the structure of my data:

    {
      "data": [
        {"track": [
          {"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":37}},
          {"time":"2016","midpoint":{"x":6,"y":9},"realworld":{"x":2,"y":3},"coordinate":{"x":16,"y":38}}
        ]},
        {"track": [
          {"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2016","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2015","midpoint":{"x":3,"y":15},"realworld":{"x":-9,"y":2},"coordinate":{"x":17,"y":38}}
        ]},
        {"track": [
          {"time":"2015","midpoint":{"x":6,"y":7},"realworld":{"x":-2,"y":3},"coordinate":{"x":16,"y":39}}
        ]}
      ]
    }

I have many tracks, and I would like the data set to look like this:

    track  time  midpoint  realworld  coordinate
    1
    1
    1
    2
    2
    2
    2
    3

So far I have this:

    json_file <- "testdata.json"
    data <- fromJSON(json_file)
    data2 <- list.stack(data, fill=TRUE)

Now it looks like this:

[image: data output]

How can I get this in the correct format?

2 answers

Add the flatten = TRUE argument when reading with fromJSON. This gives you a nested list whose deepest level holds a list of three data frames, one per track. Using:

    library(jsonlite)

    # read the json
    jsondata <- fromJSON(txt, flatten = TRUE)

    # bind the dataframes in the nested 'track' list together
    dat <- do.call(rbind, jsondata$data$track)

    # add a track variable
    dat$track <- rep(1:length(jsondata$data$track), sapply(jsondata$data$track, nrow))

gives:

    > dat
      time midpoint.x midpoint.y realworld.x realworld.y coordinate.x coordinate.y track
    1 2015          6          8           1           3           16           38     1
    2 2015          6          8           1           3           16           37     1
    3 2016          6          9           2           3           16           38     1
    4 2015          5          9          -1           3           16           38     2
    5 2015          5          9          -1           3           16           38     2
    6 2016          5          9          -1           3           16           38     2
    7 2015          3         15          -9           2           17           38     2
    8 2015          6          7          -2           3           16           39     3
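To see why the rep() call lines up with the bound rows, it can help to inspect the parsed object before binding. The following is only a minimal sketch: it assumes a trimmed-down sample with the same nesting as the question's data, and the names txt_small, tracks, rows_per_track and track_id are made up for illustration.

```r
library(jsonlite)

# Trimmed-down sample with the same nesting as the question's data
# (txt_small is a hypothetical name, not part of the original answer)
txt_small <- '{"data":[
  {"track":[
    {"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":38}},
    {"time":"2016","midpoint":{"x":6,"y":9},"realworld":{"x":2,"y":3},"coordinate":{"x":16,"y":38}}
  ]},
  {"track":[
    {"time":"2015","midpoint":{"x":6,"y":7},"realworld":{"x":-2,"y":3},"coordinate":{"x":16,"y":39}}
  ]}
]}'

parsed <- fromJSON(txt_small, flatten = TRUE)

# 'track' is a list column: one data frame per track
tracks <- parsed$data$track

# number of observations in each track
rows_per_track <- sapply(tracks, nrow)

# repeat each track index once per row of that track
track_id <- rep(seq_along(tracks), rows_per_track)

str(tracks, max.level = 1)
print(rows_per_track)
print(track_id)
```

Because do.call(rbind, ...) stacks the data frames in list order, a vector built this way has one entry per row, in the same order as the rows of the combined data frame.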

Another, shorter approach uses jsonlite in combination with rbindlist from the data.table package:

    library(jsonlite)
    library(data.table)

    # read the json
    jsondata <- fromJSON(txt, flatten = TRUE)

    # bind the dataframes in the nested 'track' list together
    # and include an id-column at the same time
    dat <- rbindlist(jsondata$data$track, idcol = 'track')

Or use bind_rows from the dplyr package in a similar way:

    library(dplyr)

    dat <- bind_rows(jsondata$data$track, .id = 'track')
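One difference between the two shortcuts that may matter downstream is the type of the id column. As far as I know, for an unnamed list rbindlist fills the id column with integer positions, while bind_rows stores them as character. A small sketch, assuming two made-up stand-in data frames in place of jsondata$data$track:

```r
library(data.table)
library(dplyr)

# Two tiny stand-in data frames playing the role of jsondata$data$track
# (values are made up for illustration)
tracks <- list(
  data.frame(time = c("2015", "2016"), midpoint.x = c(6L, 6L)),
  data.frame(time = "2015", midpoint.x = 5L)
)

dt <- rbindlist(tracks, idcol = "track")  # id column from list positions
df <- bind_rows(tracks, .id = "track")    # id column from list positions

# compare the column types produced by the two functions
print(class(dt$track))
print(class(df$track))

# convert the dplyr result so both approaches agree on the type
df$track <- as.integer(df$track)
```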

Used data:

    txt <- '{
      "data": [
        {"track": [
          {"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2015","midpoint":{"x":6,"y":8},"realworld":{"x":1,"y":3},"coordinate":{"x":16,"y":37}},
          {"time":"2016","midpoint":{"x":6,"y":9},"realworld":{"x":2,"y":3},"coordinate":{"x":16,"y":38}}
        ]},
        {"track": [
          {"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2015","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2016","midpoint":{"x":5,"y":9},"realworld":{"x":-1,"y":3},"coordinate":{"x":16,"y":38}},
          {"time":"2015","midpoint":{"x":3,"y":15},"realworld":{"x":-9,"y":2},"coordinate":{"x":17,"y":38}}
        ]},
        {"track": [
          {"time":"2015","midpoint":{"x":6,"y":7},"realworld":{"x":-2,"y":3},"coordinate":{"x":16,"y":39}}
        ]}
      ]
    }'

Sahil's answer (if it has not already been deleted) is misleading, since stream_in is for ndjson and you don't have ndjson. You just need to wrangle the nested list a bit. I think the following could be made even shorter, but it was a quick, brute-force hack:

    library(jsonlite)
    library(purrr)
    library(readr)

    dat <- fromJSON(txt, simplifyVector=FALSE)   # read in your JSON

    map(dat$data, "track") %>%                   # move past the top-level "data" element and iterate over the "track"s
      map_df(function(track) {                   # iterate over each element of "track"
        map_df(track, ~as.list(unlist(.x)))      # convert it to a data frame
      }, .id="track") %>%                        # add in the track "id"
      type_convert()                             # convert mangled types

    ## # A tibble: 8 × 8
    ##   track  time midpoint.x midpoint.y realworld.x realworld.y coordinate.x coordinate.y
    ##   <int> <int>      <int>      <int>       <int>       <int>        <int>        <int>
    ## 1     1  2015          6          8           1           3           16           38
    ## 2     1  2015          6          8           1           3           16           37
    ## 3     1  2016          6          9           2           3           16           38
    ## 4     2  2015          5          9          -1           3           16           38
    ## 5     2  2015          5          9          -1           3           16           38
    ## 6     2  2016          5          9          -1           3           16           38
    ## 7     2  2015          3         15          -9           2           17           38
    ## 8     3  2015          6          7          -2           3           16           39

This also leaves you with decent column types, although you can use the col_types parameter for readr::type_convert to turn time into a character vector.
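As a rough illustration of the col_types option, the sketch below keeps time as character and lets readr guess the other columns. The data frame raw is a made-up stand-in for the all-character result that exists just before type_convert() is called; it is not taken from the answer.

```r
library(readr)

# Stand-in for the all-character data frame produced before type_convert()
# (rows are made up for illustration)
raw <- data.frame(
  track = c("1", "1", "2"),
  time = c("2015", "2016", "2015"),
  midpoint.x = c("6", "6", "5"),
  stringsAsFactors = FALSE
)

# keep 'time' as character; the remaining columns are guessed by readr
converted <- type_convert(raw, col_types = cols(time = col_character()))

str(converted)
```

Columns not named in cols() fall back to readr's type guessing, so only the columns you want to pin down need to be listed.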

As an alternative:

    library(jsonlite)
    library(purrr)
    library(tibble)

    dat <- fromJSON(txt, flatten=TRUE)   # read in your JSON

    map_df(dat$data$track, as_tibble, .id="track")