Loading JSON data in AWS Redshift results in NULL values

I am trying to COPY data from JSON files in an S3 bucket directly into Redshift. The COPY operation succeeds, and afterwards the table has the correct number of rows/records, but every field is NULL!

The load takes the expected amount of time, the COPY command returns OK, and the Redshift console reports success with no errors ... but a simple SELECT from the table returns only NULL values.

The JSON is very simple, flat, and correctly formatted (according to the examples I found here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY_command_examples.html )

Basically, it is one JSON object per line, formatted like:

 { "col1": "val1", "col2": "val2", ... }
 { "col1": "val1", "col2": "val2", ... }
 { "col1": "val1", "col2": "val2", ... }

I have tried things like rewriting the schema based on the values and data types found in the JSON objects, as well as copying from uncompressed files. I assumed the JSON was being parsed correctly on load, since presumably COPY would raise an error if the objects could not be parsed.
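As a sanity check that each line really parses as a standalone object, here is a minimal Python sketch (the sample lines are hypothetical stand-ins for the real files):

```python
import json

# Hypothetical sample of the newline-delimited JSON being loaded;
# in practice these would be lines read from the S3 source files.
lines = [
    '{"col1": "val1", "col2": "val2"}',
    '{"col1": "val3", "col2": "val4"}',
]

# Parse each line independently -- COPY ... json 'auto' expects one
# self-contained object per line, so every line must parse on its own.
records = [json.loads(line) for line in lines]

# Inspect the keys: with json 'auto', these must exactly match the
# (lowercase) Redshift column names, or the columns load as NULL.
keys = sorted(records[0].keys())
print(keys)  # ['col1', 'col2']
```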

My COPY command looks like this:

 copy events
 from 's3://mybucket/json/prefix'
 with credentials 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
 json 'auto' gzip;

Any guidance would be appreciated! Thanks.

amazon-web-services amazon-redshift
3 answers

So, I found the cause - it would not have been obvious from the description in my original post.

When you create a table in Redshift, the column names are converted to lowercase. But when COPY matches JSON field names to columns, the matching is case sensitive.

The input I was trying to load uses camelCase field names, so during COPY none of the fields matched the schema's (now all-lowercase) column names.

However, the operation does not raise an error. It simply leaves NULL in every column that has no matching field (in this case, all of them).
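One workaround, if you can preprocess the files before uploading them to S3, is to lowercase the top-level keys yourself so they match Redshift's column names. A minimal Python sketch (the field names below are hypothetical):

```python
import json

def lowercase_keys(line: str) -> str:
    """Rewrite one newline-delimited JSON line so its top-level keys
    are lowercase, matching Redshift's lowercased column names."""
    obj = json.loads(line)
    return json.dumps({k.lower(): v for k, v in obj.items()})

# camelCase keys as in my input files (hypothetical example)
print(lowercase_keys('{"firstName": "Ada", "lastLogin": "2015-01-01"}'))
# {"firstname": "Ada", "lastlogin": "2015-01-01"}
```

You would apply this to every line of every file before the upload; the alternative (a JSONPaths file) avoids rewriting the data, as the other answers describe.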

Hope this helps someone avoid the same confusion!


In cases where the JSON field names do not match the column names, you can use a JSONPaths file to map JSON elements to columns, as mentioned by TimZ and described here


COPY maps the data elements in the source JSON to the columns of the target table by matching the object keys (names) in the source name/value pairs to the column names in the target table. This matching is case sensitive. Column names in Amazon Redshift tables are always lowercase, so when you use the 'auto' matching option, the JSON field names must also be lowercase. If the JSON field name keys are not all lowercase, you can use a JSONPaths file to explicitly map column names to JSON field name keys.

The solution is to use a JSONPaths file.

JSON example:

 {
   "Name": "Major",
   "Age": 19,
   "Add": {
     "street": { "st": "5 maint st", "ci": "Dub" },
     "city": "Dublin"
   },
   "Category_Name": ["MLB", "GBM"]
 }

Example table:

 create table customer (
   name varchar,
   age int,
   address varchar,
   catname varchar
 );

JSONPaths example:

 {
   "jsonpaths": [
     "$['Name']",
     "$['Age']",
     "$['Add']",
     "$['Category_Name']"
   ]
 }
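A JSONPaths file like the one above can also be generated from a sample record with a short script. A sketch, assuming the paths should reference the flat top-level keys in column order (the helper name is hypothetical):

```python
import json

def build_jsonpaths(sample_record: dict) -> str:
    """Build a JSONPaths file body from one sample record, preserving
    key order; each entry references the original (mixed-case) key."""
    paths = ["$['{}']".format(k) for k in sample_record]
    return json.dumps({"jsonpaths": paths}, indent=2)

# One record from the source data (top-level keys only)
sample = json.loads('{"Name": "Major", "Age": 19}')
print(build_jsonpaths(sample))
```

The resulting text would then be saved to S3 (e.g. as jpath.json) and referenced from the COPY command.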

COPY command example:

 copy customer
 from 's3://mybucket/customer.json'
 iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
 json 's3://mybucket/jpath.json';  -- JSONPaths file to map fields

Examples taken from here

