I am uploading a CSV file from S3 to Redshift. The CSV contains analytics data, including a PageUrl field (which may, for example, contain the user's search terms inside the query string).
The load trips up on lines that contain a single double-quote character. For example, if there is a page for a 14-inch toy, the PageUrl field will contain:
http://www.mywebsite.com/a-14"-toy/1234.html
Redshift, understandably, can't handle this, because it expects an embedded double quote to be escaped as two double-quote characters ("").
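For concreteness, this is roughly what such a row looks like versus what Redshift's CSV parser would accept (the surrounding columns are invented for illustration):

    2015-06-01,"http://www.mywebsite.com/a-14"-toy/1234.html",42       <- what the file contains
    2015-06-01,"http://www.mywebsite.com/a-14""-toy/1234.html",42      <- what CSV escaping requires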
As far as I can see, my options are:
- Pre-process the input and remove (or escape) these characters (a rough sketch follows this list)
- Configure the COPY command in Redshift to ignore these characters but still load the string
- Set MAXERROR to a high value and deal with the rejected rows in a separate process (a second sketch follows the list).
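Here is a rough sketch of what I mean by option 1, assuming a comma-delimited export where quotes only ever appear around whole fields and there are no embedded newlines (the function name is just a placeholder):

    # Assumes comma-delimited rows where legitimate quotes only wrap whole fields.
    def escape_stray_quotes(line: str) -> str:
        """Double any quote that is not at a field boundary so the row becomes
        valid CSV; append '' instead of '""' if you prefer to delete the quote."""
        out = []
        for i, ch in enumerate(line):
            if ch == '"':
                at_boundary = (
                    i == 0
                    or i == len(line) - 1
                    or line[i - 1] == ','
                    or line[i + 1] == ','
                )
                out.append(ch if at_boundary else '""')
            else:
                out.append(ch)
        return ''.join(out)

    if __name__ == "__main__":
        import sys
        for raw in sys.stdin:
            sys.stdout.write(escape_stray_quotes(raw.rstrip("\n")) + "\n")

I'd have to run the file through something like this before the COPY, which is an extra step in the pipeline.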
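And a rough sketch of option 3: issue the COPY with a high MAXERROR, then pull the rejected rows out of stl_load_errors for separate handling (the table name, bucket, IAM role and connection details are placeholders):

    import psycopg2

    # Connection details, table, bucket and IAM role below are placeholders.
    conn = psycopg2.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="loader", password="...",
    )
    conn.autocommit = True
    cur = conn.cursor()

    # Load everything that parses; tolerate up to 1000 bad rows.
    cur.execute("""
        COPY page_views
        FROM 's3://my-bucket/analytics/pageviews.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        CSV
        MAXERROR 1000;
    """)

    # The skipped rows (e.g. the stray-quote lines) are recorded in
    # stl_load_errors, so a separate process can repair and reload them.
    cur.execute("""
        SELECT line_number, err_reason, raw_line
        FROM stl_load_errors
        WHERE filename LIKE '%pageviews.csv%'
        ORDER BY starttime DESC;
    """)
    for line_number, err_reason, raw_line in cur.fetchall():
        print(line_number, err_reason.strip(), raw_line.strip())

This works, but it leaves me with a second code path just to repair a handful of rows.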
Option 2 would be perfect, but I can't find it!
Any other suggestions, in case I'm just not looking hard enough?
thanks
Duncan