Is it possible to detect bad quotes in a badly formed JSON string and then parse the string correctly as JSON?

I am using Rails 4.2.3. I am parsing JSON sent by a third party (I do not control how this JSON is formed). I noticed that, very occasionally, they send bad JSON like this:

'{"DisplayName":""fat" Tony Elvis ","Time":null,"OverallRank":19,"AgeRank":4}' 

Note that the word "fat" in quotes breaks the rest of the JSON. In my Rails code, I parse the JSON like this:

  json_data = JSON.parse(content_str) 

Although I can catch errors when the JSON does not parse, I wonder if there is a way to detect these misplaced quotes, correct them so that the string above becomes valid JSON, and then parse it properly.

+6
7 answers

If you know exactly which flaws can occur, you can build some crazy workarounds, for example using a regular expression to match and correct the string before parsing it as JSON:

 (?:")([^,:"]*"[^,:"]*"[^,:"]*)(?:") 

http://regexr.com/3dpj1

But this is definitely something you should not do unless it is absolutely necessary! Better to contact the owner of the source and get them to escape the quotes correctly.

edit: Here is a full POC in which the unescaped quotes are simply removed: https://jsfiddle.net/MattDiMu/y8khwfw6/
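A Ruby version of that approach might look like the sketch below. It assumes, as the POC does, that the stray quotes can simply be deleted (so the data loses them); the helper name strip_stray_quotes is hypothetical, and the regex only catches values containing exactly two stray quotes and no commas or colons.

```ruby
require 'json'

# The regex from the answer: a quoted value containing exactly two stray
# quotes and no commas or colons.
STRAY_QUOTES = /(?:")([^,:"]*"[^,:"]*"[^,:"]*)(?:")/

# Hypothetical helper: like the POC, delete the stray quotes inside each
# matched value and keep only the outer pair of quotes.
def strip_stray_quotes(str)
  str.gsub(STRAY_QUOTES) do
    inner = Regexp.last_match(1).delete('"') # drop the stray quotes
    "\"#{inner}\""
  end
end
```

With the question's string, this turns the value into "fat Tony Elvis " so that JSON.parse(strip_stray_quotes(content_str)) succeeds, at the cost of silently discarding the quotes around "fat".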

+2

Using a regular expression, you can check before parsing for two double quotes \"\" followed by a word \w+ and a closing \" . If you find them, use gsub to keep the opening double quote and wrap the word in single quotes instead, with the replacement string '"\'\\1\'' .

 t = '{"DisplayName":""fat" Tony Elvis ","Time":null,"OverallRank":19,"AgeRank":4}'
 t = t.gsub(/\"\"(\w+)\"/, '"\'\\1\'')
+1

I think you should make (or already have) some assumptions about the "JSON" that always hold. If, for example, the JSON objects always have a fixed attribute order, that can help a lot, especially when individual attributes are the problem.

I would try to match

 {"DisplayName":"(.*?)","Time":(null|"[^"]*"),"OverallRank":(\d+),"AgeRank":(\d+)} 

and then pass the match to some "fix" function, which would probably just take the capture groups, build an ad-hoc object from them, and serialize it back into valid JSON. One option would be to touch the (.*?) group only if something is wrong.
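A minimal Ruby sketch of that idea, assuming the provider always sends exactly these four attributes in this order: the regex is the one above with \A and \z anchors added, and fix_with_template is a hypothetical name for the "fix" function. Letting the JSON library serialize the rebuilt Hash takes care of escaping the embedded quotes.

```ruby
require 'json'

# The fixed-order template from above, anchored (an added assumption) so it
# must match the whole string.
TEMPLATE = /\A\{"DisplayName":"(.*?)","Time":(null|"[^"]*"),"OverallRank":(\d+),"AgeRank":(\d+)\}\z/

# Hypothetical "fix" function: build an ad-hoc Hash from the capture groups
# and serialize it back to JSON, which escapes embedded quotes correctly.
# Returns nil when the input does not follow the template.
def fix_with_template(str)
  m = TEMPLATE.match(str)
  return nil unless m

  data = {
    "DisplayName" => m[1],
    "Time" => m[2] == "null" ? nil : m[2][1..-2], # drop the surrounding quotes
    "OverallRank" => m[3].to_i,
    "AgeRank" => m[4].to_i
  }
  data.to_json
end
```

Because the lazy (.*?) group stops at the first "," followed by "Time":, the quotes around "fat" stay inside DisplayName instead of breaking the object; any extra or reordered attribute makes the match fail and the function return nil.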

However, the whole approach gets more complicated with additional attributes, and even more so with a flexible attribute order (though both can still be handled).

As you probably noticed, this only works if the assumption above holds. Depending on the assumptions you can make, the solution can be very simple. However, all of this becomes cumbersome if the malformed entries are completely irregular. So... good luck, I guess. Please post the assumptions you think hold if you need more help. Otherwise, the program has to guess what was really meant. For instance, someone could mean:

 { "DisplayName":"I want to have a quotationmark followed by Time, all quoted and separated by a comma \",\"Time\":null, because that how I roll and this entry shall not have a Time attribute...", "OverallRank":2, "AgeRank":2 } 

If the quotes are not escaped correctly, you will have a problem. But, as I said, you have to make some assumptions about the JSON. The usual assumption about JSON is that it is valid, because otherwise it is simply not JSON.

+1

Try handling the error with ordinary exception handling (begin/rescue), for example:

 begin
   json_data = JSON.parse(content_str)
 rescue JSON::ParserError => e
   Rails.logger.debug e
 end

This rescues the exception raised for invalid JSON and logs it, so you can notify the source owner when their JSON format breaks.

0

This is not an easy task, mostly because writing a JSON parser is not trivial, and I doubt you can adapt an existing parser to behave the way you would like.

If I were absolutely forced to solve this problem programmatically (rather than asking the provider to fix their JSON), I would probably do it by branching.

Taking your JSON string example:
{"DisplayName":""fat" Tony Elvis ","Time":null,"OverallRank":19,"AgeRank":4}

First, break the input into characters and iterate over them. Each time you hit a quotation mark, recurse and check both possibilities: the quote is part of the JSON syntax, or the quote is part of the data.

Each time you find a quote you branch, so after two quotes there are four candidate solutions, after four quotes there are 16, and so on.

As you do this, feed each candidate into a JSON stream parser (like this one) and watch for exceptions. If one raises, that candidate does not work and you throw it away. I would also give up after depth 4 (or 8 if you expect double quotes in your data). By limiting the depth, you also avoid accepting solutions such as {"a\":\"b\", \"c"} .
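For illustration, here is a rough Ruby sketch of that branching search. Instead of recursing character by character, it enumerates which quotes to escape (an escaped quote is "part of the data", an untouched one is "part of the JSON"), which covers the same branches. The method name candidate_parses and the max_escapes depth limit are hypothetical, and Ruby's own JSON.parse stands in for the stream parser.

```ruby
require 'json'

# Hypothetical brute-force version of the branching idea: try every way of
# escaping up to max_escapes of the string's double quotes and keep each
# candidate that JSON.parse accepts.
def candidate_parses(str, max_escapes: 4)
  quote_positions = (0...str.length).select { |i| str[i] == '"' }
  results = []
  # Fewer escapes first, so earlier results change the input the least.
  (0..max_escapes).each do |k|
    quote_positions.combination(k) do |positions|
      candidate = str.dup
      # Insert backslashes right to left so earlier indices stay valid.
      positions.sort.reverse_each { |i| candidate.insert(i, '\\') }
      begin
        results << JSON.parse(candidate)
      rescue JSON::ParserError
        # This branch is not valid JSON; discard it.
      end
    end
  end
  results.uniq
end
```

Because candidates are tried in order of how many quotes they escape, results.first is the repair that reinterprets the fewest quotes. Even with the depth limit, the question's string can also yield a spurious object in which a key swallows the neighbouring "Time" attribute, and the number of combinations grows quickly with the number of quotes.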

In practice, building this will take at least several hours, perhaps several days to get right, and there is still a good chance that it will report false positives. It will also be slow as a dog, because you potentially have to parse thousands of different JSON strings with a Ruby parser instead of parsing a single one with the C JSON library.

You can fix some of the performance issues by putting all candidate solutions on a queue and using a worker pool to pick them up and process them, but now we are talking about maybe a week of work just to clean this data with a script.

0

As for the bad quotes, this regex_pattern should be able to replace them with \" . Here's an example Rails snippet:

 regex_pattern = /(?<=[^\[{:,\\]|")"(?=[^:,\}\]])/
 corrected_content_str = content_str.gsub(regex_pattern, '\\"')

This pattern encodes the following rules:

  • A double quote must NOT be preceded by an opening square bracket, an opening curly brace, a colon, a comma, or a backslash (a preceding double quote is allowed). Therefore (?<=[^\[{:,\\]|") .
  • A double quote must NOT be followed by a colon, a comma, a closing curly brace, or a closing square bracket. Therefore (?=[^:,\}\]]) .

http://rubular.com/r/YBfcJYCf6D

However, this does not handle unpaired quotes.

0

As other posters have mentioned, if your service does not provide valid JSON, there is no way to guarantee that you can read the data it sends you. However, you can identify some common cases and try to fix them.

If your JSON documents follow the pattern in your example, writing a small parser can help you read the malformed documents that match it.


Escape double quotes: this escapes the double quotes inside each string value, even if they are not balanced.

 require 'json'
 invalid = '{"DisplayName":""fat" Tony" Elvis","Time":null,"OverallRank":19,"AgeRank":4}'
 # strip away { and }
 tailhead = invalid[1..-2]
 props = tailhead.split(/,(?=".+"\s*:)/)
 pairs = props.map {|p| p.split(/:(?=(?:".*"|\d+|null|false|true)$)/i)}
 escaped = pairs.map do |k,v|
   # is this a string property?
   string = v[/^"(.*?)"$/, 1]
   string ? [k, "\"#{string.gsub(/"/,'\\"')}\""] : [k,v]
 end
 valid = '{' + escaped.map {|p| p.join(':')}.join(',') + '}'
 json_data = JSON.parse(valid)

Whenever a snippet like the one above throws an exception, make sure the input is written to the log. As you collect more examples, you can improve how they are handled.

I am not a Rubyist, but I am sure you could do something with a rescue block, so that the code above only runs when Ruby's JSON parser fails.
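Following that suggestion, a minimal sketch could try the fast C-backed JSON.parse first and fall back to the splitting approach only when it raises JSON::ParserError. The method names repair_quotes and parse_lenient are hypothetical, warn stands in for Rails.logger, and the repair assumes a flat object whose values are strings, numbers, booleans, or null. It would also double-escape quotes that were already escaped, which is why it only runs on input that failed to parse.

```ruby
require 'json'

# Hypothetical repair: escape every double quote inside each top-level string
# value of a flat JSON object (same splitting idea as the snippet above).
def repair_quotes(str)
  body = str.strip[1..-2] # strip away { and }
  props = body.split(/,(?=".+"\s*:)/)
  fixed = props.map do |prop|
    key, value = prop.split(/:(?=(?:".*"|\d+|null|false|true)$)/i, 2)
    inner = value[/\A"(.*)"\z/m, 1] # nil for numbers, null, true, false
    next "#{key}:#{value}" unless inner

    escaped = inner.gsub('"') { '\\"' } # literal backslash + quote
    "#{key}:\"#{escaped}\""
  end
  "{#{fixed.join(',')}}"
end

# Try the normal parser first; only repair input that fails to parse.
def parse_lenient(str)
  JSON.parse(str)
rescue JSON::ParserError => e
  # In Rails you would log with Rails.logger.warn instead of warn.
  warn "Malformed JSON (#{e.class}), attempting repair: #{str}"
  JSON.parse(repair_quotes(str))
end
```

Logging the raw input on every fallback gives you the growing collection of real-world examples needed to refine the repair.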

0