How to write a basic JSON parsing class

Can anyone advise how to write a class that takes JSON data and parses it into a simple buffered list from which we can read the data back?

Example JSON:

{ name: 'John', age: 56 } 

... will be parsed into a table of key/value pairs:

    name John
    age 56

How can I write a parse method that makes this faster and simpler to build?

Please do not suggest any existing library. Describe a concept for parsing JSON.

+11
java json parsing
5 answers

This answer assumes that you really want to write a parser and are ready to make the necessary effort.

You should start from the formal JSON specification. I found http://www.ietf.org/rfc/rfc4627.txt . It defines the language precisely. You MUST implement everything in the specification and write tests for it. Your parser MUST cater for incorrect JSON (like yours) and throw exceptions.

If you want to write a parser, stop, think, and then don't. It is a lot of work to get it working correctly. Whatever you do, do it properly: incomplete parsers are a menace and should never be distributed.

You MUST write code that conforms. Here are some phrases from the specification. If you do not understand them, you will have to research carefully until you do:

"JSON text SHALL be encoded in Unicode. The default encoding is UTF-8."

"A JSON parser MUST accept all texts that conform to the JSON grammar."

"Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32

  JSON may be represented using UTF-8, UTF-16, or UTF-32. When JSON is written in UTF-8, JSON is 8bit compatible. When JSON is written in UTF-16 or UTF-32, the binary content-transfer-encoding must be used."

"Any character may be escaped. If the character is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the character's code point. The hexadecimal letters A through F can be upper or lowercase. So, for example, a string containing only a single reverse solidus character may be represented as "\u005C"."

If you understand this and still want to write a parser, check out some other parsers and see if they have any conformance tests. Borrow them for your application.
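One rule that such conformance tests would exercise is the six-character escape quoted from the specification above. Here is a minimal Java sketch of decoding it. The class `UnicodeEscape` and its methods are hypothetical names chosen for illustration, and the sketch handles only this one escape form; a real string decoder would also have to process the other JSON escapes (such as an escaped backslash or quote) in the same pass.

```java
public class UnicodeEscape {
    /** Returns the value of one hexadecimal digit (0-9, a-f, A-F), or -1 if it is not one. */
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Decodes every six-character unicode escape in s; all other text is copied as-is. */
    public static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            boolean escape = s.charAt(i) == '\\' && i + 1 < s.length() && s.charAt(i + 1) == 'u';
            if (!escape) {
                out.append(s.charAt(i++));
                continue;
            }
            if (i + 6 > s.length()) {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
            int codeUnit = 0;
            for (int k = i + 2; k < i + 6; k++) {
                int digit = hexValue(s.charAt(k));
                if (digit < 0) {
                    throw new IllegalArgumentException("Bad hex digit at index " + k);
                }
                codeUnit = codeUnit * 16 + digit; // accumulate four hex digits
            }
            out.append((char) codeUnit);
            i += 6; // backslash + 'u' + four digits
        }
        return out.toString();
    }
}
```

Calling `UnicodeEscape.decode` on the six characters `\u005C` (or `\u005c`, since upper and lower case are both allowed) yields a string containing a single reverse solidus.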

If you are still interested, you should seriously consider using a parser generator. Examples are JavaCC, CUP and my preferred tool, ANTLR. ANTLR is very powerful, but it can be difficult to get started with. See also the suggestion of Parboiled, which I would now recommend. JSON is relatively simple, so it would be a useful exercise. Most parser generators produce a complete parser that can either create executable code or build a parse tree of your JSON.

There is a JSON parser generator using ANTLR at http://www.antlr.org/wiki/display/ANTLR3/JSON+Interpreter if you are allowed to peek at it. I have also just discovered a Parboiled parser generator for JSON. If your main reason for writing a parser is to learn how to do it, this is probably a good starting point.

If you are not allowed (or do not want) to use a parser generator, you will have to write your own parser by hand. This generally comes in two parts:

A lexer/tokenizer. This recognizes the basic primitives defined in the language specification. In this case, it would have to recognize curly braces, quotation marks, etc. It would probably also build the representation of numbers.

An abstract syntax tree ( http://en.wikipedia.org/wiki/Abstract_syntax_tree , AST) generator. Here you write code to build a tree representing an abstraction of your JSON (for example, whitespace and curly braces have been discarded).

Once you have the AST, it should be easy to iterate over the nodes and create the output you want.
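The lexer stage described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a conforming implementation: the names `JsonLexer`, `Type` and `Token` are hypothetical, string escapes are left undecoded in the token text, and control characters inside strings are not rejected.

```java
import java.util.ArrayList;
import java.util.List;

public class JsonLexer {
    public enum Type { LBRACE, RBRACE, LBRACKET, RBRACKET, COLON, COMMA, STRING, NUMBER, LITERAL }

    public static final class Token {
        public final Type type;
        public final String text; // raw source text; string escapes are not decoded here
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    private static final String PUNCTUATION = "{}[]:,";
    private static final Type[] PUNCTUATION_TYPES = {
        Type.LBRACE, Type.RBRACE, Type.LBRACKET, Type.RBRACKET, Type.COLON, Type.COMMA
    };
    // Number grammar: optional minus, integer part, optional fraction, optional exponent.
    private static final String NUMBER_PATTERN = "-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?";

    public static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int punct = PUNCTUATION.indexOf(c);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++; // insignificant whitespace between tokens
            } else if (punct >= 0) {
                tokens.add(new Token(PUNCTUATION_TYPES[punct], String.valueOf(c)));
                i++;
            } else if (c == '"') {
                int start = ++i;
                while (i < src.length() && src.charAt(i) != '"') {
                    if (src.charAt(i) == '\\') i++; // skip the character after a backslash
                    i++;
                }
                if (i >= src.length()) throw new IllegalArgumentException("Unterminated string");
                tokens.add(new Token(Type.STRING, src.substring(start, i)));
                i++; // consume the closing quote
            } else if (c == '-' || (c >= '0' && c <= '9')) {
                int start = i;
                while (i < src.length() && "+-.eE0123456789".indexOf(src.charAt(i)) >= 0) i++;
                String text = src.substring(start, i);
                if (!text.matches(NUMBER_PATTERN)) {
                    throw new IllegalArgumentException("Malformed number: " + text);
                }
                tokens.add(new Token(Type.NUMBER, text));
            } else if (src.startsWith("true", i) || src.startsWith("null", i)) {
                tokens.add(new Token(Type.LITERAL, src.substring(i, i + 4)));
                i += 4;
            } else if (src.startsWith("false", i)) {
                tokens.add(new Token(Type.LITERAL, "false"));
                i += 5;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }
}
```

Note that the input from the question, with its unquoted names and single quotes, is rejected at the first unquoted letter. That is the behaviour the specification asks for.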

But writing parsers, even for a simple language like JSON, is a lot of work.

+23

If your "JSON" really looks like this, you should first take a baseball bat and go hit its producer. Joking aside.

If you really insist on writing your own class (why?), you can, for example, use this interface:

    public interface MyParser {
        boolean parse() throws MyParsingException;
        MyParser next();
    }

Implementations would then take a CharBuffer and a map builder class as arguments; to parse, you would do:

    final CharBuffer buf = CharBuffer.wrap(yourSource);
    final MyMapBuilder builder = new MyMapBuilder();

    MyParser parser = new OpenBracketParser(buf, builder);

    while (parser.parse())
        parser = parser.next();

    // result is builder.build()

This is just one example ...
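To show how such a chain of parsers might fit together, here is a minimal sketch. The names `MyParser`, `MyParsingException`, `MyMapBuilder` and `OpenBracketParser` come from the snippets above, but their bodies are assumptions, and `Base`, `KeyParser` and `ValueParser` are hypothetical additions. It only handles a flat, lenient object like the one in the question: no nesting, no empty object, and no commas, colons or braces inside quoted values.

```java
import java.nio.CharBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChainedParserSketch {
    public static class MyParsingException extends Exception {
        public MyParsingException(String message) { super(message); }
    }

    public interface MyParser {
        boolean parse() throws MyParsingException; // false once the object is finished
        MyParser next();                           // parser for the element expected next
    }

    /** Hypothetical builder: collects key/value pairs in insertion order. */
    public static class MyMapBuilder {
        private final Map<String, String> map = new LinkedHashMap<>();
        private String pendingKey;
        void key(String key) { pendingKey = key; }
        void value(String value) { map.put(pendingKey, value); }
        public Map<String, String> build() { return map; }
    }

    /** Shared helpers for the concrete parsers. */
    abstract static class Base implements MyParser {
        final CharBuffer buf;
        final MyMapBuilder builder;
        Base(CharBuffer buf, MyMapBuilder builder) { this.buf = buf; this.builder = builder; }

        /** Appends characters to 'into' until a stop character is read; returns that stop. */
        char readUntil(String stops, StringBuilder into) throws MyParsingException {
            while (buf.hasRemaining()) {
                char c = buf.get();
                if (stops.indexOf(c) >= 0) return c;
                into.append(c);
            }
            throw new MyParsingException("Unexpected end of input");
        }

        /** Trims whitespace and one pair of surrounding single or double quotes. */
        static String unquote(String raw) {
            String s = raw.trim();
            if (s.length() >= 2 && (s.charAt(0) == '\'' || s.charAt(0) == '"')
                    && s.charAt(s.length() - 1) == s.charAt(0)) {
                return s.substring(1, s.length() - 1);
            }
            return s;
        }
    }

    public static class OpenBracketParser extends Base {
        public OpenBracketParser(CharBuffer buf, MyMapBuilder builder) { super(buf, builder); }
        public boolean parse() throws MyParsingException {
            StringBuilder skipped = new StringBuilder();
            readUntil("{", skipped);
            if (!skipped.toString().trim().isEmpty()) throw new MyParsingException("Expected '{'");
            return true;
        }
        public MyParser next() { return new KeyParser(buf, builder); }
    }

    static class KeyParser extends Base {
        KeyParser(CharBuffer buf, MyMapBuilder builder) { super(buf, builder); }
        public boolean parse() throws MyParsingException {
            StringBuilder key = new StringBuilder();
            readUntil(":", key);
            builder.key(unquote(key.toString()));
            return true;
        }
        public MyParser next() { return new ValueParser(buf, builder); }
    }

    static class ValueParser extends Base {
        ValueParser(CharBuffer buf, MyMapBuilder builder) { super(buf, builder); }
        public boolean parse() throws MyParsingException {
            StringBuilder value = new StringBuilder();
            char stop = readUntil(",}", value);
            builder.value(unquote(value.toString()));
            return stop == ','; // a closing brace ends the object, so the driving loop stops
        }
        public MyParser next() { return new KeyParser(buf, builder); }
    }
}
```

Driving it with the `while (parser.parse())` loop shown above on the question's input leaves `builder.build()` holding the two pairs name/John and age/56.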

Second solution: you want to use an existing parsing tool; in this case, take a look at Parboiled . It is much easier to use than ANTLR, JFlex or others, since you write your grammars in pure Java.

Finally, if you decide that enough is enough and choose to use a JSON library (you really should), go with Jackson , which can read even such malformed JSON:

    public static void main(final String... args)
        throws IOException
    {
        final ObjectMapper mapper = new ObjectMapper()
            .configure(JsonParser.Feature.ALLOW_UNQUOTED_FIELD_NAMES, true)
            .configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true);

        final JsonNode node = mapper.readTree("{name: 'John'}");
        System.out.println(node); // {"name":"John"}
    }
+7

I wrote one before. Steps:

  1. Take the string representing the JSON text.

  2. Create a JsonToken class. I call mine JToken.

  3. Go through all the text from step #1 and parse out the JToken(s).

  4. Recursively group and nest your JToken(s).

  5. Try to keep it simple and uniform. All JToken nodes have a child array that can hold 0 or more children. If the node is an array, flag it as an array. The child array is used for the node's children if it is an OBJECT or an ARRAY. The only thing that changes is what it is flagged as. Also keep all values as string type. That way you only need a single member on the node, called "value", which can be interpreted as the correct data type after all the hard work is done.

  6. Use defensive coding and unit tests. Write tests for all components of the parser. It is better to spend an extra 3 hours writing the code in a paranoid manner, assuming you are making mistakes every second, than to spend 3 hours hunting down bugs. Code paranoid enough and you will very rarely spend time being frustrated while debugging.
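Steps 2 to 5 could be sketched roughly as follows, written here in Java rather than the JavaScript of the linked code. This is an assumption-laden illustration, not the author's implementation: the names `JTokenParser`, `Kind` and the fields of `JToken` are hypothetical, tokenizing and nesting are folded into one recursive pass, and string escapes are left undecoded.

```java
import java.util.ArrayList;
import java.util.List;

public class JTokenParser {
    public enum Kind { OBJECT, ARRAY, STRING, NUMBER, LITERAL }

    /** One node type for everything: containers use children, scalars use value. */
    public static final class JToken {
        public final Kind kind;
        public String key;   // member name if the parent is an OBJECT, otherwise null
        public String value; // raw text of a STRING, NUMBER or LITERAL; null for containers
        public final List<JToken> children = new ArrayList<>();
        JToken(Kind kind) { this.kind = kind; }
    }

    private final String src;
    private int pos;

    private JTokenParser(String src) { this.src = src; }

    public static JToken parse(String src) {
        JTokenParser p = new JTokenParser(src);
        JToken root = p.value();
        p.skipWs();
        if (p.pos != src.length()) throw p.error("Trailing characters");
        return root;
    }

    private JToken value() {
        skipWs();
        char c = peek();
        if (c == '{') return container(Kind.OBJECT, '}');
        if (c == '[') return container(Kind.ARRAY, ']');
        if (c == '"') {
            JToken s = new JToken(Kind.STRING);
            s.value = string();
            return s;
        }
        int start = pos;
        while (pos < src.length() && ",]} \t\r\n".indexOf(src.charAt(pos)) < 0) pos++;
        String raw = src.substring(start, pos);
        JToken t;
        if (raw.equals("true") || raw.equals("false") || raw.equals("null")) {
            t = new JToken(Kind.LITERAL);
        } else if (raw.matches("-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?")) {
            t = new JToken(Kind.NUMBER);
        } else {
            throw error("Expected a value but found '" + raw + "'");
        }
        t.value = raw; // kept as a string; interpret the type later
        return t;
    }

    private JToken container(Kind kind, char close) {
        JToken node = new JToken(kind);
        pos++; // consume the opening brace or bracket
        skipWs();
        if (peek() == close) { pos++; return node; }
        while (true) {
            String key = null;
            if (kind == Kind.OBJECT) {
                skipWs();
                if (peek() != '"') throw error("Expected a quoted member name");
                key = string();
                skipWs();
                expect(':');
            }
            JToken child = value(); // recursion handles arbitrary nesting
            child.key = key;
            node.children.add(child);
            skipWs();
            if (peek() == ',') { pos++; continue; }
            expect(close);
            return node;
        }
    }

    private String string() {
        int start = ++pos; // skip the opening quote
        while (pos < src.length() && src.charAt(pos) != '"') {
            if (src.charAt(pos) == '\\') pos++; // skip the escaped character
            pos++;
        }
        if (pos >= src.length()) throw error("Unterminated string");
        return src.substring(start, pos++); // escapes are left undecoded
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }
    private void skipWs() { while (pos < src.length() && " \t\r\n".indexOf(src.charAt(pos)) >= 0) pos++; }
    private void expect(char c) { if (peek() != c) throw error("Expected '" + c + "'"); pos++; }
    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at index " + pos);
    }
}
```

For a valid object such as `{"name": "John", "age": 56}`, iterating over `root.children` and reading each child's `key` and `value` gives the name/value table the question asks for.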

Code example: I did an easy (ironically) challenge on code-eval.com. It was a JSON menu parsing challenge. I thought it would be cheating to use any built-in functions, because to me the whole point of code challenges is to test your ability to solve algorithmic problems. The challenge is here: https://www.codeeval.com/open_challenges/102/

My code, which passes this challenge using a parser built from scratch in JavaScript:

 CODE: https://pastebin.com/BReK9iij
 I was not able to post it on Stack Overflow because it is too much code, so I put it in a non-expiring Pastebin post.

Note: this code could use some improvement. Parts of it are very inefficient, and it will not work with Unicode.

I would not recommend writing your own JSON parser unless you are interpreting the JSON in some non-standard way.

For example: I am currently using JSONedit to organize branches for a text adventure. I use the JSON file format only because it is compact and the viewer lets me expand and collapse items. The standard parser that ships with Go does not interpret the information the way I want it interpreted, so I am writing my own parser.

+5
    public abstract class AbstractMessageObject {
        public String toString() {
            Gson gson = new Gson();
            // The mapping object's class name is prefixed to the JSON message
            // so that the other side knows which object to map it to.
            return "^" + this.getClass().getName() + "^" + gson.toJson(this);
        }
    }

You can create any bean by extending this AbstractMessageObject . Whenever you want to convert the object to JSON, you only need to call its toString method.

0

I wrote a simple parser in Kotlin. It is not complete, but it can serve as a starting point for your own implementation.

Thanks to @peter.murray.rust for the ideas provided; it was also inspired by the h2database parser .

It basically reads the characters of the JSON string (using StringReader) and tries to parse them based on the expected JSON tokens. It implements functions for reading tokens, unreading characters and, finally, converting the result into an AST.

You can find the code and test cases on github.

0