Common method for parsing a socket-based application protocol

What is the usual way to parse an application protocol?

Given the stream from a socket carrying an already established protocol (for example, SMTP), what is the usual way to handle the protocol? Is it a yacc-based parser, a regex-based approach, or something else?

1 answer

There are many application-level protocols, but I think the main distinction is whether a protocol is binary or text based. Both kinds are widely used.

For a text protocol, tokenizing the input and then parsing it with something like yacc is pretty common. Some text protocols are even easier to parse than that, so you can just split the input and check whether it makes sense. Character encoding has to be considered, but your language should already provide routines for it, built in or through a library, for example for UTF-8. HTTP, for example, is a text protocol and is pretty easy to parse (example from here):

Request:

    GET /path/file.html HTTP/1.0
    From: someuser@jmarshall.com
    User-Agent: HTTPTool/1.0
    [blank line here]

Response:

    HTTP/1.0 200 OK
    Date: Fri, 31 Dec 1999 23:59:59 GMT
    Content-Type: text/html
    Content-Length: 1354

    <html>
    <body>
    <h1>Happy New Millennium!</h1>
    (more file contents)
      .
      .
      .
    </body>
    </html>

Most programmers can write a parser for this, although you are better off relying on a well-tested and complete library implementation.
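To make the "just split the input" idea concrete, here is a minimal sketch in Python that parses the request shown above. The function name `parse_http_request` is made up for illustration, and the sketch assumes the whole request head has already been read from the socket and that lines end in CRLF. It ignores many details a real HTTP library handles, such as folded headers, repeated header names, and size limits.

```python
def parse_http_request(raw: bytes):
    """Split a raw HTTP/1.0-style request into its parts (illustrative only)."""
    # The head (request line + headers) ends at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: METHOD SP TARGET SP VERSION
    # Raises ValueError if the line does not have exactly three parts.
    method, target, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()

    return method, target, version, headers, body


request = (
    b"GET /path/file.html HTTP/1.0\r\n"
    b"From: someuser@jmarshall.com\r\n"
    b"User-Agent: HTTPTool/1.0\r\n"
    b"\r\n"
)
method, target, version, headers, body = parse_http_request(request)
print(method, target, version, headers)
```

On a real socket you would also need to buffer incoming data until the blank line arrives, since a single read may return only part of the request.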

Binary protocols are a bit different. First of all, you need to encode and decode messages using, for example, ASN.1 (used very often in telecommunications), protocol buffers, or something similar. If possible, do not invent your own binary format; rely on tried and tested libraries. Getting this right is hard, so it is no surprise that most ASN.1 tools, for example, are expensive.

With ASN.1 UPER, for example, you define a simple value (example from here):

    myQuestion FooQuestion ::= {
        trackingNumber 5,
        question "Anybody there?"
    }

and it is encoded like this:

    01 05 0e 83 bb ce 2d f9 3c a0 e9 a3 2f 2c af c0

Implementing all the bit shifting and masking is not easy, which is why open source ASN.1 libraries with PER support are so rare.

Both approaches have advantages and disadvantages. Text protocols are somewhat easier to implement, debug, and understand. However, they are usually quite verbose, and in some circumstances that matters a great deal. That is when people choose something like ASN.1 PER, which is very difficult to implement or debug, but very compact.
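To give a feel for the shifting and masking involved, here is a hand-rolled Python sketch that decodes only the sixteen bytes shown above. It is not an ASN.1 implementation. It assumes a layout the answer does not spell out: one length byte for the integer, the integer itself, one length byte for the string, and then the characters packed at 7 bits each with zero padding at the end. The names `unpack_7bit` and `decode_foo_question` are invented for this example.

```python
def unpack_7bit(data: bytes, count: int) -> str:
    """Read `count` characters packed back to back at 7 bits each."""
    total_bits = len(data) * 8
    if count * 7 > total_bits:
        raise ValueError("not enough data for the requested characters")
    bits = int.from_bytes(data, "big")
    chars = []
    for i in range(count):
        # Shift the i-th 7-bit group down to the low bits, then mask it off.
        shift = total_bits - 7 * (i + 1)
        chars.append(chr((bits >> shift) & 0x7F))
    return "".join(chars)


def decode_foo_question(encoded: bytes):
    """Decode the example message under the assumed layout (illustrative only)."""
    int_len = encoded[0]                       # assumed: length of the integer in bytes
    tracking = int.from_bytes(encoded[1:1 + int_len], "big", signed=True)
    pos = 1 + int_len
    str_len = encoded[pos]                     # assumed: number of characters
    question = unpack_7bit(encoded[pos + 1:], str_len)
    return tracking, question


message = bytes.fromhex("01050e83bbce2df93ca0e9a32f2cafc0")
print(decode_foo_question(message))
```

Even this tiny case shows why the format is awkward to debug by eye: after the first three bytes, no character sits on a byte boundary.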
