Parsing each part of an HTTP header field

I process HTTP data directly from packets (the TCP stream may or may not be reassembled; you can assume that it is).

I am looking for a better way to parse HTTP.

The main issue here is the HTTP header.

Looking at the basic HTTP/1.1 RFC, it looks like parsing HTTP headers will be tricky. The RFC describes very complex regular expressions for different parts of the header.

Should I write these regular expressions to parse different parts of the HTTP header?

The basic parsing I have written so far handles the generic HTTP header form:

message-header = field-name ":" [ field-value ] 

I have also implemented replacing internal LWS with SP, and combining repeated headers that share the same field-name into a single comma-separated value, as described in section 4.2.
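
For illustration, the basic step described above might look roughly like this in Python. This is only a sketch under stated assumptions: the function name `parse_headers` is hypothetical, the input is assumed to be the header block alone (no request or status line), and lines are assumed to end in CRLF.

```python
import re
from typing import Dict

# Line folding: CRLF followed by at least one SP or HT continues the
# previous header line (the LWS case from section 4.2).
_FOLD = re.compile(r"\r\n[ \t]+")


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse a raw header block into {lowercased field-name: field-value}.

    Hypothetical helper for illustration; it does not interpret the
    field-value in any way.
    """
    # Replace every fold with a single SP so each header is on one line.
    unfolded = _FOLD.sub(" ", raw)
    headers: Dict[str, str] = {}
    for line in unfolded.split("\r\n"):
        if not line:
            break  # the blank line ends the header block
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        name = name.strip().lower()  # field names are case-insensitive
        value = value.strip()
        if name in headers:
            # Repeated field-name: combine into one comma-separated value.
            headers[name] = headers[name] + ", " + value
        else:
            headers[name] = value
    return headers
```

With this sketch, two `Accept` lines carrying `text/html` and `text/plain` end up as the single value `text/html, text/plain`, and a folded `Cache-Control` line is joined back into one value.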

However, section 14.9, for example, shows that parsing the different parts of the field-value requires a much more complex parsing scheme.
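
To show the kind of extra work involved, here is a rough Python sketch for the Cache-Control value from section 14.9, where each directive is a token optionally followed by `=` and either a token or a quoted string. The name `parse_cache_control` is hypothetical and the grammar is simplified: empty list elements, for instance, are rejected here. Splitting on commas is not enough, because a quoted string may itself contain commas.

```python
import re
from typing import Dict, Optional

# One directive: token [ "=" ( token | quoted-string ) ], followed by a
# comma or the end of the value. Simplified from the RFC grammar.
_DIRECTIVE = re.compile(
    r"\s*([!#$%&'*+\-.^_`|~0-9A-Za-z]+)"                 # directive name
    r'(?:\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s,"]+)))?'    # optional value
    r"\s*(?:,|$)"                                        # separator or end
)


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control field-value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    pos = 0
    while pos < len(value):
        match = _DIRECTIVE.match(value, pos)
        if match is None:
            raise ValueError(f"cannot parse directive at offset {pos}")
        name, quoted, bare = match.group(1), match.group(2), match.group(3)
        if quoted is not None:
            # Undo backslash escapes inside the quoted string.
            argument: Optional[str] = re.sub(r"\\(.)", r"\1", quoted)
        else:
            argument = bare  # None when the directive has no value
        directives[name.lower()] = argument
        pos = match.end()
    return directives
```

For the value `no-cache="Set-Cookie, X-Foo", max-age=3600, public` this keeps `Set-Cookie, X-Foo` together as the argument of `no-cache`, which a plain comma split would break apart.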

How would you suggest I handle the complex parts of HTTP parsing (in particular, the field-value), assuming I want to give users of the parser full HTTP capabilities and parse every part of HTTP?

Design suggestions for this will also be appreciated.

Thanks.

+4
2 answers

I would go with the single responsibility principle. Instead of trying to build one monolithic parser that knows every detail of every HTTP header known to man, keep it simple: write a small, extensible parser that is itself responsible only for parsing the field name and associating that name with a raw value. Then use plug-in extensions that are each responsible for parsing just one header. When you instantiate your parser, pass in a collection of extensions and map each extension to the set of field names it knows how to parse.

You kill two birds with one stone. Your core parser stays simple and focused, and you can extend it without having to fiddle with its internals, which leads to more reliable code.
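
As a rough sketch of that design in Python: all names here (`HeaderParser`, `parse_content_length`, `parse_token_list`) are hypothetical, and the two extensions are deliberately simplistic stand-ins for real per-header parsers.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# An extension is any callable that turns a raw field-value into
# something structured.
FieldParser = Callable[[str], Any]


class HeaderParser:
    """Core parser: knows only 'field-name: raw value' plus a registry."""

    def __init__(
        self, extensions: Iterable[Tuple[Iterable[str], FieldParser]] = ()
    ) -> None:
        # Map each lowercased field name to the extension that handles it.
        self._by_name: Dict[str, FieldParser] = {}
        for names, extension in extensions:
            for name in names:
                self._by_name[name.lower()] = extension

    def parse_line(self, line: str) -> Tuple[str, str, Any]:
        """Return (field-name, raw value, parsed value) for one header line."""
        name, sep, raw = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        name = name.strip().lower()
        raw = raw.strip()
        extension = self._by_name.get(name)
        # Headers without a registered extension keep their raw value.
        parsed = extension(raw) if extension is not None else raw
        return name, raw, parsed


def parse_content_length(raw: str) -> int:
    """Extension for a header whose value is a plain decimal number."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"invalid Content-Length: {raw!r}")
    return int(raw)


def parse_token_list(raw: str) -> List[str]:
    """Extension for simple comma-separated token lists (no quoting)."""
    return [item.strip() for item in raw.split(",") if item.strip()]


parser = HeaderParser([
    (["content-length"], parse_content_length),
    (["allow", "connection"], parse_token_list),
])
```

Here `parser.parse_line("Allow: GET, HEAD")` hands the raw value to `parse_token_list`, while an unregistered header such as `X-Custom` comes back with its raw string unchanged. Supporting a new header means writing one more extension and registering it; the core class is not touched.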

+7

The System.Net.Http.Headers namespace in .NET contains many header parsers. It is worth a look.

+1

Source: https://habr.com/ru/post/1312673/