I am processing HTTP data directly from captured packets (the TCP stream may or may not be reassembled; you can assume that it is).
I am looking for a better way to parse HTTP.
The main problem is the HTTP headers.
Looking at the base HTTP/1.1 RFC (RFC 2616), parsing HTTP headers looks tricky. The RFC specifies very complex grammars (essentially regular expressions) for the different parts of a header.
Should I write a regular expression for each of these to parse the different parts of the HTTP header?
So far I have only written basic parsing for the generic HTTP header form:
message-header = field-name ":" [ field-value ]
I also replace internal LWS with a single SP and combine repeated headers that share a field-name into one comma-separated value, as described in section 4.2.
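
To make that generic step concrete, here is a minimal Python sketch of it. It is illustrative only and not my actual code: the function name `parse_generic_headers` is made up, it assumes the header block has already been extracted from the reassembled stream as a string, and it collapses whitespace even inside quoted strings, which a stricter parser may not want.

```python
import re


def parse_generic_headers(block: str) -> dict:
    """Parse a raw header block into {lowercased field-name: field-value}.

    Hypothetical helper: handles only the generic
    message-header = field-name ":" [ field-value ] form.
    """
    # Unfold continuation lines: a line break followed by SP/HT becomes one SP.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", block)
    headers = {}
    for line in re.split(r"\r?\n", unfolded):
        if not line:
            break  # an empty line ends the header block
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line; a real parser should report this
        name = name.strip().lower()  # field names are case-insensitive
        # Collapse runs of SP/HT to a single SP (assumption: also inside quotes).
        value = re.sub(r"[ \t]+", " ", value).strip()
        if name in headers:
            # Repeated field-name: append to the earlier value with a comma.
            headers[name] += ", " + value
        else:
            headers[name] = value
    return headers


raw = (
    "Host: example.com\r\n"
    "Accept: text/html\r\n"
    "Accept: application/json\r\n"
    "X-Long: first\r\n"
    "\tsecond\r\n"
    "\r\n"
)
parsed = parse_generic_headers(raw)
```

With this input, `parsed["accept"]` holds both `Accept` values joined by a comma, and the folded `X-Long` value becomes a single line.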
However, section 14.9 (Cache-Control), for example, shows that parsing the different parts of a field-value requires a much more complex parsing scheme.
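
As an illustration of the extra work involved, here is a rough Python sketch of what a dedicated parser for just the Cache-Control field-value might look like. It assumes each directive has the shape `token [ "=" ( token | quoted-string ) ]`, and the name `parse_cache_control` is hypothetical. It does not validate individual directives (for example, that `max-age` is numeric), and it rejects empty list elements.

```python
import re

# One directive: name, optional "=" followed by a quoted-string or a bare
# token, then either a comma or the end of the value.
_DIRECTIVE = re.compile(
    r'\s*([^\s,="]+)\s*(?:=\s*(?:"((?:[^"\\]|\\.)*)"|([^\s,"]+)))?\s*(?:,|$)'
)


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control field-value into {directive: argument or None}."""
    directives = {}
    pos = 0
    while pos < len(value):
        m = _DIRECTIVE.match(value, pos)
        if m is None:
            raise ValueError(f"malformed Cache-Control near offset {pos}")
        name, quoted, bare = m.group(1), m.group(2), m.group(3)
        if quoted is not None:
            # Undo backslash escapes (quoted-pair) inside the quoted-string.
            arg = re.sub(r"\\(.)", r"\1", quoted)
        else:
            arg = bare  # None when the directive has no argument
        directives[name.lower()] = arg
        pos = m.end()
    return directives


cc = parse_cache_control('no-cache="Set-Cookie, X-Foo", max-age=3600, private')
```

Note that a naive split on commas would break the quoted `no-cache` argument in two, which is exactly the kind of detail that makes me doubt a simple approach scales to every header.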
How would you suggest I handle the complex parts of HTTP parsing (in particular, the field-value), given that I want to give users of the parser full HTTP capabilities and parse every part of the message?
Design suggestions would also be appreciated.
Thanks.