You can do this with Perl:
$ perl -ne 'print if /HEADER TEXT/ .. /FOOTER TEXT/' file.html
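The same range logic can be written as a small script, which may be easier to adapt than the one-liner. This is only a sketch: the sub name lines_between and its argument order are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every line from one matching $start through
# the next one matching $end, inclusive, for each such block in @lines.
sub lines_between {
    my ($start, $end, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        # In scalar context ".." is the flip-flop operator: it is false until
        # the left test succeeds, then stays true up to and including the
        # line where the right test succeeds.
        push @kept, $line if $line =~ $start .. $line =~ $end;
    }
    return @kept;
}

# Usage: filter the files named on the command line (or standard input).
print lines_between(qr/HEADER TEXT/, qr/FOOTER TEXT/, <>) if @ARGV;
```

Note that each ".." keeps its own state even across calls to the sub that contains it, so an input with a header but no closing footer would leave the range "open" for the next call.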
To print only the text between the separators, use
$ perl -000 -lne 'print $1 while /HEADER TEXT(.+?)FOOTER TEXT/sg' file.html
The /s switch makes the regular expression treat the string as a single line, which means that the period also matches newline characters, and /g means to match as many times as possible.
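To see /s and /g at work outside a one-liner, here is a minimal sketch that applies the same pattern to a string already held in memory. The sub name text_between is hypothetical, and the \Q...\E quoting is an addition that is not in the one-liner above; it makes the separators match literally even if they contain regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: return every stretch of text found between
# $header and $footer in $text.
sub text_between {
    my ($header, $footer, $text) = @_;
    # /s lets "." match newlines, so a block may span several lines.
    # /g in list context returns the capture from every match.
    return $text =~ /\Q$header\E(.+?)\Q$footer\E/sg;
}

# Usage: pass a file name on the command line to print each block.
if (@ARGV) {
    local $/;              # slurp mode: read the whole input at once
    my $html = <>;
    print "$_\n" for text_between('HEADER TEXT', 'FOOTER TEXT', $html);
}
```

Because the script slurps the whole input, a header and footer separated by a blank line are still paired. The one-liner reads its input a paragraph at a time (-000), so there both separators have to fall inside the same paragraph.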
The examples above assume that you are processing HTML files on a local disk. If you need to fetch them first, use get from LWP::Simple:
$ perl -MLWP::Simple -le '$_ = get "http://stackoverflow.com"; print $1 while m!<head>(.+?)</head>!sg'
Please note that parsing HTML with regular expressions as shown above does not work in the general case! If you are writing a quick-and-dirty scanner, great, but for an application that needs to be more robust, use a real parser.
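As one possible illustration of the parser route, the sketch below uses HTML::TreeBuilder from CPAN. That choice is an assumption (the module is not part of core Perl and must be installed, and other parsers would do equally well), and the sub name page_title is made up for the example.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module, assumed to be installed

# Hypothetical helper: return the text of the <title> element,
# or undef if the document has none.
sub page_title {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $title = $tree->look_down(_tag => 'title');   # first match or undef
    my $text  = $title ? $title->as_text : undef;
    $tree->delete;        # release the tree explicitly
    return $text;
}
```

Unlike the regular expression, this copes with attributes on the tags, differences in case, and similar variations in the markup.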
Greg Bacon