So the problem is that you have Java code snippets interspersed with HTML, so no standard metrics tool will work.
Not quite off the shelf, but our Source Code Search Engine may come close. It is a tool for searching large code bases by indexing the source code using precise lexical extraction. What is relevant here is that it computes SLOC, comment counts, and Halstead and cyclomatic measures for the files it indexes, so you get metrics if you simply ignore the search feature. The metrics are generated in an XML file (with one "record" per source file), so you can do whatever further processing you want on them. See the discussion of metrics on the linked web page.
While we have a JSP lexer, it has not yet been tested with the Search Engine. We have built dozens of lexers, so this should be pretty easy for us (and we would be happy to do it). That would give you a direct answer.
If you don't want to go this route, you can implement your simple idea of extracting the code between <% and %>, dump it into files parallel to the original JSP files, and hand that code to the Search Engine through its (production) Java lexeme extractor to get your metrics that way. The lexers are very robust in the face of malformed files, so the fact that the extracted Java fragments may not be entirely legal will not bother it a bit.
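The extraction step might look something like the sketch below. It is only an illustration, not part of the Search Engine: the class name `JspScriptletExtractor`, its methods, and the `.java` suffix for the parallel file are all hypothetical choices. It assumes scriptlets, expressions and declarations (`<% %>`, `<%= %>`, `<%! %>`) are the only parts worth measuring, and it skips directives and JSP comments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the Java fragments out of a JSP page.
public class JspScriptletExtractor {

    // Matches <% ... %>, <%= ... %> and <%! ... %>.
    // The lookahead skips directives (<%@ ... %>) and JSP comments (<%-- ... --%>).
    private static final Pattern SCRIPTLET =
            Pattern.compile("<%(?!@|--)([=!]?)(.*?)%>", Pattern.DOTALL);

    // Returns the Java fragments found in the JSP text, one fragment per line group.
    public static String extractJava(String jsp) {
        StringBuilder out = new StringBuilder();
        Matcher m = SCRIPTLET.matcher(jsp);
        while (m.find()) {
            out.append(m.group(2).trim()).append("\n");
        }
        return out.toString();
    }

    // Writes the fragments next to the JSP file, e.g. page.jsp -> page.jsp.java
    // (the naming scheme is an assumption; use whatever your tooling expects).
    public static Path writeParallelFile(Path jspFile) throws IOException {
        String jsp = new String(Files.readAllBytes(jspFile), StandardCharsets.UTF_8);
        Path javaFile = jspFile.resolveSibling(jspFile.getFileName() + ".java");
        Files.write(javaFile, extractJava(jsp).getBytes(StandardCharsets.UTF_8));
        return javaFile;
    }
}
```

A regex like this will be fooled by a `%>` that appears inside a Java string literal, and the output is a bag of fragments rather than a compilable class. For lexer-based metrics that should be acceptable, for the reason given above.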
Ira Baxter