The key difference between a "compiler parser" and a "reengineering parser" lies in what information is captured about layout, comments, and literal formats. As others have observed, most compilers discard all of this information, since it is irrelevant to compiling to low-level code.
Similarly, classic parser generators (such as JavaCC, ANTLR, etc.) offer very little support for capturing or regenerating this information.
Reengineering parsers, by contrast, are used to analyze code and comments, and sometimes even to regenerate the code without loss (or to regenerate the comments properly). To analyze code with comments, you cannot throw the comments away :-} To modify code, if you regenerate the modified code from the original, it is nice if the modified code retains the original layout, comments, and literal formats (for example, regenerating a hexadecimal literal as a decimal value is legal and equivalent, but makes the original authors rather unhappy). To do this, reengineering parsers need special lexers that capture all this data, and parsing machinery that doesn't throw it away.
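To make the idea concrete, here is a minimal sketch of a lossless lexer in Python. It is not how DMS or any particular tool does it; the token pattern and the names `TOKEN_RE`, `Token`, `lex`, `unparse`, and `scale_literals` are all made up for illustration, and it only covers a tiny Java-like fragment. The point is that whitespace and comments are kept as tokens, and each literal keeps its original spelling, so a modified literal can be re-emitted in the radix the author wrote.

```python
import re
from dataclasses import dataclass

# Hypothetical token pattern for a tiny Java-like fragment (illustration only).
# Comments and whitespace are ordinary tokens here instead of being skipped.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<space>\s+)
  | (?P<hex>0[xX][0-9a-fA-F]+)
  | (?P<dec>\d+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


@dataclass
class Token:
    kind: str
    text: str  # exact source spelling, kept so it can be re-emitted verbatim


def lex(source):
    """Split source into tokens without dropping any character."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(source)]


def unparse(tokens):
    """Regenerate source text; for unmodified tokens this is the original."""
    return "".join(t.text for t in tokens)


def scale_literals(tokens, factor):
    """Multiply every integer literal, keeping each one in its original radix."""
    out = []
    for t in tokens:
        if t.kind == "hex":
            value = int(t.text, 16) * factor
            # Preserve the author's prefix and digit case.
            upper = any(c in "ABCDEF" for c in t.text[2:])
            out.append(Token("hex", t.text[:2] + format(value, "X" if upper else "x")))
        elif t.kind == "dec":
            out.append(Token("dec", str(int(t.text) * factor)))
        else:
            out.append(t)  # comments, layout, and everything else pass through
    return out


if __name__ == "__main__":
    src = "int mask = 0xFF;  // low byte\nint n   = 10;\n"
    tokens = lex(src)
    assert unparse(tokens) == src           # lossless round trip
    print(unparse(scale_literals(tokens, 2)), end="")
```

A compiler-style lexer would drop the `comment` and `space` tokens and store only the numeric value of `0xFF`, after which the original layout and the hex spelling can no longer be regenerated.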
Our DMS Software Reengineering Toolkit includes, well, reengineering parsers as standard equipment; DMS parsers exist for a wide range of languages (including Java, which the OP is interested in). DMS captures all the comment, layout, and formatting information, and analysis tools have access to all of it.
TXL and Stratego also support this feature.