What you want is a program transformation system.
Good ones have parsers for the language you care about, build ASTs representing the parsed program, give you access to the AST for analysis and modification, and can regenerate source text from the AST. Your remark about "scanning the fields" is just a kind of traversal of the AST representing the program. For each interesting analysis result you produce, you want to make a change to the AST, perhaps somewhere else, but nonetheless in the AST. And after you have made all the changes, you want to regenerate the text with comments (as originally entered, or as you constructed them in your new code).
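To make the traverse / modify / regenerate cycle concrete, here is a minimal sketch in plain Java. Everything in it (AstSketch, ClassNode, FieldNode, MethodNode, addGetters, unparse) is a hypothetical name invented for illustration, not the API of any tool mentioned here. It assumes the tree is built by hand where a real system would use a parser, and it keeps method bodies as plain strings where a real AST would hold statement nodes.

```java
import java.util.ArrayList;
import java.util.List;

public class AstSketch {
    // Hypothetical, drastically simplified AST: a class holds fields and methods.
    static final class FieldNode {
        final String type, name;
        FieldNode(String type, String name) { this.type = type; this.name = name; }
    }

    static final class MethodNode {
        final String returnType, name, body; // body kept as text for brevity
        MethodNode(String returnType, String name, String body) {
            this.returnType = returnType; this.name = name; this.body = body;
        }
    }

    static final class ClassNode {
        final String name;
        final List<FieldNode> fields = new ArrayList<>();
        final List<MethodNode> methods = new ArrayList<>();
        ClassNode(String name) { this.name = name; }
    }

    // Analysis + modification: traverse the fields and add one getter per field.
    static void addGetters(ClassNode c) {
        for (FieldNode f : c.fields) {
            String getter = "get" + Character.toUpperCase(f.name.charAt(0)) + f.name.substring(1);
            c.methods.add(new MethodNode(f.type, getter, "return " + f.name + ";"));
        }
    }

    // Regeneration: produce source text from the (modified) tree.
    static String unparse(ClassNode c) {
        StringBuilder sb = new StringBuilder("class " + c.name + " {\n");
        for (FieldNode f : c.fields) {
            sb.append("    ").append(f.type).append(' ').append(f.name).append(";\n");
        }
        for (MethodNode m : c.methods) {
            sb.append("    ").append(m.returnType).append(' ').append(m.name)
              .append("() { ").append(m.body).append(" }\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        ClassNode point = new ClassNode("Point"); // stands in for a parser's output
        point.fields.add(new FieldNode("int", "x"));
        point.fields.add(new FieldNode("int", "y"));
        addGetters(point);
        System.out.print(unparse(point));
    }
}
```

Running main prints a Point class with its two fields followed by generated getX and getY methods. Note that the change lands in a different part of the tree (the method list) from the part that was analyzed (the field list), which is the situation described above. Comment preservation, which real tools handle, is left out of this sketch.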
There are several tools that can do this for Java.
Jackpot provides a parser, builds ASTs, and lets you code Java procedures to do what you want with the trees. Upside: conceptually simple. Downside: you write a lot more Java code to climb around the trees than you would expect. Jackpot only works with Java.
Stratego and TXL parse your code, build ASTs, and let you write source-to-source transformations (using the syntax of the target language, i.e., Java in this case) to express patterns and fixes. Additional good news: you can define any programming language you like as the target language to be processed, and both of them have Java definitions. But they are weak on analysis: often you need symbol tables and data flow analysis to really analyze the code and make the changes you need. And they insist that everything is a rewrite rule, whether that helps you or not; this is a bit like insisting you only need a hammer in your toolbox; after all, everything can be treated as a nail, right?
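As a rough illustration of what a single rewrite rule does, here is the rule "e + 0 -> e" (and its mirror image) coded procedurally in plain Java over a toy expression tree. This is only a sketch: RewriteSketch and its Expr, Num, Var and Add classes are hypothetical names made up for this example, and it is not Stratego or TXL syntax. Those tools let you state the pattern and its replacement declaratively in the target language's own syntax, rather than as tree-walking code like this.

```java
public class RewriteSketch {
    // Hypothetical toy expression AST.
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Add implements Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static boolean isZero(Expr e) {
        return e instanceof Num && ((Num) e).value == 0;
    }

    // Rules "e + 0 -> e" and "0 + e -> e", applied bottom-up over the tree.
    static Expr simplify(Expr e) {
        if (!(e instanceof Add)) return e;   // leaves are left unchanged
        Add a = (Add) e;
        Expr l = simplify(a.left);
        Expr r = simplify(a.right);
        if (isZero(r)) return l;             // pattern matched: e + 0
        if (isZero(l)) return r;             // pattern matched: 0 + e
        return new Add(l, r);                // no match: rebuild the node
    }

    // Regenerate text from the tree, fully parenthesized.
    static String unparse(Expr e) {
        if (e instanceof Num) return Integer.toString(((Num) e).value);
        if (e instanceof Var) return ((Var) e).name;
        Add a = (Add) e;
        return "(" + unparse(a.left) + " + " + unparse(a.right) + ")";
    }

    public static void main(String[] args) {
        Expr e = new Add(new Add(new Var("x"), new Num(0)),
                         new Add(new Num(0), new Var("y")));
        System.out.println(unparse(e) + " => " + unparse(simplify(e)));
    }
}
```

Running main prints `((x + 0) + (0 + y)) => (x + y)`. Even for one trivial rule, the procedural version needs type tests, casts and explicit recursion, which is the tree-climbing overhead that the pattern-based tools are designed to remove.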
Our DMS Software Reengineering Toolkit allows you to define arbitrary target languages (and has many predefined languages, including Java), includes all the source-to-source transformation capabilities of Stratego and TXL and the procedural capability of Jackpot, and additionally provides symbol tables and control and data flow analysis. The compiler guys taught us that these things are necessary to build strong compilers (= "analysis + optimization + refinement"), and this is true of code generation systems for exactly the same reasons. Using this approach, you can generate code and optimize it to the extent that you have the knowledge to do so. One example, similar to your serialization ideas, is the generation of fast XML readers and writers for specific XML DTDs; we have done this with DMS for Java and COBOL.
DMS has been used to read / modify / write many kinds of source files. A good example that makes the ideas clear can be found in this technical paper, which shows how to modify code to insert instrumentation probes: Branch Coverage Made Easy. A simpler but more complete example of defining an arbitrary language and the transformations to apply to it can be found in How to Transform Algebra, which uses the same ideas.