Java - How to overcome the maximum method size in automatically generated code

I have an unusual requirement: my application automatically generates Java code from a very long script (written in a dynamically typed language). The script is so long that I hit the JVM's 64 KB maximum method size.

The script consists only of simple instructions on primitive types (without calling functions other than mathematical ones). It might look like this:

    ...
    a = b * c + sin(d)
    ...
    if a>10 then
        e = a * 2
    else
        e = a * abs(b)
    end
    ...

... which translates to:

    ...
    double a = b * c + Math.sin(d);
    ...
    double e;
    if(a>10){
        e = a * 2;
    }else{
        e = a * Math.abs(b);
    }
    ...


My first idea to overcome the method size limit was as follows:
  • Turn all local variables into fields
  • Split the code every 100 lines (or later, where needed to keep an if/else block intact) into separate methods.

Sort of:

    class AutoGenerated {

        double a,b,c,d,e,....;

        void run1(){
            ...
            a = b * c + Math.sin(d);
            ...
            run2();
        }

        void run2(){
            ...
            if(a>10){
                e = a * 2;
            }else{
                e = a * Math.abs(b);
            }
            ...
            run3();
        }

        ...
    }

Do you know of any other approach that would be more efficient? Please note that I need the code to run as fast as possible, since it will be executed in long loops. I cannot resort to compiling to C, because interoperability is also an issue.

I would also appreciate pointers to libraries that might help me.

+5
4 answers

We use a similar approach in one of our projects, despite the drawbacks other people have mentioned. We call the multiple generated methods from a single launcher method, as @Marco13 suggests. We actually calculate (quite precisely) the size of the generated bytecode and start a new method only when the limit is reached. The mathematical formulas that we translate into Java code are available as an AST, and we have a special visitor that counts the bytecode length of each expression. For such simple programs the count is quite stable across Java versions and different compilers, so we do not create more methods than necessary. In our case we do not emit bytecode directly, but you could try doing so for your language using ASM or a similar library (in which case, of course, ASM will calculate the bytecode length for you).
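The size-budgeted splitting described above could be sketched roughly as follows. This is not the answerer's actual implementation: `MethodSplitter`, `Stmt`, `estimatedBytes`, and the budget value are all hypothetical names, and the per-statement size estimate is assumed to come from something like the AST visitor mentioned above. An if/else block would have to be passed in as a single `Stmt` so that it is never cut in half.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: packs generated statements into methods by estimated bytecode size. */
final class MethodSplitter {

    /** One generated statement (or whole if/else block) with its estimated bytecode size. */
    static final class Stmt {
        final String source;
        final int estimatedBytes;

        Stmt(String source, int estimatedBytes) {
            this.source = source;
            this.estimatedBytes = estimatedBytes;
        }
    }

    /** Greedily fills one method after another, starting a new one when the budget would be exceeded. */
    static List<List<Stmt>> split(List<Stmt> stmts, int budget) {
        List<List<Stmt>> methods = new ArrayList<>();
        List<Stmt> current = new ArrayList<>();
        int used = 0;
        for (Stmt s : stmts) {
            if (s.estimatedBytes > budget) {
                throw new IllegalArgumentException("statement does not fit in one method: " + s.source);
            }
            if (!current.isEmpty() && used + s.estimatedBytes > budget) {
                methods.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(s);
            used += s.estimatedBytes;
        }
        if (!current.isEmpty()) {
            methods.add(current);
        }
        return methods;
    }

    /** Emits Java source: one runN() per chunk plus a launcher that calls them in order. */
    static String emit(List<List<Stmt>> methods) {
        StringBuilder body = new StringBuilder();
        StringBuilder launcher = new StringBuilder("    void run() {\n");
        for (int i = 0; i < methods.size(); i++) {
            body.append("    private void run").append(i).append("() {\n");
            for (Stmt s : methods.get(i)) {
                body.append("        ").append(s.source).append('\n');
            }
            body.append("    }\n");
            launcher.append("        run").append(i).append("();\n");
        }
        launcher.append("    }\n");
        return "class AutoGenerated {\n    // field declarations omitted\n" + launcher + body + "}\n";
    }
}
```

In real use the budget would presumably be set somewhat below the JVM's 65535-byte method limit, to leave headroom for errors in the estimate.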

Usually we store the data variables in a single double[] array (we do not need other types) and pass it as a parameter. That way you do not need a huge number of fields (sometimes we have thousands of variables). On the other hand, accessing a local array element can take more bytecode bytes than accessing a field when the index is above 127.
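As an illustration of the array-based layout, here is a minimal sketch of what the generated code for the question's example might look like. The class name, the slot constants `A` to `E`, and the chunk boundaries are assumptions made for this example, not the answerer's actual output. The index constants are compile-time constants, so javac inlines them; small indices are pushed with one- or two-byte instructions (`iconst_N`, `bipush`), while indices above 127 need the three-byte `sipush`, which is the extra cost mentioned above.

```java
/** Sketch of generated code that keeps all script variables in one double[] array. */
final class AutoGeneratedArrayState {

    // Hypothetical slot assignment chosen by the generator.
    static final int A = 0, B = 1, C = 2, D = 3, E = 4;

    /** Launcher: calls the generated chunks one after another, so the stack stays shallow. */
    static void run(double[] v) {
        run0(v);
        run1(v);
    }

    // a = b * c + sin(d)
    private static void run0(double[] v) {
        v[A] = v[B] * v[C] + Math.sin(v[D]);
    }

    // if a > 10 then e = a * 2 else e = a * abs(b) end
    private static void run1(double[] v) {
        if (v[A] > 10) {
            v[E] = v[A] * 2;
        } else {
            v[E] = v[A] * Math.abs(v[B]);
        }
    }
}
```

A caller would allocate the array once, fill in the inputs, and reuse it across loop iterations, for example `double[] v = {0, 3, 4, 0, 0}; AutoGeneratedArrayState.run(v);`.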

Another problem is the constant pool size. We usually have many double constants in the auto-generated code. If you declare many fields and/or methods, their names also take up constant pool entries. Thus you can hit the class's constant pool limit. We sometimes hit it and generate nested classes to work around the problem.

Other people also suggest tweaking JVM options. Use that advice carefully, because the options will affect not only this auto-generated class but every other class as well (I assume that in your case other code also runs in the same JVM).

+2

Converting local variables to fields can actually hurt performance as long as the code has not been optimized by the JIT (see this question and the related ones for more information). But I see that, depending on the number of variables involved, there are hardly any other feasible options.


There may be additional limits concerning compilation and method size. Peter Lawrey mentioned in the comments that "... methods larger than 8 KB are not compiled by default". I did not know about this, but he usually knows what he is talking about, so you should dig a little deeper here. In addition, you may want to look at the HotSpot VM options to find out which other limits and settings might be relevant to you. I was primarily thinking that

-XX:MaxInlineSize=35 : Maximum bytecode size of a method to be inlined.

might be something to keep in mind.

(In fact, calling so many methods of size MaxInlineSize that inlining all of these calls would push the containing method past 65k bytes could be a neat, nasty test case for the robustness and edge-case handling of the inlining process ...)


You have outlined a “telescoping” scheme for the methods:

    void run1(){
        ...
        run2();
    }

    void run2(){
        ...
        run3();
    }

This can also lead to problems: assuming you have 650 of these methods (in the best case), it will at least produce a very deep stack and may actually cause a StackOverflowError, again depending on some memory settings. You may need to increase the stack size by setting the -Xss parameter accordingly.


The actual problem description was a bit vague, and without further information about the code to be generated (for example, how many local variables you need and might have to turn into instance variables), I'd suggest the following:

  • Create lots of small methods if possible (taking MaxInlineSize into account)
  • Try to reuse these small methods (if such reuse can be detected from the input with reasonable effort)
  • Call these methods sequentially, as in

     void run() { run0(); run1(); ... run2000(); } 

    to avoid stack size issues.


However, if you added further examples or details, one could probably give more focused advice. This could even be a “complete” example: not necessarily thousands of lines of code, but one that shows the actual patterns that appear there.

+1

I would be tempted to write an interpreter, or perhaps an embedded compiler. You might even gain some speed, because the resulting, much smaller code base will fit into the cache more easily.
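To make the interpreter idea concrete, here is a minimal sketch of a stack-based evaluator over a double[] of variables. The opcode set, the int[] encoding, and the fixed stack depth are all invented for illustration; conditionals are left out and would need additional jump opcodes. Whether this beats generated Java code in practice would have to be measured.

```java
/** Minimal stack-based interpreter sketch; opcodes and encoding are invented for illustration. */
final class TinyInterpreter {

    // Opcodes. LOAD, CONST and STORE are followed by one operand (an index) in the code array.
    static final int LOAD = 0, CONST = 1, ADD = 2, MUL = 3, SIN = 4, ABS = 5, STORE = 6;

    /** Runs the program once, reading and writing script variables in vars. */
    static void execute(int[] code, double[] consts, double[] vars) {
        double[] stack = new double[64]; // assumed upper bound on expression depth
        int sp = 0;                      // index of the next free stack slot
        for (int pc = 0; pc < code.length; pc++) {
            switch (code[pc]) {
                case LOAD:  stack[sp++] = vars[code[++pc]]; break;
                case CONST: stack[sp++] = consts[code[++pc]]; break;
                case ADD:   sp--; stack[sp - 1] += stack[sp]; break;
                case MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
                case SIN:   stack[sp - 1] = Math.sin(stack[sp - 1]); break;
                case ABS:   stack[sp - 1] = Math.abs(stack[sp - 1]); break;
                case STORE: vars[code[++pc]] = stack[--sp]; break;
                default: throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
    }
}
```

With variables a to d in slots 0 to 3, the statement `a = b * c + sin(d)` would be encoded in postfix order as `{LOAD, 1, LOAD, 2, MUL, LOAD, 3, SIN, ADD, STORE, 0}`.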

0
  • Turn all local variables into fields

This will not have the slightest effect. Method size == code size. It has nothing to do with local variables, which affect only the size of the invocation frame.

  • Split the code every 100 lines (or later, where needed to keep an if/else block intact) into separate methods.

This is your only choice, except for a completely different implementation strategy.

The problem with code generators is that they generate code.

0
