How does Java store primitive types in RAM?


This is NOT about whether primitives go on the stack or on the heap, but about where they are stored in actual physical memory.


Take a simple example:

int a = 5; 

I know that 5 is stored in a memory block.

What I am interested in is: where is the variable 'a' itself stored?

Related questions: What happens when 'a' is associated with a block of memory that contains the primitive value 5? Is another memory block created to hold 'a'? That would make 'a' look like a pointer to an object, even though a primitive type is being used here.

2 answers

Building on the question Do Java primitives go on the stack or the heap?

Suppose you have a function foo():

 void foo() {
     int a = 5;
     System.out.println(a);
 }

Then, when the compiler compiles this function, it will create bytecode instructions that leave 4 bytes of room on the stack whenever this function is called. The name "a" is only useful to you and the compiler; the compiler just creates a slot for it, remembers where that slot is, and wherever it wants to use the value of "a", it instead inserts references to the memory location it reserved for that value.
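To make that concrete, here is roughly what disassembling foo() with javap -c shows (a sketch of typical output; the exact formatting and constant-pool references vary by compiler version). Notice that the name "a" is gone: the instructions only refer to local-variable slot 1.

 void foo();
   Code:
      0: iconst_5          // push the int constant 5 onto the operand stack
      1: istore_1          // store it into local variable slot 1 (this slot is "a")
      2: getstatic  java/lang/System.out : java.io.PrintStream
      5: iload_1           // load slot 1 back onto the operand stack
      6: invokevirtual java/io/PrintStream.println(int) : void
      9: return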

If you do not know how the stack works, it works as follows: each program has at least one thread, and each thread has exactly one stack. The stack is a contiguous block of memory (which can also grow if necessary). The stack is initially empty until the first function in your program is called. Then, when your function is called, it allocates stack space for itself: for all its local variables, for return addresses, and so on.

When your main function calls another function foo, here is one example of what might happen (there are a couple of simplifying white lies here):

  • main wants to pass parameters to foo . It pushes these values to the top of the stack so that foo will know exactly where to find them ( main and foo pass parameters in a consistent way).
  • main pushes the address where program execution should return after foo completes. This increments the stack pointer.
  • main calls foo .
  • When foo starts, it sees that the stack pointer currently points at address X.
  • foo wants to allocate 3 int variables on the stack, so it needs 12 bytes.
  • foo will use X + 0 for the first int, X + 4 for the second int, X + 8 for the third.
    • The compiler can calculate this at compile time, and it can count on the value of the stack pointer register (ESP on x86), so the assembly code it writes out does things like "store 0 at ESP + 0", "store 1 at ESP + 4", etc.
  • The parameters main pushed onto the stack before calling foo can also be accessed by foo by calculating some offset from the stack pointer.
    • foo knows how many parameters it takes (say 3), so it knows that, say, X - 8 is the first, X - 12 is the second, and X - 16 is the third.
  • So now that foo has room on the stack to do its work, it does so and finishes.
  • Before main called foo , it wrote its return address on the stack before incrementing the stack pointer.
  • foo looks up the return address - say it is stored at ESP - 4 - so foo looks at that slot on the stack, finds the return address there, and jumps back to it.
  • Now the rest of the code in main continues to run, and we have come full circle.

Note that every time a function is called, it can do whatever it wants with the memory pointed to by the current stack pointer and everything beyond it. Each time a function makes room on the stack for itself, it increments the stack pointer before calling other functions, so that everyone knows where they can use the stack for themselves.
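Here is a minimal Java sketch of that kind of call chain (the class, method, and variable names are invented for illustration); the comments describe where each value would typically live during the calls:

 public class StackDemo {
     public static void main(String[] args) {
         int x = 1;               // lives in main's stack frame
         int result = foo(2, 3);  // the arguments 2 and 3 are copied into foo's frame,
                                  // along with the return address back into main
         System.out.println(x + result);
     }

     static int foo(int a, int b) {
         int sum = a + b;         // a, b, and sum all live in foo's stack frame
         return sum;              // foo's frame disappears once it returns
     }
 }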

I know this explanation blurs the line between x86 and Java a bit, but I hope it helps illustrate how the hardware works.

Now, that only covers the "stack". Each thread in a program has its own stack, which captures the state of the chain of function calls executing in that thread. A program can have many threads, and each thread has its own independent stack.
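As a small sketch of that point (the names here are made up): two threads can run the very same method, but each thread has its own stack, so each gets its own independent copies of the local variables.

 public class PerThreadStack {
     static void work(String name) {
         int depth = 0;                    // each thread gets its own 'depth' and 'i'
         for (int i = 0; i < 3; i++) {
             depth++;
         }
         System.out.println(name + " finished with depth=" + depth);
     }

     public static void main(String[] args) {
         new Thread(() -> work("thread-1")).start();  // runs on stack #1
         new Thread(() -> work("thread-2")).start();  // runs on stack #2
     }
 }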

What happens when two function calls want to deal with the same piece of memory, regardless of which thread they are on or where they are on the stack?

This is where the heap comes in. Usually (but not always) a program has exactly one heap. It is called a heap because, well, it is just a big heap of memory.

To use memory on the heap, you call allocation routines: routines that find unused space and hand it to you, and routines that let you return space you allocated but no longer need. The memory allocator gets large pages of memory from the operating system and then hands out individual small pieces to whatever needs them. It keeps track of what the OS has given it and, out of that, what it has handed out to the rest of the program. When the program asks for heap memory, the allocator looks for the smallest free chunk that meets the need, marks that chunk as allocated, and hands it back to the program. If it has no free chunks left, it can ask the operating system for more pages of memory and allocate out of those (up to some limit).

In languages like C, these memory allocation routines I mentioned are usually called malloc() to request memory and free() to return it.

Java, on the other hand, does not have explicit memory management like C does; instead it has a garbage collector. You allocate whatever memory you want, and when you are done you simply stop using it. The Java runtime keeps track of what memory you have allocated, periodically scans to see which allocations are no longer in use, and automatically frees those chunks.
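For example (a hedged sketch; the class name and sizes are arbitrary), this loop allocates a million arrays and never frees anything explicitly; the garbage collector reclaims each array once nothing refers to it any more:

 public class GcDemo {
     public static void main(String[] args) {
         for (int i = 0; i < 1_000_000; i++) {
             byte[] chunk = new byte[1024];  // allocated on the heap by 'new'
             // 'chunk' goes out of scope at the end of each iteration; nothing
             // else refers to the array, so the collector may reclaim it later.
         }
         System.out.println("done allocating");  // no free()/delete anywhere
     }
 }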

So now that we know that memory is allocated on the heap or the stack, what happens when I create a private variable in a class?

 public class Test {
     private int balance;
     ...
 }

Where does that memory come from? The answer is the heap. You have code that creates a new Test object - Test myTest = new Test() . Calling Java's new operator causes a new instance of Test to be allocated on the heap. Your variable myTest stores the address of that allocation. balance is just some offset from that address - probably offset 0, actually.
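Put together as a runnable sketch (the main method and the value 42 are added here purely for illustration):

 public class Test {
     private int balance;            // stored at some offset inside each Test object on the heap

     public static void main(String[] args) {
         Test myTest = new Test();   // 'new' allocates the object on the heap;
                                     // 'myTest' holds its address and lives in main's stack frame
         myTest.balance = 42;        // writes into the heap allocation that 'myTest' points to
     }
 }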

The answer underneath it all is just bookkeeping.

...

Those white lies I mentioned? Let me address a few of them.

  • Java is a computer model first - when you compile your program to bytecode, you are compiling to a completely made-up computer architecture that has no registers or assembly instructions like any common CPU does. Java, .Net, and a few others use a stack-based virtual machine rather than a register-based machine (like x86 processors). The reason is that stack-based machines are easier to reason about, so it is easier to build tools that manipulate that code, which is especially important for building tools that compile that code into machine code that will actually run on common processors.

  • The stack pointer for a given thread usually starts at some very high address and then grows down, not up, at least on most x86 computers. That said, since this is a machine detail, it is not actually Java's problem to worry about (Java has its own made-up machine model; it is the Just-In-Time compiler's job to worry about translating that to your actual CPU).

  • I briefly mentioned how parameters are passed between functions, saying things like "parameter A is stored at ESP - 8, parameter B at ESP - 12", etc. This is generally called a "calling convention", and there are more than a few of them. On x86-32, registers are scarce, so many calling conventions pass all parameters on the stack. This has some trade-offs, in particular that accessing those parameters can mean a trip to RAM (although caches can mitigate this). x86-64 has a lot more named registers, which means the most common calling conventions pass the first few parameters in registers, which generally improves speed. Also, since the Java JIT is the only thing generating machine code for the whole process (aside from native calls), it can choose to pass parameters using any convention it wants.

  • I mentioned that when you declare a variable in some function, the memory for that variable comes from the stack - that is not always true, and it is really up to the whims of the language runtime to decide where to get the memory from. In C#/.Net, the memory for such a variable can come from the heap if the variable is used as part of a closure; this is called heap promotion. Most languages deal with closures by creating hidden classes, so what often happens is that the local members of the method involved in the closure are rewritten to be members of some hidden class, and when that method is invoked a new instance of that class is allocated on the heap and its address is stored on the stack; all references to that originally-local variable then go through that heap reference.
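Java itself handles closures a bit differently from the C#-style heap promotion described above: a captured local must be effectively final, and its value is copied into the object generated for the lambda (or anonymous class), which lives on the heap and can outlive the stack frame. A small sketch, with invented names:

 import java.util.function.IntSupplier;

 public class ClosureDemo {
     static IntSupplier capture() {
         int base = 100;         // a local variable, nominally in capture()'s stack frame
         // The lambda captures 'base': its value is copied into the heap object
         // created for the lambda, so it survives after capture() returns.
         return () -> base + 1;
     }

     public static void main(String[] args) {
         IntSupplier s = capture();         // capture()'s stack frame is gone by now...
         System.out.println(s.getAsInt());  // ...but this still prints 101
     }
 }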


I think I understand: you are not asking whether the data is stored on the heap or the stack! We had the same puzzle!

The question you asked is strongly tied to the programming language and to how the operating system deals with processes and variables.

This is very interesting, because when I was at university studying C and C++, I had the same question as you. After reading some ASM code compiled by GCC, I understood it a bit better. Let's discuss it; if there is any problem, comment on this and let me know.

In my opinion, the variable name is not stored; only the value of the variable is stored. ASM code has no real variable names (other than register names); all the so-called variables are simply offsets from the stack or the heap.
This, I think, is a hint: since ASM deals with variable names this way, other languages may use the same strategy.
They just store the offset of the real data storage location.
Let's make an example: say the variable named a is placed at @1000, and the type of a is integer, so at that memory address:

 addr    type   value
 @1000   int    5

where @1000 is the offset at which the real data is stored.

As you can see, the data is put at a real offset.
In my understanding, every variable is replaced with the "address" of that "variable" at the start of the process, which means that at run time the CPU only uses the "address" that has already been allocated in memory.
Consider this procedure again: say you defined int a=5; print(a);
After compilation, the program is turned into another format (this is all my imagination):

 stack:0-4 int 5
 print stack:0-4

while when the process actually executes, I think the memory looks like this:

 @2000 4 5       //allocate 4 bytes starting at @2000, and put 5 into it
 print @2000 4   //read 4 bytes from @2000, then print

Since the process memory is allocated at run time, @2000 is the offset that stands in for the variable name, which means the name is simply replaced with the memory address; the value 5 is then read from that address and the print command is executed.
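Java works the same way at the bytecode level (a rough sketch; the exact javap output and the Start/Length numbers here are only illustrative): the instructions only use local-variable slot numbers, and the name survives solely as optional debug metadata in the LocalVariableTable, emitted when you compile with javac -g.

 // javac -g Example.java && javap -c -l Example   (output abbreviated and approximate)
   Code:
      0: iconst_5       // push 5
      1: istore_1       // store into slot 1; the instruction knows nothing about the name "a"
      ...
   LocalVariableTable:  // debug info only; the JVM does not need it to run the code
     Start  Length  Slot  Name   Signature
         2       5     1     a   I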

Rethinking this

After finishing this write-up, I realize it may be quite hard for other people to follow; we can discuss it if there are any problems or mistakes.

