Why do some languages need boxing and unboxing?

This is not a question about what boxing and unboxing are, but rather why languages like Java and C# need them.

I am very familiar with C++, the STL and Boost.

In C++, I can write something like this very easily:

std::vector<double> dummy; 

I have some experience with Java, but I was very surprised because I had to write something like this,

 ArrayList<Double> dummy = new ArrayList<Double>(); 

My question is: why does it have to be an object? Is it technically so difficult to support primitive types in generics?

+27
java c++ generics boxing
Jun 24 '09 at 19:19
6 answers

Is it so difficult to technically include primitive types when talking about generics?

In the case of Java, it is because of the way generics work. In Java, generics are a compile-time trick that prevents you from putting an Image object into an ArrayList<String>. However, Java's generics are implemented with type erasure: the generic type information is lost at run time. This was done for compatibility reasons, because generics were added fairly late in Java's life. This means that, at run time, an ArrayList<String> is effectively an ArrayList<Object> (or better: just an ArrayList that expects and returns Object in all of its methods) whose elements are automatically cast to String when you retrieve a value.

But since int does not derive from Object, you cannot put it into an ArrayList that expects (at run time) Object, and you cannot cast an Object to int either. This means that the primitive int must be wrapped in a type that does inherit from Object, such as Integer.

C#, for example, works differently. Generics in C# are also enforced at run time, and no boxing is required with a List<int>. Boxing in C# only happens when you try to store a value type, such as int, in a reference-type variable, such as object. Since int in C# inherits from Object, writing object obj = 2 is perfectly valid; however, the int will be boxed, which the compiler does automatically (no Integer reference type is exposed to the user or anything).
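As a rough illustration of the erasure described above, here is a minimal Java sketch. The class and method names (ErasureDemo, sameRuntimeClass, firstElement) are made up for this example. It only shows two observable effects: both parameterizations share one runtime class, and an int stored in the list comes back as an Integer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ErasureDemo {
    // After erasure, both lists are plain ArrayList instances at run time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        return strings.getClass() == numbers.getClass();
    }

    // The list can only hold references, so the int is boxed on the way in.
    static Object firstElement() {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(42);           // the compiler inserts the boxing conversion
        List<?> untyped = numbers; // view the element as a plain Object
        return untyped.get(0);
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());                 // true
        System.out.println(firstElement() instanceof Integer);  // true
    }
}
```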

+46
Jun 24 '09 at 19:47

Boxing and unboxing are a necessity arising from the way languages (such as C# and Java) implement their memory allocation strategies.

Certain types are allocated on the stack and others on the heap. To treat a stack-allocated type as a heap-allocated type, boxing is required to move the stack-allocated value onto the heap. Unboxing is the reverse process.

In C#, stack-allocated types are called value types (for example, System.Int32 and System.DateTime), and heap-allocated types are called reference types (for example, System.Stream and System.String).

In some cases it is beneficial to be able to treat a value type like a reference type (reflection is one example), but in most cases boxing and unboxing are best avoided.

+11
Jun 24 '09 at 19:21

I believe this is also because primitives do not inherit from Object. Suppose you have a method that wants to be able to accept anything at all as a parameter, for example:

 class Printer { public void print(Object o) { ... } } 

You may need to pass a simple primitive value to this method, like this:

 printer.print(5); 

You could not do this without boxing/unboxing, because 5 is a primitive and not an Object. You could overload the print method for each primitive type to enable this functionality, but that is a pain.
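To see what actually reaches such a method, here is a small sketch. It assumes a hypothetical describe method (not part of the original Printer) that returns a string instead of printing, so the boxed type is visible.

```java
// Sketch only: describe() is a hypothetical stand-in for print(Object o).
class Printer {
    // Accepts any reference; a primitive argument is boxed before it arrives.
    public String describe(Object o) {
        return o.getClass().getSimpleName() + ":" + o;
    }
}

class PrinterDemo {
    public static void main(String[] args) {
        Printer printer = new Printer();
        System.out.println(printer.describe(5));    // Integer:5
        System.out.println(printer.describe(2.5));  // Double:2.5
        System.out.println(printer.describe("hi")); // String:hi
    }
}
```

A single Object overload covers every primitive type because the compiler boxes each argument at the call site.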

+2
Jun 24 '09 at 19:29

I can only tell you why Java does not support primitive types in generics.

First, there was the problem that the question of supporting this always brought up the discussion of whether Java should have primitive types at all, which of course hindered discussion of the actual question.

Second, the main reason not to include it was that they wanted binary backward compatibility, so that code would run unmodified on a VM that is not aware of generics. This backward and migration compatibility requirement is also why the Collections API now supports generics yet stayed the same, and there is no complete new set of generics-aware collection APIs (as there is in C#, which added one when it introduced generics).

The compatibility was achieved using erasure (generic type parameter information is removed at compile time), which is also the reason you get so many unchecked cast warnings in Java.

You could still add reified generics, but it is not that easy. Simply keeping the type information at run time instead of removing it will not work, since it breaks source and binary compatibility (you cannot continue to use raw types, and you cannot call existing compiled code because it does not have the corresponding methods).

The other approach is the one C# chose: see above.

And automatic boxing/unboxing was not supported for this use case because autoboxing costs too much.
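A minimal sketch of why those warnings exist follows. The class and method names are hypothetical. Because of erasure, a raw reference can insert the wrong element type without any run-time check, and the failure only appears later at the compiler-inserted cast. Without the @SuppressWarnings annotation, javac reports unchecked and raw-type warnings for this code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class UncheckedDemo {
    // Returns true if reading the polluted list fails with ClassCastException.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean pollutedReadFails() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;          // raw type, still allowed for migration compatibility
        raw.add(Integer.valueOf(1)); // unchecked call: nothing verifies the element type
        try {
            String s = strings.get(0); // the compiler-inserted cast to String fails here
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(pollutedReadFails()); // true
    }
}
```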

Java theory and practice: Generics gotchas

+2
Jun 24 '09 at 20:09

In Java and C# (unlike C++), everything extends Object, so collection classes such as ArrayList can hold Object or any of its descendants (basically anything).

However, for performance reasons, primitives in Java and value types in C# were given a special status. They are not objects. You cannot do something like this (in Java):

  7.toString() 

Even though toString is a method on Object. To bridge this nod to performance, equivalent objects were created. Autoboxing removes the boilerplate code needed to put a primitive into its wrapper class and take it out again, making the code more readable.

The difference between value types and objects in C# is grayer. See here for how they differ.
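The boilerplate that autoboxing removes can be sketched like this. BoxingDemo and its method names are made up for illustration; both sum methods do the same work, once with explicit wrapper calls and once with the compiler inserting them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class BoxingDemo {
    // Explicit boxing and unboxing through the wrapper class.
    static int sumManual(int a, int b) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(Integer.valueOf(a));
        values.add(Integer.valueOf(b));
        return values.get(0).intValue() + values.get(1).intValue();
    }

    // Same logic; the compiler inserts the conversions.
    static int sumAuto(int a, int b) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(a);
        values.add(b);
        return values.get(0) + values.get(1);
    }

    // A primitive literal has no methods, so go through the wrapper object.
    static String sevenAsString() {
        return Integer.valueOf(7).toString();
    }

    public static void main(String[] args) {
        System.out.println(sumManual(3, 4));  // 7
        System.out.println(sumAuto(3, 4));    // 7
        System.out.println(sevenAsString());  // 7
    }
}
```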

+1
Jun 24 '09 at 19:29

Every non-array, non-string object stored on the heap contains an 8- or 16-byte header (sizes for 32/64-bit systems), followed by the contents of that object's public and private fields. Arrays and strings have the same header, plus some more bytes that define the length of the array and the size of each element (and possibly the number of dimensions, the length of each extra dimension, etc.), followed by all the fields of the first element, then all the fields of the second, and so on. Given a reference to an object, the system can easily examine the header and determine what type it is.

A reference type storage location contains a four- or eight-byte value that uniquely identifies an object stored in the heap. In existing implementations, this value is a pointer, but it's easier (and semantically equivalent) to think of it as an "object identifier".

Value-type storage locations hold the contents of the value type's fields, but they have no associated header. If code declares a variable of type Int32, there is no need to store information with that Int32 saying what it is. The fact that the location holds an Int32 is effectively stored as part of the program, so it does not have to be stored in the location itself. This represents a big saving if, for example, one has a million objects, each of which has a field of type Int32. Each of the objects holding an Int32 has a header that identifies the class that can operate on it. Since one copy of that class's code can operate on any of the million instances, having the fact that the field is an Int32 be part of the code is much more efficient than having the storage for every one of those fields include information about what it is.

Boxing is necessary when a request is made to pass the contents of a value-type storage location to code that does not know to expect that particular value type. Code that expects objects of unknown type can accept a reference to an object stored on the heap. Since every object stored on the heap has a header identifying what type of object it is, the code can use that header whenever it needs to use the object in a way that requires knowing its type.

Note that in .NET it is possible to declare what are called generic classes and methods. Each such declaration automatically generates a family of classes or methods that are identical except for the type of object they expect to act on. If you pass an Int32 to a routine DoSomething<T>(T param), this will automatically generate a version of the routine in which every instance of type T is effectively replaced with Int32. That version of the routine will know that every storage location declared as type T holds an Int32, so, just as in the case where a routine was hard-coded to use an Int32 storage location, it will not be necessary to store type information with those locations.

+1
Jun 09 2018-12-12T00:


