How do languages like C# and Java avoid C/C++-style separate compilation?

For my programming languages class, I am writing a research paper on some of the work of important people in the history of language design. One paper by C. A. R. Hoare struck me as odd because it speaks out against the independent compilation techniques used in C, and later C++, before C had even become popular.

Since this is primarily an optimization to speed up compilation, what is it about Java and C# that allows them to avoid relying on independent compilation? Is it a compiler technique, or are there elements of the language that make this easier? Are there any other compiled languages that used these techniques before them?

+6
java compiler-construction c# programming-languages
6 answers

Short answer: Java and C# do not avoid separate compilation; they make full use of it.

Where they differ is that they do not require the programmer to write a pair of separate header/implementation files when writing a reusable library. The user writes the class definition once, and the compiler extracts the information equivalent to the "header" from that single definition and includes it in the output file as "type metadata". So the output file (a .jar full of .class files in Java, or a .dll assembly in .NET languages) is a combination of the binary and the headers in a single package.

Then, when another class is compiled that depends on the first class, the compiler can look at the metadata instead of having to find a separate include file.
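A minimal sketch of that workflow, assuming a JDK is installed so that the standard `javax.tools` compiler API is available. The names `Lib`, `App` and `SeparateCompilationDemo` are made up for illustration. It compiles `Lib.java`, deletes the source, and then compiles `App.java` against nothing but `Lib.class`:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeparateCompilationDemo {
    // Returns true if App compiles while only Lib.class (no Lib.java) exists.
    static boolean compileSeparately() {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler: run on a JDK, not a JRE");
        }
        try {
            // Temp directories are left behind; a real tool would clean them up.
            Path libSrc = Files.createTempDirectory("lib-src");
            Path libOut = Files.createTempDirectory("lib-out");
            Path appSrc = Files.createTempDirectory("app-src");
            Path appOut = Files.createTempDirectory("app-out");

            // Step 1: compile the hypothetical "library" on its own.
            Path lib = libSrc.resolve("Lib.java");
            String libCode = "public class Lib { public static int twice(int x) { return 2 * x; } }";
            Files.write(lib, libCode.getBytes(StandardCharsets.UTF_8));
            int rc1 = javac.run(null, null, null, "-d", libOut.toString(), lib.toString());

            // Remove the source so that only the compiled Lib.class remains.
            Files.delete(lib);

            // Step 2: compile a client against the class file on the class path.
            Path app = appSrc.resolve("App.java");
            String appCode = "public class App { public static int run() { return Lib.twice(21); } }";
            Files.write(app, appCode.getBytes(StandardCharsets.UTF_8));
            int rc2 = javac.run(null, null, null,
                    "-cp", libOut.toString(), "-d", appOut.toString(), app.toString());

            return rc1 == 0 && rc2 == 0 && Files.exists(appOut.resolve("App.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("compiled against Lib.class only: " + compileSeparately());
    }
}
```

The second compile never sees a header or the library source; the signature of `Lib.twice` comes from the metadata stored in `Lib.class`.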

It so happens that they target a virtual machine rather than a specific chip architecture, but that is a separate issue; they could emit x86 machine code as the binary and still keep the header-like metadata in the same file (this is actually an option in .NET, albeit rarely used).

C++ compilers commonly try to speed up compilation by using "precompiled headers." The metadata in .NET .dll files and Java .class files is much like a precompiled header: it has already been parsed and indexed, ready for quick lookups.

The upshot is that in these modern languages there is one way of doing modularization, and it has the characteristics of a perfectly organized, hand-optimized C++ modular build system - pretty nifty, speaking ASFAC++B.

+5

IMO, one of the biggest factors here is that both Java and .NET use intermediate languages; this means that the compiled unit (jar/assembly) necessarily contains a lot of expressive metadata about the types, methods, etc., so it is already conveniently laid out for reference checking. The runtime still checks anyway, in case you try to pull a fast one ;-p

This isn't very far removed from the MIDL that underpins COM, although there the TLB is often a separate entity.

If I misunderstood your meaning, please let me know ...

+4

You can think of a Java .class file as similar to a precompiled header file in C/C++. Essentially, the .class file is the intermediate form that a C/C++ linker would need, plus all of the information contained in the header (Java simply does not have a separate header).
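To make the "intermediate form" a little more concrete, here is a small sketch that opens the compiled form of a class and reads the fixed header fields that the class file format begins with (a 4-byte magic number followed by the minor and major version). The class name `ClassFileHeader` is made up for illustration, and the resource lookup by simple name is an assumption that only holds for top-level classes:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    // Returns { magic, minorVersion, majorVersion } read from the .class file of `type`.
    static int[] readHeader(Class<?> type) {
        // Resolved relative to the class's own package; works for top-level classes.
        String resource = type.getSimpleName() + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            DataInputStream in = new DataInputStream(raw); // class files are big-endian
            int magic = in.readInt();
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            return new int[] { magic, minor, major };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader(String.class);
        System.out.printf("magic=%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

Everything after this header (the constant pool, field and method tables) is where the header-like type information and the bytecode live side by side.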

Regarding your comment on another post:

“Basically, I mean the idea in C/C++ that each source file is its own individual compilation unit. This doesn't really seem to be the case in C# or Java.”

In Java (I cannot speak for C#, but I assume it is the same), each source file is its own separate compilation unit. I'm not sure why you think it isn't... maybe we have different definitions of a compilation unit?

+3

This requires some language support (otherwise C/C++ compilers would do it too).

In particular, it requires the compiler to generate self-contained modules that expose metadata which other modules can reference in order to call into them.

.NET assemblies are a simple example. All the files in a project are compiled together, generating one DLL. That DLL can be queried by .NET to determine which types it contains, so that other assemblies can call the functions defined in it.

And to make use of this, it must be legal in the language to reference other modules.
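As a rough Java analogue of querying a compiled module for what it contains, the sketch below uses the standard reflection API to list the public method signatures of a class. The `Greeter` class and the `MetadataDemo` helper are hypothetical names; the point is only that the signatures are recovered from compiled type metadata, not from a header or from source text:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical library class; in a real build it would ship inside its own .jar.
class Greeter {
    public String greet(String name) { return "Hello, " + name; }
    public int count() { return 0; }
}

class MetadataDemo {
    // Builds "returnType name(paramTypes)" strings for the public methods declared by `type`.
    static List<String> publicSignatures(Class<?> type) {
        List<String> signatures = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers())) {
                continue; // skip private and compiler-generated helpers
            }
            StringBuilder sb = new StringBuilder();
            sb.append(m.getReturnType().getSimpleName())
              .append(' ').append(m.getName()).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) sb.append(", ");
                sb.append(params[i].getSimpleName());
            }
            signatures.add(sb.append(')').toString());
        }
        Collections.sort(signatures); // getDeclaredMethods() guarantees no particular order
        return signatures;
    }

    public static void main(String[] args) {
        for (String signature : publicSignatures(Greeter.class)) {
            System.out.println(signature);
        }
    }
}
```

Reflection does this at run time; a compiler reading a .class file or a .NET assembly does the equivalent lookup at compile time.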

In C++, what defines a module boundary? The language specifies that the compiler only considers data in its current compilation unit (.cpp file + included headers). There is no mechanism for saying "I'd like to call function Foo in module Bar, even though I don't have a prototype or anything else for it at compile time." The only mechanism you have for sharing type information between files is #include.

There is a proposal to add a module system to C++, but it will not be in C++0x. Last I saw, the plan was to consider it for a TR1 after 0x is out.

(It is worth mentioning that the #include system in C/C++ was originally used because it sped up compilation. Back in the 70s, it allowed the compiler to process the code in a simple linear scan; it did not have to build syntax trees or other such "advanced" features. Today the tables have turned, and it has become a huge bottleneck, both in terms of usability and compilation speed.)

+2

Object files generated by C/C++ are meant to be read only by the linker, not by the compiler.

+2

As for other languages: IIRC, Turbo Pascal had "units" that you could use without having any source code. I think the point is to create metadata along with the compiled code, which the compiler can then use to work out the interface to the module (i.e. function signatures, class layout, etc.).

One problem with C/C++ that prevents simply replacing #include with some kind of #import is the preprocessor, which can completely change the meaning, syntax, etc. of included/imported modules. That would be very difficult (if not impossible) with a Java-like module system.

+1