C structure padding replication in Java

Question

According to here, the C compiler pads the values when writing a structure to a binary file. As the example in the link shows, when writing such a structure:

struct { char c; int i; } a;

into the binary, the compiler usually leaves an unnamed unused hole between the char and int fields to make sure the int field is correctly aligned.

How can I create an exact copy of a binary output file (generated in C) using a different language (in my case, Java)?

Is there an automatic way to apply C padding to Java output? Or do I need to go through the compiler documentation to see how it works (the compiler is g++, by the way)?

+4

java c compiler-construction padding

Lehane May 08 '09 at 11:27

11 answers

Do not do this; it is fragile and will lead to alignment and endianness errors.

For external data, it is much better to explicitly define the format in terms of bytes and write explicit functions to convert between the internal and external format using shift and mask (not union!).
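As a minimal sketch of the shift-and-mask approach, assuming the external format stores a 32-bit integer least significant byte first (the class and method names here are made up for illustration):

```java
class ShiftMaskCodec {
    // Writes value into buf[off..off+3], least significant byte first (little-endian).
    static void putIntLE(byte[] buf, int off, int value) {
        buf[off]     = (byte) (value & 0xFF);
        buf[off + 1] = (byte) ((value >>> 8) & 0xFF);
        buf[off + 2] = (byte) ((value >>> 16) & 0xFF);
        buf[off + 3] = (byte) ((value >>> 24) & 0xFF);
    }

    // Reads the four bytes back; the & 0xFF masks undo Java's sign extension of byte.
    static int getIntLE(byte[] buf, int off) {
        return (buf[off] & 0xFF)
             | ((buf[off + 1] & 0xFF) << 8)
             | ((buf[off + 2] & 0xFF) << 16)
             | ((buf[off + 3] & 0xFF) << 24);
    }
}
```

Because every byte position is spelled out, the result does not depend on the compiler, the CPU, or the JVM.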

+14

starblue May 08 '09 at 11:37

Is there an automatic way to apply C padding to Java output? Or do I have to read the compiler documentation to see how it works (the compiler is g++, by the way)?

No. Instead, you explicitly specify the data/communication format and implement that specification, rather than relying on implementation details of the C compiler. You will not even get the same result from different C compilers.

+5

Michael Borgwardt May 08 '09 at 12:11

For interoperability, look at the ByteBuffer class.

Essentially, you create a buffer of a certain size, put() variables of different types at different positions, and then call array() at the end to get the "raw" representation of the data:

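    ByteBuffer bb = ByteBuffer.allocate(8);
    bb.order(ByteOrder.LITTLE_ENDIAN);
    bb.put(0, (byte) someChar);   // a C char is one byte, at offset 0
    bb.putInt(4, someInteger);    // the int starts at offset 4, after 3 padding bytes
    byte[] rawBytes = bb.array();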

But it is up to you to decide where the padding goes, that is, how many bytes to skip between positions.

To read data written from C, you usually wrap() a ByteBuffer around a byte array that you have read from the file.
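A rough sketch of the reading side, assuming the struct { char c; int i; } from the question was written on a little-endian machine with the int at offset 4 (the PaddedRecord class is hypothetical, and the offsets should be checked against your actual file):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PaddedRecord {
    final byte c;
    final int i;

    PaddedRecord(byte c, int i) {
        this.c = c;
        this.i = i;
    }

    // Assumed layout: c at offset 0, bytes 1..3 are padding, i at offset 4.
    static PaddedRecord parse(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        byte c = bb.get(0);      // absolute read, does not move the position
        int i = bb.getInt(4);    // skips the padding by reading at a fixed offset
        return new PaddedRecord(c, i);
    }
}
```

The padding bytes are simply never read, so their contents do not matter.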

In case this is useful, I wrote more on ByteBuffer.

+4

Neil Coffey May 08 '09 at 12:13

A convenient way to read/write C structures in Java is to use the Javolution Struct class (see http://www.javolution.org ). This does not help you with automatically padding/aligning your data, but it makes working with raw data stored in a ByteBuffer much more convenient. If you are not familiar with Javolution, it is worth a look, as there is a lot of other interesting stuff in there.

+2

bm212 May 08, '09 at 19:03

This hole is configurable: the compiler has switches for aligning structures to 1/2/4/8 bytes.

So, the first question: exactly what alignment do you want to simulate?

+1

alamar May 08 '09 at 11:30

In Java, the size of the data is determined by the language specification. For example, type byte is 1 byte, short is 2 bytes, etc. This is not like C, where the size of each type is architecture dependent.

Therefore, it is important to know how the binary is formatted in order to read the file in Java.

It may be necessary to take steps to ensure that the fields are of a specific size in order to take into account differences in the compiler or architecture. The mention of alignment seems to suggest that the output file will be architecture dependent.

+1

coobird May 08 '09 at 11:40

You can try Preon:

Preon is a Java library for building codecs for bitstream-compressed data in a declarative (annotation-based) way. Think JAXB or Hibernate, but then for binary encoded data.

It can handle big/little-endian binary data, padding, and various numeric types, along with other features. It is a very nice library, and I like it very much.

My $0.02.

+1

dfa May 08 '09 at 12:52

I highly recommend protocol buffers for this particular problem.

+1

Adam Rosenfield May 08 '09 at 19:09

As I understand it, you are saying that you do not control the output of the C program. You have to take it as given.

So, do you need to read this file for a specific set of structures, or do you need to solve this in the general case? That is, is the problem that someone said, "Here is the file created by program X, you must read it in Java"? Or do they expect your Java program to read the C source code, find the structure definition, and then read it in Java?

If you have a specific file to read, the problem is not very complicated. Either by looking at the specifications of the C compiler or by studying sample files, find out where the padding is. Then, on the Java side, read the file as a stream of bytes and build the values you know are coming. Basically, I would write a set of functions that read the required number of bytes from an InputStream and turn them into the corresponding data type. Like this:

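    // Reads len bytes, most significant byte first (big-endian), into an int.
    // PrematureEndOfDataException is a custom exception you define yourself.
    int readInt(InputStream is, int len) throws IOException, PrematureEndOfDataException {
        int n = 0;
        while (len-- > 0) {
            int i = is.read();
            if (i == -1) throw new PrematureEndOfDataException();
            n = (n << 8) | i;   // read() returns 0..255; casting to byte would sign-extend
        }
        return n;
    }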

0

Jay May 08, '09 at 17:29

You can change the packing on the C side to ensure that no padding is used, or alternatively you can look at the resulting file format in a hex editor, so that you can write a parser in Java that ignores the padding bytes.

-1

PaulJWilliams May 08 '09 at 11:31

unwind · Accepted Answer · 2009-05-08T11:31:05+0000

This is true not only when writing to files, but also in memory. It is the fact that the structure is padded in memory that leads to the padding showing up in the file, if the structure is written out byte by byte.

In general, it is very hard to replicate the exact padding scheme with certainty, although I guess some heuristics would get you quite far. It helps if you have the structure declaration to analyze.

Typically, fields larger than one char will be aligned so that their starting offset within the structure is a multiple of their size. This means that a short will usually be at even offsets (divisible by 2, assuming sizeof(short) == 2), and a double will be at offsets divisible by 8, and so on.
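That heuristic can be sketched in a few lines of Java. This is only an approximation under stated assumptions: each field is aligned to its own size, and the total size is rounded up to the largest field alignment (the tail-padding rule is my assumption here, common but not guaranteed). The StructLayout name is made up, and real compilers, packing switches, and nested structs can all deviate.

```java
class StructLayout {
    // Rounds offset up to the next multiple of alignment (alignment must be > 0).
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) / alignment * alignment;
    }

    // sizes[k] is the size in bytes of the k-th field.
    // Returns the guessed offset of each field, followed by the guessed total size.
    static int[] offsets(int... sizes) {
        int[] result = new int[sizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int k = 0; k < sizes.length; k++) {
            offset = alignUp(offset, sizes[k]);   // insert padding before the field
            result[k] = offset;
            offset += sizes[k];
            maxAlign = Math.max(maxAlign, sizes[k]);
        }
        result[sizes.length] = alignUp(offset, maxAlign);   // assumed tail padding
        return result;
    }
}
```

For the struct in the question, StructLayout.offsets(1, 4) gives offsets 0 and 4 and a total size of 8, which matches the three-byte hole described above. Always verify the guess against a real file in a hex editor.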

UPDATE: It is for reasons like this (and also for reasons related to endianness) that it is usually a bad idea to dump whole structures to files. It is better to do it field by field, like so:

    put_char(out, a.c);
    put_int(out, a.i);

Assuming that the put functions write only the bytes needed for the value, this will emit a padding-free version of the struct to the file, which solves the problem. It is also possible to ensure a proper, known byte order by writing these functions accordingly.
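On the Java side, the same field-by-field idea might look like the sketch below. It assumes the agreed format is one byte for the char followed by a four-byte big-endian int, which is what DataOutputStream produces; the FieldWriter class is a made-up name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FieldWriter {
    // Encodes the two fields back to back: 1 byte for c, then 4 bytes for i, no padding.
    static byte[] encode(byte c, int i) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(c);   // writes exactly one byte
            out.writeInt(i);    // writes four bytes, most significant first
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

If the C side writes little-endian instead, a ByteBuffer with ByteOrder.LITTLE_ENDIAN is the easier tool, since DataOutputStream has no byte-order setting.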
