Java: Assigning Object Link Identifiers for Custom Serialization

For various reasons, I have custom serialization where I dump some fairly simple objects into a data file. There are perhaps 5-10 classes, and the resulting graphs of objects are acyclic and fairly simple (each serialized object has 1 or 2 links to another that is serialized). For example:

    class Foo {
        final private long id;
        public Foo(long id, /* other stuff */) { ... }
    }

    class Bar {
        final private long id;
        final private Foo foo;
        public Bar(long id, Foo foo, /* other stuff */) { ... }
    }

    class Baz {
        final private long id;
        final private List<Bar> barList;
        public Baz(long id, List<Bar> barList, /* other stuff */) { ... }
    }

The id field is intended only for serialization. When I serialize to a file, I write objects while keeping a record of which ids have been serialized so far. For each object, I check whether its children have been serialized and write those that have not; finally I write the object itself, that is, its data fields and the ids corresponding to its child objects.
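The write procedure described above can be sketched roughly as follows. This is only a minimal sketch under assumptions: the `Item` interface with `id()` and `children()`, the `Node` type, and the `GraphWriter` class are hypothetical names, and "writing" is reduced to recording the order in which ids would be emitted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical interface: every serializable object exposes its id and its children.
interface Item {
    long id();
    List<Item> children();
}

// Hypothetical simple node type used only for this sketch.
final class Node implements Item {
    private final long id;
    private final List<Item> children;

    Node(long id, Item... children) {
        this.id = id;
        this.children = List.of(children);
    }

    @Override public long id() { return id; }
    @Override public List<Item> children() { return children; }
}

// Writes each item at most once, children before parents.
final class GraphWriter {
    private final Set<Long> written = new HashSet<>();
    private final List<Long> output = new ArrayList<>(); // stands in for the data file

    void write(Item item) {
        if (written.contains(item.id())) {
            return; // already in the file, nothing to do
        }
        for (Item child : item.children()) {
            write(child); // make sure every referenced item is written first
        }
        // A real writer would emit the data fields and the child ids here.
        output.add(item.id());
        written.add(item.id());
    }

    List<Long> output() { return output; }
}

class GraphWriterDemo {
    public static void main(String[] args) {
        Item foo = new Node(1);
        Item bar1 = new Node(2, foo);
        Item bar2 = new Node(3, foo);
        Item baz = new Node(4, bar1, bar2);

        GraphWriter writer = new GraphWriter();
        writer.write(bar1);
        writer.write(baz);
        System.out.println(writer.output()); // [1, 2, 3, 4]
    }
}
```

Because the graph is acyclic, the recursion terminates, and every id a record refers to appears earlier in the file than the record itself.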

What puzzles me is how to assign an id. I thought about this, and it seems that there are three cases for assigning an identifier:

  • dynamically created objects - the id is assigned from an incrementing counter
  • objects read from disk - the id is assigned from the number stored in the disk file
  • singleton objects - the object is created before any dynamically created object, to represent a singleton that is always present.
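One possible way to cover all three cases with a single component is sketched below. Everything in it is an assumption made for illustration: the `IdAllocator` name, the idea of reserving a low id range for singletons, and the bound of 1000 are not from the question.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical allocator covering the three cases listed above.
final class IdAllocator {
    // Assumption: ids below this bound are reserved for singletons with fixed ids.
    static final long FIRST_DYNAMIC_ID = 1000;

    private final Set<Long> used = new HashSet<>();
    private long next = FIRST_DYNAMIC_ID;

    // Case 1: dynamically created object, id comes from an increasing counter.
    long allocate() {
        while (used.contains(next)) {
            next++; // skip ids already claimed by objects loaded from disk
        }
        used.add(next);
        return next++;
    }

    // Case 2: object read from disk, id is the one stored in the file.
    long claim(long idFromFile) {
        if (!used.add(idFromFile)) {
            throw new IllegalStateException("duplicate id " + idFromFile);
        }
        return idFromFile;
    }

    // Case 3: singleton, id is a fixed constant in the reserved range.
    long claimSingleton(long reservedId) {
        if (reservedId < 0 || reservedId >= FIRST_DYNAMIC_ID) {
            throw new IllegalArgumentException("not a reserved id: " + reservedId);
        }
        used.add(reservedId);
        return reservedId;
    }
}

class IdAllocatorDemo {
    public static void main(String[] args) {
        IdAllocator ids = new IdAllocator();
        System.out.println(ids.claimSingleton(0)); // 0
        System.out.println(ids.allocate());        // 1000
        System.out.println(ids.claim(1001));       // 1001
        System.out.println(ids.allocate());        // 1002
    }
}
```

The counter never hands out an id that a file has already claimed, so objects created after loading cannot collide with loaded ones.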

How can I handle them correctly? I feel like I am reinventing the wheel, and there must be an established method of handling all cases.


Clarification: as tangential information, the file format I'm looking at is something like the following (glossing over a few details that should not be relevant). It is optimized for handling a fairly large amount of dense binary data (tens or hundreds of MB) with the ability to intersperse structured data in it. The dense binary data makes up 99.9% of the file size.

The file consists of a series of error-corrected blocks that serve as containers. Each block can be thought of as containing a byte array, which consists of a series of packets. Packets can be read one at a time (for example, you can tell where each packet ends, and the next one starts immediately after it).

Thus, the file can be thought of as a series of packets stored on top of an error-correcting layer. The vast majority of these packets are opaque binary data that has nothing to do with this question. However, a small minority of them contain serialized structured data, forming a kind of "archipelago" of data "islands" that can be linked by object reference relationships.

So, I can have a file in which packet 2971 contains a serialized Foo, and packet 12083 contains a serialized Bar that refers to the Foo in packet 2971 (with packets 0-2970 and 2972-12082 all being opaque data packets).
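On the reading side, such a cross-packet reference could be resolved with an index of the items decoded so far. The sketch below assumes the reference is stored as the target's id (not its packet number) and that referenced items always appear earlier in the file; `ItemIndex` and its methods are hypothetical names, and `Foo` and `Bar` are reduced to the fields needed here.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the question's classes (other fields omitted).
final class Foo {
    final long id;
    Foo(long id) { this.id = id; }
}

final class Bar {
    final long id;
    final Foo foo;
    Bar(long id, Foo foo) { this.id = id; this.foo = foo; }
}

// Hypothetical index of every item decoded so far, keyed by id.
final class ItemIndex {
    private final Map<Long, Object> byId = new HashMap<>();

    void register(long id, Object item) {
        if (byId.putIfAbsent(id, item) != null) {
            throw new IllegalStateException("duplicate id " + id);
        }
    }

    <T> T resolve(long id, Class<T> type) {
        Object item = byId.get(id);
        if (item == null) {
            throw new IllegalStateException("reference to unknown id " + id);
        }
        return type.cast(item);
    }
}

class ItemIndexDemo {
    public static void main(String[] args) {
        ItemIndex index = new ItemIndex();

        // An early packet yields a Foo with id 7.
        Foo foo = new Foo(7);
        index.register(foo.id, foo);

        // A much later packet yields a Bar with id 8 that stores "foo = 7".
        Bar bar = new Bar(8, index.resolve(7, Foo.class));
        index.register(bar.id, bar);

        System.out.println(bar.foo == foo); // true
    }
}
```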

All these packets are immutable (and therefore, given the constraints of Java object construction, they form an acyclic object graph), so I do not have to deal with mutability problems. They are also descendants of a common Item interface. What I would like to do is write an arbitrary Item object to the file. If the Item contains references to other Items, I need to write those to the file too, but only if they have not already been written. Otherwise I will have duplicates that I will need to somehow merge when I read them back.

java serialization
3 answers

Do you really need to do this? Internally, ObjectOutputStream tracks which objects have already been serialized. Subsequent writes of the same object store only an internal reference (similar to writing just the id) rather than writing the whole object again.

See Serialization Cache for more information.
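A small round trip illustrates this behavior. The class names `SharedFoo`, `HolderBar` and `BackReferenceDemo` are made up for the sketch; only the standard java.io classes are real API. Note that the sharing holds within a single ObjectOutputStream, not across separate streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SharedFoo implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    SharedFoo(String name) { this.name = name; }
}

final class HolderBar implements Serializable {
    private static final long serialVersionUID = 1L;
    final SharedFoo foo;
    HolderBar(SharedFoo foo) { this.foo = foo; }
}

final class BackReferenceDemo {
    // Writes two holders sharing one SharedFoo to a single stream, reads them
    // back, and reports whether the shared object is still a single instance.
    static boolean sharedAfterRoundTrip() {
        SharedFoo foo = new SharedFoo("shared");
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new HolderBar(foo));
                out.writeObject(new HolderBar(foo)); // foo is written as a back-reference
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                HolderBar first = (HolderBar) in.readObject();
                HolderBar second = (HolderBar) in.readObject();
                return first.foo == second.foo;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sharedAfterRoundTrip()); // true
    }
}
```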

If the identifiers corresponded to some externally defined identity, for example the identifier of an object in another system, then this would make sense. But the question says that the identifiers are generated solely to keep track of which objects have been serialized.

You can handle singletons using the readResolve method. A simple approach is to compare the freshly deserialized instance with your singleton instances and, if there is a match, return the singleton instance instead of the deserialized one. For example:

    private Object readResolve() {
        return (this.equals(SINGLETON)) ? SINGLETON : this;
        // or simply
        // return SINGLETON;
    }
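To see the effect of readResolve end to end, here is a self-contained round trip. `Origin` and `ReadResolveDemo` are hypothetical names used only for this sketch, which takes the "or simply" variant and always returns the singleton.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical singleton that survives default Java serialization.
final class Origin implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Origin SINGLETON = new Origin();

    private Origin() { }

    // Called by ObjectInputStream after the object has been read;
    // the freshly created copy is discarded in favor of the singleton.
    private Object readResolve() {
        return SINGLETON;
    }
}

final class ReadResolveDemo {
    // Serializes the singleton, reads it back, and checks object identity.
    static boolean sameInstanceAfterRoundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(Origin.SINGLETON);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject() == Origin.SINGLETON;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAfterRoundTrip()); // true
    }
}
```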

EDIT: In response to the comments: the stream is mainly binary data (stored in an optimized format) with complex objects interspersed in that data. This can be handled with a stream format that supports substreams, for example zip, or simple block chunking. For example, the stream may be a sequence of blocks:

    offset 0   - block type
    offset 4   - block length N
    offset 8   - N bytes of data
    ...
    offset N+8 - start of next block

Then you can have blocks for binary data, blocks for serialized data, blocks for XStream-serialized data, etc. Since each block knows its size, you can create a substream that reads up to that length from the block's position in the file. This lets you mix data freely without causing parsing problems.

To implement the stream, have your main stream parse the blocks, for example:

    DataInputStream main = new DataInputStream(input);
    int blockType = main.readInt();
    int blockLength = main.readInt();
    // next N bytes are the data
    LimitInputStream data = new LimitInputStream(main, blockLength);
    if (blockType == BINARY) {
        handleBinaryBlock(new DataInputStream(data));
    } else if (blockType == OBJECTSTREAM) {
        deserialize(new ObjectInputStream(data));
    } else ...

A sketch of LimitInputStream follows:

    import java.io.EOFException;
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class LimitInputStream extends FilterInputStream {
        private int bytesRead;
        private final int limit;

        /** Reads up to limit bytes from in */
        public LimitInputStream(InputStream in, int limit) {
            super(in);
            this.limit = limit;
        }

        @Override
        public int read(byte[] data, int offs, int len) throws IOException {
            if (len == 0) return 0; // read() contract mandates this
            if (bytesRead == limit) return -1;
            int toRead = Math.min(limit - bytesRead, len);
            int actuallyRead = super.read(data, offs, toRead);
            // the underlying stream ended before the block did
            if (actuallyRead == -1) throw new EOFException();
            bytesRead += actuallyRead;
            return actuallyRead;
        }

        // similarly for the other read() methods

        // don't propagate to underlying stream
        @Override
        public void close() { }
    }
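For completeness, the writing side of the block layout described above might look like the following sketch. The `BlockWriter` class is hypothetical, and the numeric values of BINARY and OBJECTSTREAM are assumptions, since the answer names the constants without giving values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class BlockWriter {
    // Hypothetical type codes; the values are assumptions for this sketch.
    static final int BINARY = 1;
    static final int OBJECTSTREAM = 2;

    // Writes one block: 4-byte type, 4-byte length N, then N payload bytes.
    static void writeBlock(DataOutputStream out, int blockType, byte[] payload)
            throws IOException {
        out.writeInt(blockType);
        out.writeInt(payload.length);
        out.write(payload);
    }

    // Writes a single block to memory and returns the resulting bytes.
    static byte[] encode(int blockType, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeBlock(out, blockType, payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}

class BlockWriterDemo {
    public static void main(String[] args) throws IOException {
        byte[] file = BlockWriter.encode(BlockWriter.BINARY, new byte[] {10, 20, 30});
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        System.out.println(in.readInt()); // 1 (block type)
        System.out.println(in.readInt()); // 3 (block length)
        System.out.println(file.length);  // 11 (8-byte header + 3 payload bytes)
    }
}
```

For an object block, the payload would first be serialized into a ByteArrayOutputStream so that its length is known before the header is written. If each object block gets its own ObjectOutputStream, back-references do not carry over from one block to the next.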

Are foos registered with a FooRegistry? You can try this approach (assume Bar and Baz also have registries for looking up references by id).

It probably has a lot of syntax errors, usage errors, etc. But I think the approach is good.

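    import java.io.PrintStream;
    import java.util.HashMap;
    import java.util.Map;

    public class Foo {
        private final long id;
        public Foo(/* other stuff */) {
            // construct
            this.id = FooRegistry.register(this);
        }
        public Foo(long id /*, other stuff */) {
            // construct
            this.id = id;
            FooRegistry.register(this, id);
        }
        public long getId() { return id; }
    }

    public class FooRegistry {
        private static final Map<Long, Foo> foos = new HashMap<>();
        private static long currentFooCount = 0;

        // new object: skip ids that are already taken, then claim the next free one
        static long register(Foo foo) {
            while (foos.containsKey(currentFooCount)) currentFooCount++;
            foos.put(currentFooCount, foo);
            return currentFooCount;
        }
        // object read from disk: its stored id must not be in use yet
        static void register(Foo foo, long id) {
            // exception type was left unspecified in the original sketch
            if (foos.containsKey(id)) throw new IllegalArgumentException("invalid id " + id);
            foos.put(id, foo);
        }
    }

    public class Bar {
        // fields id and foo, and getId(), as in Foo
        void writeToStream(PrintStream out) {
            out.print("<BAR><id>" + id + "</id><foo>" + foo.getId() + "</foo></BAR>");
        }
    }

    public class Baz {
        // fields id and barList as in the question
        void writeToStream(PrintStream out) {
            out.print("<BAZ><id>" + id + "</id>");
            for (Bar bar : barList) out.println("<bar>" + bar.getId() + "</bar>");
            out.print("</BAZ>");
        }
    }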


"I feel like I am reinventing the wheel, and there must be an established method of handling all cases."

Yes, it looks like default object serialization would do; otherwise you are, in the end, optimizing prematurely.

You can change the format of the serialized data to a more convenient one (for example, with XMLEncoder).

But if you insist, I think a singleton with a dynamic counter should do. Just don't put the id in the constructor's public interface:

    class Foo {
        private final int id;
        public Foo( int id, /*other*/ ) { // drop the int id
        }
    }

That way, a separate class can act as the "sequence", and a long would probably be more appropriate, to avoid problems with Integer.MAX_VALUE.

Using AtomicLong from the java.util.concurrent.atomic package would also help, both to prevent two threads from being assigned the same identifier and to avoid excessive synchronization.

    import java.util.concurrent.atomic.AtomicLong;

    class Sequencer {
        private static final AtomicLong sequenceNumber = new AtomicLong(0);
        public static long next() {
            return sequenceNumber.getAndIncrement();
        }
    }

Now in every class you have:

    class Foo {
        private final long id;
        public Foo( String name, String data /*, etc */ ) {
            this.id = Sequencer.next();
        }
    }

That's it.

(note, I don’t remember if deserializing the object calls the constructor, but you get the idea)
