For various reasons, I have custom serialization where I dump some fairly simple objects into a data file. There are perhaps 5-10 classes, and the resulting graphs of objects are acyclic and fairly simple (each serialized object has 1 or 2 links to another that is serialized). For example:
    class Foo {
        private final long id;

        public Foo(long id /* ... */) { ... }
    }

    class Bar {
        private final long id;
        private final Foo foo;

        public Bar(long id, Foo foo /* ... */) { ... }
    }

    class Baz {
        private final long id;
        private final List<Bar> barList;

        public Baz(long id, List<Bar> barList /* ... */) { ... }
    }
The id field exists only for serialization. When I serialize to a file, I keep a record of which ids have been written so far. For each object, I check whether its children have already been serialized and write any that have not. Only then do I write the object itself: its data fields plus the ids of its child objects.
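To make that write order concrete, here is a minimal sketch. The names `Node` and `GraphWriter` are hypothetical, and the list of text records is only a stand-in for the real data file.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal node type: an id plus links to child nodes.
final class Node {
    final long id;
    final List<Node> children;

    Node(long id, List<Node> children) {
        this.id = id;
        this.children = List.copyOf(children);
    }
}

// Writes each node at most once, children before parents.
final class GraphWriter {
    // Ids that have already been serialized.
    private final Set<Long> written = new HashSet<>();
    // Stand-in for the real data file: one text record per object.
    final List<String> records = new ArrayList<>();

    void write(Node node) {
        if (!written.add(node.id)) {
            return; // already serialized earlier
        }
        StringBuilder record = new StringBuilder("id=" + node.id + " children=");
        for (Node child : node.children) {
            write(child); // make sure the child is in the file first
            record.append(child.id).append(' ');
        }
        records.add(record.toString().trim());
    }
}
```

Marking an id as written before descending into the children is safe only because the graph is acyclic.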
What puzzles me is how to assign an id. I thought about this, and it seems that there are three cases for assigning an identifier:
- Dynamically created objects: the id comes from an incrementing counter.
- Objects read from disk: the id is the number stored in the file.
- Singleton objects: the object is created before any dynamically created object and represents a singleton that is always present.
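For concreteness, this is roughly how I picture the three cases fitting together. It is only a sketch: `IdAllocator`, the reserved range below `FIRST_DYNAMIC_ID`, and the method names are all assumptions of mine, not an established scheme.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical id source covering the three cases listed above.
final class IdAllocator {
    // Assumption: ids below this bound are reserved for singletons.
    static final long FIRST_DYNAMIC_ID = 1_000;

    private final AtomicLong next = new AtomicLong(FIRST_DYNAMIC_ID);

    // Case 1: a dynamically created object takes the next counter value.
    long allocate() {
        return next.getAndIncrement();
    }

    // Case 2: an object read from disk keeps its stored id; the counter is
    // pushed past it so that later allocations cannot collide with it.
    long adopt(long storedId) {
        next.accumulateAndGet(storedId + 1, Math::max);
        return storedId;
    }

    // Case 3: a singleton uses a fixed id from the reserved range.
    static long singleton(long reservedId) {
        if (reservedId < 0 || reservedId >= FIRST_DYNAMIC_ID) {
            throw new IllegalArgumentException("not a reserved id: " + reservedId);
        }
        return reservedId;
    }
}
```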
How can I handle them correctly? I feel like I am reinventing the wheel, and there must be an established method of handling all cases.
Clarification: as tangential information, the file format I'm looking at is something like the following (omitting a few details that should not be relevant). It is optimized for handling a fairly large amount of dense binary data (tens or hundreds of MB), with the ability to intersperse structured data in it. The dense binary data makes up 99.9% of the file size.
The file consists of a series of error-corrected blocks that serve as containers. Each block can be thought of as containing a byte array, which in turn consists of a series of packets. Packets can be read one at a time (that is, you can tell where each packet ends, and the next one starts right after it).
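As an illustration of reading packets one at a time, here is a small sketch. The framing is purely an assumption for the example (a 4-byte big-endian length followed by the payload), and `PacketFraming` is a hypothetical name; the actual format may mark packet ends differently.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing helper. Assumed layout of each packet inside a block:
// a 4-byte big-endian length, followed by that many payload bytes.
final class PacketFraming {
    // Packs the given payloads into one block byte array.
    static byte[] frame(List<byte[]> packets) {
        int size = 0;
        for (byte[] packet : packets) {
            size += Integer.BYTES + packet.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        for (byte[] packet : packets) {
            buffer.putInt(packet.length).put(packet);
        }
        return buffer.array();
    }

    // Splits a block back into payloads, reading one packet after another.
    static List<byte[]> split(byte[] block) {
        ByteBuffer buffer = ByteBuffer.wrap(block);
        List<byte[]> packets = new ArrayList<>();
        while (buffer.hasRemaining()) {
            byte[] payload = new byte[buffer.getInt()];
            buffer.get(payload);
            packets.add(payload);
        }
        return packets;
    }
}
```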
Thus, a file can be thought of as a series of packets stored on top of an error-correction layer. The vast majority of these packets are opaque binary data that has nothing to do with this question. A small minority, however, are items containing serialized structured data, forming a kind of "archipelago" of data "islands" that can be linked by object references.
So I can have a file in which packet 2971 contains a serialized Foo, and packet 12083 contains a serialized Bar that refers to the Foo in packet 2971 (with packets 0-2970 and 2972-12082 being opaque data packets).
All of these items are immutable (and therefore, given the constraints of Java object construction, they form an acyclic object graph), so I do not have to deal with mutability problems. They are also descendants of a common Item interface. I would like to write an arbitrary Item object to the file. If the Item contains references to other Items, I need to write those to the file too, but only if they have not been written already. Otherwise, I will have duplicates that I will need to somehow merge when I read them back.
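On the reading side, the sketch below shows how I imagine ids could be resolved back into shared objects, so that a Bar in packet 12083 ends up pointing at the same Foo instance read from packet 2971. `ItemRecord`, `LoadedItem`, and `ItemReader` are hypothetical names, and the sketch assumes referenced items always appear earlier in the file than the items that refer to them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical decoded packet: the item's id plus the ids it refers to.
final class ItemRecord {
    final long id;
    final long[] childIds;

    ItemRecord(long id, long... childIds) {
        this.id = id;
        this.childIds = childIds.clone();
    }
}

// Hypothetical in-memory item with its references already resolved.
final class LoadedItem {
    final long id;
    final List<LoadedItem> children;

    LoadedItem(long id, List<LoadedItem> children) {
        this.id = id;
        this.children = List.copyOf(children);
    }
}

// Resolves ids to objects through a registry, one instance per id.
final class ItemReader {
    private final Map<Long, LoadedItem> byId = new HashMap<>();

    LoadedItem read(ItemRecord record) {
        LoadedItem existing = byId.get(record.id);
        if (existing != null) {
            return existing; // duplicate packet: reuse the first instance
        }
        List<LoadedItem> children = new ArrayList<>();
        for (long childId : record.childIds) {
            LoadedItem child = byId.get(childId);
            if (child == null) {
                // Assumption violated: a reference points forward in the file.
                throw new IllegalStateException(
                        "item " + record.id + " refers to unread item " + childId);
            }
            children.add(child);
        }
        LoadedItem item = new LoadedItem(record.id, children);
        byId.put(record.id, item);
        return item;
    }
}
```

With a registry like this, a duplicate packet would simply resolve to the instance already loaded, though avoiding the duplicate write in the first place is still what I am after.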