Unreliable, low bandwidth Java ORM strategies

Question

I am considering Hibernate for a system that has to run over an unreliable network. There is a single central database to which we need read and write access, but it is reachable only over a rather unreliable wi-fi network. In addition, there may be power losses that do not stop the application, so any solution should have a permanent cache that can withstand power cycles. Finally, it is an embedded system with only modest memory and disk space, so full database replication, for example, is not an option.

I have a basic understanding of Hibernate second-level caching, and I am wondering whether it could be configured with something like Ehcache to solve this problem. However, that cache is primarily aimed at performance rather than availability, so there may be pitfalls I am not aware of.

I am also quite willing to consider other strategies, including replication to a local database. I would rather not have to do too much heavy lifting to implement this.

I am looking for experience reports or possible alternatives.

+7

java caching hibernate ehcache

Dean povey Apr 30 '11 at 10:54

6 answers

"In addition, there may be power losses that do not stop the application, so any solution should have a permanent cache that can withstand power cycles."

You already have a candidate solution in the Hibernate second-level cache. But you have not said what the real requirements are. You have an unreliable network. Fine. You have an unreliable power supply. Also fine. What level of service do you want to achieve? What is acceptable and what is not?

Is data loss acceptable? How much could you tolerate? What risk are you willing to take?

To be more explicit, let's say you have a local replica of the database, or at least part of it. Let's say you know how to queue and save the changes made locally. Say you save these modifications to your hard drive so that they are safe in the event of a power failure. Let's say you can merge the changes into the main database when the connection is available again.
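As a rough illustration of the "queue the local changes and save them to disk" assumption, here is a minimal sketch of an append-only journal that forces every entry to disk before returning. `ChangeJournal` and its methods are hypothetical names made up for this example, not part of Hibernate or Ehcache, and it assumes each change can be serialized as a single line of text.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: an append-only journal of local changes, one per line.
final class ChangeJournal implements AutoCloseable {
    private final Path file;
    private final FileChannel channel;

    ChangeJournal(Path file) throws IOException {
        this.file = file;
        this.channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Appends one change and asks the OS to flush it to the device
    // before returning, so it should survive a later power cut.
    void append(String change) throws IOException {
        if (change.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("change must be a single line");
        }
        ByteBuffer buf = ByteBuffer.wrap((change + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true);
    }

    // Reads back every journaled change, for example after a restart.
    // Note: a final line torn by a power cut is not detected here.
    List<String> pending() throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

After a restart, the application would reopen the same file, call `pending()`, and try to merge those entries into the central database. A real implementation would probably also want a checksum per entry and a way to truncate the journal once entries are merged.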

That is already a lot of assumptions. OK, but what happens if the hard drive fails after a power failure? You do know that hard drives do not like power failures, and that one can corrupt the data or even damage the drive itself?

So you set up RAID and add an uninterruptible power supply. Good. You detect the power failure event from the OS, complete the current transaction, and exit cleanly. Your RAID protects you from disk failure.

OK, but what happens if the whole computer stops functioning? What happens in case of fire? Or water damage? All the disks will be damaged, and the data cannot be recovered and was never synchronized with the central database. Is this acceptable or not?

Even when the wi-fi is up and the power supply works fine, how reliable is the central database? Do you have regular backups? Or a clustered setup? Are you sure your central database is reliable in the first place?

From a database perspective, it is easy to use a cluster or backups, and to use transactions to ensure data consistency. You can still lose data (particularly if you do not use a cluster), but you can restore to the last backup.

But if you want to work offline (with the database unreachable), and you are not the only one who can change the database, there will be conflicts. This is no longer a matter of caching, Hibernate, or anything technical.

This is a functional issue. What if several changes happen offline and you need to merge them? What is acceptable, and what is not? Perhaps, when you reconnect, the latest changes are applied and older changes are discarded. Perhaps potential conflicts are detected and the user is asked to resolve them. Or you can try to replay the queued changes and apply all of them ...
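To make those policies concrete, here is a small sketch that compares a "last write wins" merge with one that detects conflicts and hands them back for the user to resolve. All the types here (`OfflineChange`, `CentralStore`, `Merger`) are hypothetical, the central database is reduced to an in-memory map with a version counter per key, and it assumes at most one queued change per key in a batch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of a change made while offline.
final class OfflineChange {
    final String key;
    final String newValue;
    final long baseVersion; // version of the row the user saw when editing

    OfflineChange(String key, String newValue, long baseVersion) {
        this.key = key;
        this.newValue = newValue;
        this.baseVersion = baseVersion;
    }
}

// Stand-in for the central database: values plus a version per key.
final class CentralStore {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    long version(String key) { return versions.getOrDefault(key, 0L); }
    String value(String key) { return values.get(key); }

    void write(String key, String value) {
        values.put(key, value);
        versions.merge(key, 1L, Long::sum); // bump the version on every write
    }
}

final class Merger {
    // Applies queued changes in order and returns the ones that conflicted.
    // With lastWriteWins = true, stale changes overwrite the central value
    // and nothing is ever reported as a conflict.
    static List<OfflineChange> merge(CentralStore central,
                                     List<OfflineChange> queued,
                                     boolean lastWriteWins) {
        List<OfflineChange> conflicts = new ArrayList<>();
        for (OfflineChange c : queued) {
            boolean stale = central.version(c.key) != c.baseVersion;
            if (stale && !lastWriteWins) {
                conflicts.add(c); // leave the central value alone, ask the user
                continue;
            }
            central.write(c.key, c.newValue);
        }
        return conflicts;
    }
}
```

Which of the two modes is right is exactly the functional decision described above; the code only shows that the choice has to be made explicitly somewhere.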

I would think you can offer an "offline mode", but your users should know when they are offline, and should be notified when their changes become permanent in the central database, with conflict resolution where needed. But that is just my point of view.

+3

Nicolas bousquet May 09 '11 at 12:20

You cannot expect success with a network like that between Hibernate and the database.

I recommend that you define a set of high-level atomic operations, and then define a set of services (for example) that expose them. Or, if you like, you could use SOAP and look into the WS-* reliable messaging options to take care of retries and all the other messy details.
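If you handle the retries yourself rather than relying on WS-* reliable messaging, the core of it might look like the sketch below. `Retry.withBackoff` is a hypothetical helper, and it assumes each high-level operation is idempotent, so that repeating it after a lost response does no harm.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper: retries an idempotent remote operation on I/O failure.
final class Retry {
    // Calls op up to maxAttempts times, doubling the delay after each
    // IOException. Any other exception is propagated immediately.
    static <T> T withBackoff(Callable<T> op, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e; // network trouble: remember it and try again
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

A call would wrap one whole atomic operation, for example `Retry.withBackoff(() -> service.submitOrder(order), 5, 500)`, where `service.submitOrder` stands for whatever operation you defined.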

Or you could investigate whether something like Cassandra would work better than SQL, or something even more oriented toward replication.

+2

bmargulies Apr 30 '11 at 23:18

How about queuing the DB operations in a durable/persistent message queue, and letting some messaging middleware deal with the network problems?

Depending on how you do this, consistency issues may arise (well, “anomalies” is the right word, I think), but if you have an unreliable network and still want decent performance, then settling for relaxed consistency may be the way to go.

I would not venture to use Ehcache and the like. They were not designed for this, so you may have to “stretch” the framework. Message queues, on the other hand, offer solutions designed for exactly such scenarios.

+2

Enno shioji May 01 '11 at 1:23

If this were a case of sporadic connectivity between two machines, I would recommend keeping a transaction log that can be replayed, with each entry marked as processed once it has been applied. However, the limited memory may make this difficult.

Perhaps you could keep a compressed transaction log.
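A rough sketch of that idea, using only the JDK: entries are stored gzip-compressed, and replay resumes from a "processed" counter so entries that were already applied are not sent again. `TxLog` is a hypothetical name, and it assumes each entry is a single line of text and that the processed count is persisted somewhere by the caller.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for a compressed, replayable transaction log.
final class TxLog {
    // Compresses the entries (one per line) into a gzip byte array.
    static byte[] compress(List<String> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String e : entries) {
                w.write(e);
                w.write('\n');
            }
        }
        return bytes.toByteArray();
    }

    // Reverses compress(): returns the entries in their original order.
    static List<String> decompress(byte[] data) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.add(line);
            }
        }
        return out;
    }

    // Replays entries starting after the already processed ones and stops
    // at the first entry that apply rejects (for example, network down).
    // Returns the new processed count, which the caller should persist.
    static int replay(List<String> entries, int alreadyProcessed, Predicate<String> apply) {
        int done = alreadyProcessed;
        while (done < entries.size() && apply.test(entries.get(done))) {
            done++;
        }
        return done;
    }
}
```

Compression helps with disk space but not with the memory needed to decompress and replay, so on a very small device the log would probably have to be split into bounded segments.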

+1

dj_segfault May 01 '11 at 12:34

Hibernate (and its second-level cache) is really not designed for this. I think you would probably be best off using a small embedded Java RDBMS (e.g. H2 or HSQLDB) as your local temporary queue (in its most durable mode), and then synchronizing with the central database from a background thread. You can then provide a synchronization status interface connected to this background thread to give the user some degree of feedback.
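The background thread plus status interface could be shaped roughly like the sketch below. To keep it self-contained, an in-memory queue stands in for the table in the embedded database, so this version is not durable; `BackgroundSync`, its `Status` values, and the `uploader` callback are all hypothetical names.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical background synchronizer with a status the UI can poll.
final class BackgroundSync {
    enum Status { IDLE, SYNCED, OFFLINE }

    // Stand-in for the queue table in the embedded database (assumption).
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Sends one change to the central database; false means "unreachable".
    private final Predicate<String> uploader;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private volatile Status status = Status.IDLE;

    BackgroundSync(Predicate<String> uploader) {
        this.uploader = uploader;
    }

    void enqueue(String change) {
        pending.add(change);
    }

    // One pass: upload in order and stop at the first failure, so the
    // remaining changes keep their order for the next attempt.
    synchronized void syncOnce() {
        String next;
        while ((next = pending.peek()) != null) {
            if (!uploader.test(next)) {
                status = Status.OFFLINE;
                return;
            }
            pending.poll(); // remove only after a successful upload
        }
        status = Status.SYNCED;
    }

    // Runs syncOnce repeatedly on a single background thread.
    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::syncOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    Status status() { return status; }
    int pendingCount() { return pending.size(); }
}
```

The UI would poll `status()` and `pendingCount()` to tell the user whether they are offline and how many changes are still waiting. With H2 or HSQLDB, `enqueue` and the `peek`/`poll` pair would become an insert and a select/delete on the local queue table.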

By the way, Hibernate is a bit heavyweight to drop into an embedded environment. You might want to consider MyBatis.

+1

Will iverson May 11 '11 at 17:11

Dean povey · Accepted Answer · 2011-05-15T17:00:06+0000

The Daffodil Replicator (http://enterprise.replicator.daffodilsw.com/index.html) allows you to replicate between JDBC sources. It supports bidirectional updates, merging and conflict resolution, and partial replicas.

This can be used to synchronize the main database with a local (partial) replica. You can use Hibernate to talk to the local replica database and let the replication happen outside that process.
