How to develop a high-level application protocol and data format for synchronizing metadata between devices and a server?

I am looking for guidance on how best to think about developing a high-level application protocol for synchronizing metadata between end-user devices and the server.

My goal: the user can interact with the application data on any device or on the web. The purpose of this protocol is to propagate changes made at one endpoint to the other endpoints through the server, so that all devices maintain a consistent view of the application data. If the user makes changes on one device or on the web, the protocol transfers those changes to the central repository, from where the other devices can pull them.

Some other design thoughts:

  • I call this “metadata synchronization” because the payloads will be quite small: object identifiers and small bits of metadata about them. When client endpoints receive new metadata through this protocol, they fetch the actual object data from an external source based on that metadata. Retrieving the "real" object data is out of scope; I am only talking about metadata synchronization here.
  • Using HTTP for transport and JSON as the payload container. The question is essentially how best to design the JSON payload schema.
  • I want this to be easy to implement and maintain on the web and on desktop and mobile devices. Ideally this means simple HTTP request/response exchanges driven by a timer or by events, without any persistent channels. Also, you shouldn't need a PhD to read the spec, and I want my specification to fit on 2 pages, not 200.
  • Authentication and security are out of scope for this question: assume that requests are secured and authenticated.
  • The goal is eventual consistency of the data across devices; this is not quite real time. For example, a user may make changes on one device while offline. When back online, the user performs a “sync” operation to push local changes and fetch remote changes.
  • Having said that, the protocol should support both of these modes of operation:
    • Starting from scratch on a device, it should be able to pull the full current snapshot of the metadata.
    • On-demand sync: when looking at the data on two devices side by side and making changes, it should be easy to push those changes as short individual messages that the other device can receive in near real time (provided it decides to contact the server to sync).
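To make the two modes concrete, here is a minimal sketch in Python; the endpoint shape and the field names ("since", "changes", the rev numbers) are my assumptions, not part of the question. A single sync request can serve both the fresh-device pull and the incremental push/pull:

```python
# Hedged sketch: one hypothetical /sync endpoint serving both modes.
import json

def build_sync_request(last_seen_rev, local_changes):
    """Build the JSON body a client would POST to the sync endpoint.

    last_seen_rev=None means "fresh device": the server should answer
    with a full snapshot. Otherwise it answers only with changes made
    after last_seen_rev, which gives the near-real-time on-demand mode.
    """
    return json.dumps({
        "since": last_seen_rev,    # None -> request a full snapshot
        "changes": local_changes,  # small deltas recorded locally
    })

# A fresh device pushes nothing and asks for everything:
fresh = build_sync_request(None, [])

# A device that already saw rev 41 pushes one small delta:
incremental = build_sync_request(41, [
    {"op": "rename", "class": "feed",
     "url": "http://example.com/rss", "displayName": "Example"},
])
```

The same request shape covers both modes, which keeps the spec short: the only difference is whether "since" is empty.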

As a specific example, think of Dropbox (this is not what I'm working on, but it helps to understand the model): across a number of devices the user can manage files and folders: move them around, create new ones, delete old ones, and so on. In my context, the “metadata” is the structure of files and folders, but not the actual file contents. And the metadata fields would be things like the file/folder name and modification time (all devices should see the same modification time).

Another example is IMAP. I have not read the protocol, but my goals (minus the actual message bodies) are the same.

There seem to be two broad approaches to how this is done:

  • Transactional messages. Every change in the system is expressed as a delta, and endpoints exchange these deltas. Example: changesets in a DVCS.
  • REST: transferring the object graph, in whole or in part, without worrying about individual atomic changes.
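As an illustration of the difference, using the Dropbox-like model above and invented field names, the two styles of payload might look like this:

```python
# Illustrative payloads only; all field names are assumptions.

# Transactional/delta style: each change travels as its own small message.
delta_message = {
    "seq": 42,               # server-assigned sequence number
    "op": "move",
    "id": "folder-17",
    "parent": "folder-2",    # new parent after the move
}

# REST/state style: transfer (part of) the object graph as it now is,
# without saying which individual operations produced this state.
snapshot = {
    "rev": 42,
    "objects": [
        {"id": "folder-17", "parent": "folder-2",
         "name": "Photos", "modified": "2012-05-01T10:00:00Z"},
    ],
}
```

A delta is small and cheap to push, but requires the receiver to already hold a consistent prior state; a snapshot is self-sufficient but heavier.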

EDIT: Some of the answers correctly note that there is not enough information about the application to make good suggestions. The exact nature of the application might be distracting, but a very simple RSS reader is a pretty good approximation. So, let's say the application spec is as follows:

  • There are two classes: feeds and items.
  • I can add, rename and delete feeds. Adding a feed subscribes to it and starts receiving items for that feed. I can also change the display order of feeds in the user interface.
  • When I read items, they are marked as read. I cannot mark them unread or do anything else with them.
  • Based on the above, the object model:
    • "feed" has the attributes "url", "displayName" and "displayOrder" (displayOrder is the feed's index in the feed list; reordering feeds locally changes the displayOrder of all feeds so that the indices remain unique and sequential).
    • "item" has the attributes "url" and "unread", and a to-one relationship "feed" (each item belongs to exactly one feed). "url" also acts as the GUID of an item.
    • The actual item content is downloaded locally on each device and is not part of the sync.
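A compact sketch of this object model in Python; the move_feed helper is my illustration of the reordering rule, not part of the question:

```python
from dataclasses import dataclass

@dataclass
class Feed:
    url: str
    displayName: str
    displayOrder: int  # index in the feed list, kept unique and sequential

@dataclass
class Item:
    url: str           # also acts as the item's GUID
    feed: str          # to-one relationship: url of the owning feed
    unread: bool = True

def move_feed(feeds, url, new_index):
    """Reorder locally, then renumber so displayOrder stays 0..n-1."""
    feeds = sorted(feeds, key=lambda f: f.displayOrder)
    moving = next(f for f in feeds if f.url == url)
    feeds.remove(moving)
    feeds.insert(new_index, moving)
    for i, f in enumerate(feeds):
        f.displayOrder = i
    return feeds
```

Renumbering every feed on each move keeps the invariant simple, at the cost of touching all displayOrder values; a sync protocol then has to carry all of the changed indices, which is worth keeping in mind when choosing deltas versus snapshots.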

Given this design, I can set up the application on one device: add a bunch of feeds, rename and reorder them, and read some items, which are then marked as read. When I switch devices, the other device can sync the configuration and show me the same list of feeds with the same names, the same order, and the same read/unread states of the items.


What I would like in the answers:

  • Is there anything important that I forgot above? Constraints, goals?
  • What is good prior art or background reading? (I understand that distributed-systems courses cover this at length and in depth... I'm hoping for a shortcut in the form of a proven design or a few nuggets of wisdom.)
  • What are some good examples of protocols that I could model mine after, or even use out of the box? (I mention Dropbox and IMAP above... I should probably read the IMAP RFC.)
4 answers

A few thoughts:

1). What assumptions can you make about the reliability of delivery of change notifications? And about the reliability of their ordering? My instinct is that it is better to tolerate loss and reordering, falling back to requesting a full re-sync of the metadata.
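One way to implement that instinct, as a hedged sketch (the "seq" field, the state shape and the return strings are all invented): the client applies notifications strictly in sequence, and on detecting a gap it gives up on incremental updates and asks for a full re-sync instead of guessing.

```python
def apply_notifications(state, notifications):
    """Apply change notifications in sequence order; on a lost message,
    request a full re-sync rather than apply changes out of order."""
    for n in sorted(notifications, key=lambda n: n["seq"]):
        if n["seq"] <= state["rev"]:
            continue                      # duplicate: already applied
        if n["seq"] != state["rev"] + 1:
            return "full-resync-needed"   # gap: a message was lost
        state["objects"][n["url"]] = n["fields"]
        state["rev"] = n["seq"]
    return "ok"
```

Sorting tolerates reordering within one batch; only an actual gap in the sequence numbers forces the expensive full re-sync.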

2). In effect you have a metadata stream as well as a data stream. What assumptions can you make about their relative ordering? Can you receive new data before receiving the metadata version that describes it? Again, I suspect this can happen. I would expect the data payload to carry metadata version information. Consequently, can clients request a metadata update when they find they need one?

3). Is it possible for data corresponding to two different versions of the metadata to be present on a device at the same time? I suspect yes. How easily can a client handle that?

4). The metadata may need to include presentation or validation information.


The metadata you described is a graph. However, going down an OWL/RDF track may be too big a change. Basically, you just need objects with properties that can reference one another (for example, files arranged in a hierarchy). From this point of view, JSON is a very natural fit for representing properties, combined with a REST API. If you choose this approach, I recommend first looking at the Open Data Protocol (OData).

By the way, why not just use a version control system, e.g. Git, and keep the properties as JSON objects inside text files in the repository? If each object has its metadata stored as a very small JSON fragment in a separate file, the system will handle most updates and resolve most conflicts automatically. Most version control systems provide good APIs for this kind of purpose.
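A sketch of the one-file-per-object layout this suggests; the directory structure and hash-based file naming are my assumptions, and the git commands are only indicated in a comment:

```python
import hashlib
import json
import os

def write_metadata_tree(root, feeds):
    """Write each object's metadata as its own tiny JSON file, so a VCS
    such as git merges independent edits automatically; real conflicts
    arise only when two devices edit the same object."""
    os.makedirs(os.path.join(root, "feeds"), exist_ok=True)
    paths = []
    for feed in feeds:
        # Name files by a hash of the stable "url" GUID, so renames and
        # reorders change file contents, never file names.
        name = hashlib.sha1(feed["url"].encode()).hexdigest()[:12] + ".json"
        path = os.path.join(root, "feeds", name)
        with open(path, "w") as fh:
            json.dump(feed, fh, indent=2, sort_keys=True)  # stable diffs
        paths.append(path)
    return paths

# Each device would then exchange changes with something like:
#   git add -A && git commit -m sync && git pull --rebase && git push
```

Sorting keys and pretty-printing keeps diffs minimal, which is what makes the VCS's line-based merging effective here.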


If I wanted to do this quickly without too much development time, I would just use WebDAV for the metadata files and be done with it. IMO that should cover most of your requirements. Besides, using an existing protocol with existing libraries has an advantage over a custom protocol: you don't spend time writing and debugging the protocol implementation itself.

EDIT: If you make the configuration easy to merge as a file, you only need to keep two versions of it locally: a base version of what the configuration looked like the last time you synced, and the current version of the metadata. At sync time you then fetch the remote peer's version of the metadata. With these three files you do a simple three-way merge; conflicts are detected automatically and resolved in favor of the newer version, and that's it. Keeping the base version is important. If you sync with several clients, you may have synced with each one at a different point, and therefore you need a different base version of your configuration file per peer. Just store each sync result as the new base until you overwrite it with the next sync with that peer. In theory you could use XML configuration files, but three-way merging of XML files is just painful and the tools are not quite there yet, IMHO. The specific format or type of application does not matter much.
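A minimal sketch of the three-way merge this describes, over flat key/value configuration data; one assumption differs from the answer's policy: on a true conflict this version keeps the local value and flags the key rather than automatically picking the newer side, since "newer" needs trustworthy timestamps.

```python
def three_way_merge(base, local, remote):
    """Merge two descendants of `base`, key by key. Returns (merged,
    conflicts); a key conflicts only when both sides changed it to
    different values. A key absent from a dict means "deleted"."""
    merged, conflicts = {}, []
    for key in set(base) | set(local) | set(remote):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:
            value = l                 # both agree (or neither changed)
        elif l == b:
            value = r                 # only remote changed it
        elif r == b:
            value = l                 # only local changed it
        else:
            conflicts.append(key)     # both changed, differently
            value = l                 # keep local; flag for resolution
        if value is not None:
            merged[key] = value       # dropping None realizes deletions
    return merged, conflicts
```

Note how deletions fall out for free: a key deleted on one side and untouched on the other simply disappears from the merged result.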


Take a look at this proposed protocol specification:

SLEEP - Syncable Lightweight Event Emitting Persistence

