Caching Strategy for a Deep, Re-centerable Egocentric Graph

First, let me explain what I am building:

  • I have a D3.js force-layout graph with one node in the center and a bunch of nodes arranged around it. The center node is an entity of some kind, and the surrounding nodes are other objects that are related to the root in some way. Edges represent the actual relationships (i.e. how they are connected).

  • Outer nodes can be clicked to re-center the graph on that entity and load its relationships.

  • This graph is "egocentric" in the sense that every time a node is clicked, it becomes the center, and only relationships directly connected to it are displayed.

My setup, if that matters:

  • I serve the API through Node.js, which translates incoming requests into queries against a CouchDB server holding very large datasets.

  • D3.js is used for the visualization, and besides jQuery and Bootstrap I do not use any other client libraries. If something would help with this caching task, I am open to suggestions :)

My ideas:

  • I could fetch several levels of the graph each time (recursively repeating the process of listing and expanding the children), but since clicking on any given node can load completely unrelated data, there is no guarantee that much of it overlaps with what was already downloaded for the root. This seems like a dead loss, and actually a step in the wrong direction - I end up doing more processing this way!

  • I can easily keep a hash table of entities that have already been received and check it before requesting an object's data from the server. I will probably do this regardless of whichever caching strategy I implement, as it is a really easy way to cut down on requests (see the sketch below).
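
Something like this minimal sketch, where the endpoint and field names are just placeholders for my real API:

```javascript
// Minimal sketch of the entity cache idea; '/api/entity/:id' is a placeholder.
var entityCache = {};

function getEntity(id, callback) {
  if (entityCache[id]) {
    callback(entityCache[id]);        // already fetched once - no request needed
    return;
  }
  $.getJSON('/api/entity/' + id, function (entity) {
    entityCache[id] = entity;         // remember it for later clicks
    callback(entity);
  });
}
```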

Now, how do you propose caching this data?

Is there any super-efficient strategy you can come up with for such caching? Both server and client side options are welcome. A ton of data is involved in this process, and any reduction in requests / processing puts me a mile ahead of the game.

Thanks!

+4
2 answers

On the client side, I would represent each node so that its children are either an array of child nodes or a function that acts as a promise for those children. When you click on a given node, if you already have its data, display it immediately; otherwise send an AJAX request to populate it.
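
Something along these lines, as a rough sketch (the endpoint, the shape of the response, and recenter() are all placeholders, not your actual API):

```javascript
// Rough sketch: a node's `children` is either a real array (already loaded)
// or a function that acts as a promise for them, fetching on demand.
function makeNode(id) {
  var node = { id: id };
  node.children = function (done) {
    $.getJSON('/api/related/' + id, function (related) {
      node.children = related.map(function (r) { return makeNode(r.id); });
      done(node.children);
    });
  };
  return node;
}

// On click: display immediately if the data is there, otherwise fetch first.
function onNodeClick(node) {
  if (Array.isArray(node.children)) {
    recenter(node);                                   // placeholder for the D3 redraw
  } else {
    node.children(function () { recenter(node); });
  }
}
```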

Whenever you display a node that is not the center, queue up asynchronous AJAX requests for the children of the displayed nodes and start fetching them in the background. That way, when the user clicks, there is a good chance you have already cached the data. And if not, well, you tried and it cost you nothing.
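
Continuing that sketch, the background prefetch might be nothing more than:

```javascript
// After drawing the ring of visible nodes, warm the cache in the background
// so that a subsequent click is (hopefully) answered instantly.
function prefetchChildren(visibleNodes) {
  visibleNodes.forEach(function (node) {
    if (typeof node.children === 'function') {
      node.children(function () { /* fetched and cached; nothing to draw yet */ });
    }
  });
}
```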

Once you have that working, decide how many levels deep it makes sense to prefetch. My guess is that the magic number is likely to be 1: beyond that, the payoff drops off rapidly while the server load climbs rapidly. But getting clicked-on data back ASAP is a pretty big win in the user interface.

+2

I think you need to do two things:

  • Reduce the number of requests made.
  • Reduce the cost of each request.

As btilly points out, you're probably best off prefetching the related nodes for each visible node, so that if one is clicked the visualization responds right away - you don't want the request time plus the transition time showing up as response lag.

However, if you feel such a strong need to reduce the number of requests, that suggests your requests themselves are too expensive, since the overall load is requestCost * numRequests. Consider pre-computing the set of nodes related to each node, so that a query becomes a simple key lookup rather than a full database query. If that sounds complicated, it's what Google does every time you search for something new; they can't search the whole Internet every time you start typing, so they do it ahead of time and cache the results.
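
As a rough illustration only, assuming the `nano` CouchDB client for Node.js; the database name, the document layout and relatedIdsFor() are placeholders for however your relationships are actually stored:

```javascript
// Rough illustration of precomputing adjacency lists with the `nano` client.
// relatedIdsFor() is a placeholder for extracting related ids from a document,
// and a real version would pass the existing _rev so re-runs don't conflict.
const nano = require('nano')('http://localhost:5984');
const db = nano.db.use('graph');

async function precomputeNeighbors() {
  const all = await db.list({ include_docs: true });
  for (const row of all.rows) {
    if (row.id.startsWith('neighbors:')) continue;    // skip our own cache docs
    const related = relatedIdsFor(row.doc);            // <- placeholder
    await db.insert({ neighbors: related }, 'neighbors:' + row.id);
  }
}

// Serving a click is then a single key lookup instead of a graph traversal:
async function getNeighbors(id) {
  const doc = await db.get('neighbors:' + id);
  return doc.neighbors;
}
```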

This may mean some degree of denormalization; once you have both a cached copy and the underlying data, there is no guarantee the two stay synchronized. But the real question is how often your data set actually changes - is it write once, read many?

To minimize the space needed to pre-compute all these nodes and their relationships, look at how particle-interaction problems are handled: partition the space so you can group nodes together, pre-compute the aggregate neighbors for each group, and store that. Then each query only has to filter within one group instead of running a full database query. If the full query is O(n log n) and you make n a hundred times smaller, it is more than 100 times faster.
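
A sketch of the idea, where groupOf() stands in for whatever partitioning scheme actually fits your data:

```javascript
// Sketch only: partition the edge set by the group of each edge's source node,
// so a neighbor query only scans one group's edges instead of the whole dataset.
function partitionEdges(edges, groupOf) {
  var byGroup = {};                            // groupId -> edges starting in that group
  edges.forEach(function (e) {
    var g = groupOf(e.source);
    (byGroup[g] = byGroup[g] || []).push(e);
  });
  return byGroup;
}

function neighborsOf(nodeId, byGroup, groupOf) {
  // Only the clicked node's group is filtered - a small fraction of the full data.
  return (byGroup[groupOf(nodeId)] || [])
    .filter(function (e) { return e.source === nodeId; })
    .map(function (e) { return e.target; });
}
```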

+1
