How can I use Apache to load balance a MarkLogic cluster?

Hi, I am new to MarkLogic and Apache. I was given the task of using Apache as a load balancer for our MarkLogic cluster of 3 machines. The cluster currently runs on Linux servers.

How can we achieve this? Any information on this would be helpful.

+5
2 answers

You can use mod_proxy_balancer. How you configure it depends on which MarkLogic client you want to use. If you want to use the Java Client API, follow the second example in the mod_proxy_balancer documentation so that Apache generates the stickiness cookie itself. If you want to use XCC, configure the balancer to use the "SessionID" cookie created by MarkLogic Server.
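
For the Java Client API case, a configuration along these lines might be a starting point. It is only a sketch modeled on the Apache-generated cookie pattern from the mod_proxy_balancer documentation; the host names, the port 8000, and the balancer name `marklogic` are assumptions for illustration and need to match your own cluster and app server.

```apache
# Requires mod_proxy, mod_proxy_http, mod_proxy_balancer, mod_headers and,
# on httpd 2.4, a scheduler module such as mod_lbmethod_byrequests.

# Apache sets its own ROUTEID cookie whenever it assigns or changes a client's route.
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED

<Proxy "balancer://marklogic">
    # Hypothetical hosts and port: replace with your three nodes and app server port.
    BalancerMember "http://ml1.example.com:8000" route=1
    BalancerMember "http://ml2.example.com:8000" route=2
    BalancerMember "http://ml3.example.com:8000" route=3
    # Requests carrying ROUTEID go back to the member with the matching route.
    ProxySet stickysession=ROUTEID
</Proxy>

ProxyPass        "/" "balancer://marklogic/"
ProxyPassReverse "/" "balancer://marklogic/"
```

With this in place, the first response to a client should carry a `Set-Cookie: ROUTEID=.N` header, and later requests that send the cookie back should land on member N.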

The difference is that XCC uses sessions, while the Java Client API is built on the REST API, which is stateless, so there are no sessions. However, even with the Java Client API, a transaction that spans multiple requests imposes state for the duration of that transaction, so the load balancer needs a way to route every request in the transaction to the correct node in the MarkLogic cluster. The Java Client API resends the stickiness cookie with every request that uses the transaction, so the load balancer can keep those requests on the same node.
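
To make the routing idea concrete, here is a toy model of cookie-based stickiness in Python. It is not how Apache is implemented; the class `StickyBalancer`, the backend URLs, and the cookie name `ROUTEID` are all hypothetical names chosen for illustration.

```python
import itertools
from typing import Dict, Optional, Tuple


class StickyBalancer:
    """Toy model of cookie-based stickiness (illustrative only)."""

    def __init__(self, routes: Dict[str, str], cookie_name: str = "ROUTEID") -> None:
        self.routes = routes              # route id -> backend URL
        self.cookie_name = cookie_name
        # Round-robin over route ids for clients that are not pinned yet.
        self._next_route = itertools.cycle(sorted(routes))

    def pick(self, cookies: Dict[str, str]) -> Tuple[str, Optional[str]]:
        """Return (backend, cookie_value_to_set).

        The second element is None when the request already carried a
        valid route cookie, so no new cookie needs to be issued.
        """
        value = cookies.get(self.cookie_name, "")
        route = value.rpartition(".")[2]  # a value like ".1" yields route "1"
        if route in self.routes:
            return self.routes[route], None
        route = next(self._next_route)
        return self.routes[route], f".{route}"


if __name__ == "__main__":
    balancer = StickyBalancer({
        "1": "http://ml1.example.com:8000",  # hypothetical nodes
        "2": "http://ml2.example.com:8000",
        "3": "http://ml3.example.com:8000",
    })
    jar: Dict[str, str] = {}
    # Three requests in one transaction: the first pins the client,
    # the rest reuse the cookie and reach the same backend.
    for _ in range(3):
        backend, new_cookie = balancer.pick(jar)
        if new_cookie is not None:
            jar["ROUTEID"] = new_cookie
        print(backend)
```

The point of the sketch is that stickiness only holds as long as the client keeps sending the cookie back, which is why the Java Client API resending it on every transactional request matters.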

As always, test your configuration to make sure you understand it correctly. Properly configuring Apache modules is an advanced skill. Since you are new to Apache, your best way to confirm the setup is correct is to use an HTTP monitoring tool such as Wireshark to watch the HTTP traffic from your application to MarkLogic Server and make sure every request goes to the expected node.

+6

Note that even with the client APIs (Java, Node.js), it is not always obvious or explicit at the language API level what can cause a session to be created. Explicitly creating a multi-statement transaction definitely will, but other operations can do so as well. If you use the same connection for the user interface (browser) and the API (REST or XCC), the browser application will probably do things that create session state.

The safest but least flexible configuration is TCP session affinity. If it is supported, it eliminates most of the problems associated with load balancing. Cookie session affinity relies on guaranteeing that the load balancer uses the correct cookie, and not all code is equal: I have had cases where the load balancer did not always use the cookie. Changing the configuration to "load balancer provided cookie affinity" fixed that.

None of this is required if all of your communication is stateless at the TCP level, the HTTP level, and the application level. The last of these cannot be inferred by the server.

Another cause for concern is an application or middle tier that shares a host with other applications, or with other instances of the same application, all connecting to the same load balancer and port. It can be difficult to make sure that there are no "crossed wires". When MarkLogic receives a request, it associates its identity with the client's IP address and port. Even without a load balancer, most modern HTTP and TCP client libraries implement socket caching. That is a great win for performance, but a hidden source of subtle, random, severe errors if the library or application shares cookie jars (not uncommon). A TCP cache and cookie jar used by different application contexts can end up sending state information from one unrelated application to another in the same process. This mostly happens in middle-tier application servers that simply pass on requests from the first tier without domain knowledge, trusting the low-level TCP libraries to "do the right thing". The libraries are doing the right thing for the use case their programmers had in mind; do not assume your case is one the library authors anticipated. The symptoms tend to be very rare but catastrophic: transaction failures and possibly data corruption and security problems (at the application level), because the server cannot distinguish between two connections from the same middle tier.

Sometimes the best strategy is to load balance between the first tier and the middle tier, and connect directly from the middle tier to MarkLogic, especially if caching is done on the load balancer. Caching is more often useful between the middle tier and the client than between the middle tier and the server. This is also closer to the classic three-tier architecture used with an RDBMS, where load balancing sits between the client and the business logic tier, not between the business logic and the database.

+1