MarkLogic: response time is very long

I have about 15,000 XML documents under the URI pattern "documents/products/specs/*.xml". Each document is about 25 KB. I connect to this MarkLogic server from a remote Apache Tomcat server using the XCC client (Java), which runs an AdhocQuery similar to this:

 let $uris := cts:uri-match('documents/products/specs/*.xml')
 for $uri in $uris
 return fn:doc($uri)

(The for loop is implemented in Java.)

It works fine, but for a large number of documents, for example all 15,000, it takes 60 minutes, even though the server and the network are both fast. (The total size of all the documents under the URI is about 20 MB, which should not take more than 20 minutes.)

Is there a workaround?

3 answers

Try the following:

 cts:search( fn:doc(), cts:document-query( cts:uri-match('documents/products/specs/*.xml') ), "unfiltered" ) 

What you are doing is requesting the full contents of ALL documents. This is not a typical query; it is effectively a database dump. The query has to serialize all of this data and send it through Tomcat, which buffers all of it again before sending it on to you. That is a very large dataset to return from a single request.

What is the purpose of your query? If you want to retrieve all the documents, you should either dump them with a tool such as mlcp (MarkLogic Content Pump), or fetch them in smaller batches: first collect the URIs, then retrieve the documents. You can speed this up by retrieving the documents in parallel. The Java source in xmlsh shows an example of how to retrieve documents correctly with XCC:

 http://xmlsh.svn.sourceforge.net/viewvc/xmlsh/extensions/marklogic/src/org/xmlsh/marklogic/get.java?revision=792&view=markup 
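A minimal sketch of the batched, parallel approach described above, in Java. It assumes the URIs have already been collected (for example with cts:uri-match) and hides the actual XCC call behind a hypothetical BatchFetcher interface; in a real client, fetch would open an XCC session and run one query that returns the documents for that batch. The batch size and thread count are illustrative, not tuned values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBatchFetch {

    /** Hypothetical stand-in for the XCC call that retrieves one batch of documents. */
    public interface BatchFetcher {
        List<String> fetch(List<String> uris) throws Exception;
    }

    /** Splits the URI list into consecutive batches of at most batchSize entries. */
    public static List<List<String>> partition(List<String> uris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < uris.size(); i += batchSize) {
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(uris.subList(i, Math.min(i + batchSize, uris.size()))));
        }
        return batches;
    }

    /** Fetches all batches on a fixed-size thread pool and returns the documents in URI order. */
    public static List<String> fetchAll(List<String> uris, int batchSize, int threads,
                                        BatchFetcher fetcher) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> batch : partition(uris, batchSize)) {
                futures.add(pool.submit(() -> fetcher.fetch(batch)));
            }
            List<String> docs = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                // Waiting on futures in submission order keeps the output in URI order;
                // a failed fetch is rethrown here as an ExecutionException.
                docs.addAll(f.get());
            }
            return docs;
        } finally {
            pool.shutdown();
        }
    }
}
```

Each batch becomes one request, so the number of round trips drops from one per document to one per batch, and several batches are in flight at once.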

My guess (correct me if I am wrong) is that you are just experimenting and do not really need all the documents. If so, try a more realistic query.


The query takes so long because, for most of these documents, the MarkLogic server has to read them from disk, unless you have a really large tree cache. What you need to do is reduce the amount of data your query returns. Adding some indexes may also help.

All that said, if all you want to do is ETL, you may want to process the data in batches.
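One way to batch the data for ETL is to page through the search from the first answer, one fixed-size window per request, instead of pulling everything at once. The sketch below only builds the per-page XQuery strings; the page size of 500 is an assumption, and sending each string as an XCC ad hoc query is left out. It relies on MarkLogic typically resolving a positional range such as [1 to 500] over an unfiltered cts:search from the indexes, and it assumes the documents do not change while the export runs. The total could come from xdmp:estimate over the same search, or from counting the URIs.

```java
import java.util.ArrayList;
import java.util.List;

public class PagedExportQueries {

    // Search expression taken from the first answer.
    static final String SEARCH =
        "cts:search(fn:doc(), cts:document-query("
        + "cts:uri-match('documents/products/specs/*.xml')), \"unfiltered\")";

    /** Builds one XQuery string per page: (SEARCH)[start to end], 1-based and inclusive. */
    public static List<String> pageQueries(int total, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 1; start <= total; start += pageSize) {
            int end = Math.min(start + pageSize - 1, total);
            queries.add("(" + SEARCH + ")[" + start + " to " + end + "]");
        }
        return queries;
    }

    public static void main(String[] args) {
        // 15,000 documents in pages of 500 gives 30 requests.
        List<String> queries = pageQueries(15000, 500);
        System.out.println(queries.size());
        System.out.println(queries.get(0));
    }
}
```

Keeping each response to a bounded page means neither MarkLogic nor Tomcat has to buffer the whole result set at once, and pages can be fed to the parallel fetch pattern if throughput matters.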

