MS-Excel compatible CSV file representing all documents in a MarkLogic directory

What is the best way to produce an MS-Excel compatible CSV file representing all the documents in a MarkLogic directory? I am using the XCC Java client, with Tomcat and MarkLogic both located remotely. The number of documents in the directory is about 15,000.

1 answer

The first part, getting all the documents in a directory, is already answered for us in "Avoiding XDMP-EXPNTREECACHEFULL and loading document":

 cts:search( collection(), cts:directory-query('path/to/documents/', 'infinity')) 

As noted in my answer there, if you need additional restrictions you can use cts:and-query to combine that cts:directory-query with other cts:query terms.
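As a minimal sketch of such a restriction, the query below combines the directory query with one more term. The element name a and the value 'x' are placeholders invented for illustration; substitute whatever cts:query terms fit your data.

```xquery
xquery version "1.0-ml";

(: Hypothetical restriction: only documents in the directory
   whose <a> element has the value 'x'. :)
cts:search(
  collection(),
  cts:and-query((
    cts:directory-query('path/to/documents/', 'infinity'),
    cts:element-value-query(xs:QName('a'), 'x'))))
```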

Next you need to turn each XML document into CSV. That is pretty simple, but you need to know how your XML is structured, or else infer it somehow. For this example I will assume that every document has simple child elements a, b, c, d under some root element. So the query has to generate a CSV header for those elements, followed by the CSV lines.

We probably also want to pass in the directory URI from the caller. If you were using REST this would use xdmp:get-request-field, but for XCC it is an external variable.

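    (: Directory URI supplied by the XCC caller :)
    declare variable $DIRECTORY-URI as xs:string external;

    (: One CSV line from the a, b, c, d children of a root element :)
    declare function local:csv($root as element()) as xs:string
    {
      string-join(($root/a, $root/b, $root/c, $root/d), ',')
    };

    (: Header line, then one line per document in the directory :)
    'A,B,C,D',
    cts:search(
      collection(),
      cts:directory-query($DIRECTORY-URI, 'infinity'))/local:csv(*)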
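For comparison, here is a rough sketch of the REST-style variant mentioned above, where the directory arrives as an HTTP request field instead of an XCC external variable. The field name 'directory' is an assumption made up for this sketch.

```xquery
xquery version "1.0-ml";

(: 'directory' is a hypothetical request field name.
   If the field is missing, the xs:string type check fails
   rather than silently searching everything. :)
declare variable $DIRECTORY-URI as xs:string :=
  xdmp:get-request-field('directory');

cts:search(
  collection(),
  cts:directory-query($DIRECTORY-URI, 'infinity'))
```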

Again, to write local:csv for your application you need to know your XML structure, or infer it somehow. You may also need to wrap some values in double quotes. But this basic structure is one of the most efficient ways to attack the problem. I avoided any XQuery FLWOR expressions so that the results can stream.
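The double-quoting mentioned here could look something like the sketch below. It assumes the usual CSV convention that Excel reads: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded double quotes are doubled. local:quote is a helper name introduced for this example, and each field is converted with string() so that a missing element yields an empty column instead of shifting the remaining columns.

```xquery
xquery version "1.0-ml";

declare variable $DIRECTORY-URI as xs:string external;

(: Hypothetical helper: quote a field only when it needs it. :)
declare function local:quote($value as xs:string) as xs:string
{
  if (matches($value, '[,"\n\r]'))
  then concat('"', replace($value, '"', '""'), '"')
  else $value
};

(: Same a, b, c, d layout as before; string() of a missing
   element is the empty string, so the column count stays at four. :)
declare function local:csv($root as element()) as xs:string
{
  string-join(
    (local:quote(string($root/a)),
     local:quote(string($root/b)),
     local:quote(string($root/c)),
     local:quote(string($root/d))),
    ',')
};

'A,B,C,D',
cts:search(
  collection(),
  cts:directory-query($DIRECTORY-URI, 'infinity'))/local:csv(*)
```

This version still has no FLWOR expression in the main query, so it keeps the shape of the original. It does assume that each of a, b, c, d occurs at most once per document; a repeated element would make string() raise a type error.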

Another approach would be to use range indexes and http://docs.marklogic.com/cts:value-tuples with a cts:query to restrict the results, and then convert the JSON to CSV. This would be even more efficient, since no fragments would have to be fetched. But it will not work with some XML structures, and you may not have the luxury of creating a range index for each CSV field.

    declare variable $DIRECTORY-URI as xs:string external;

    (: One CSV line from one tuple of range index values :)
    declare function local:csv($ja as json:array) as xs:string
    {
      string-join(json:array-values($ja), ',')
    };

    (: cts:value-tuples returns one json:array per tuple; in 1.0-ml,
       function mapping calls local:csv once for each array. :)
    'A,B,C,D',
    local:csv(
      cts:value-tuples(
        (cts:element-reference(xs:QName('a')),
         cts:element-reference(xs:QName('b')),
         cts:element-reference(xs:QName('c')),
         cts:element-reference(xs:QName('d'))),
        (),
        cts:directory-query($DIRECTORY-URI, 'infinity')))