Solr DataImportHandler: Can I get a dynamic field name from an XML attribute using XPathEntityProcessor?

I have an XML document to ingest into Solr, which sounds like the kind of use case DataImportHandler is intended for. What I want to do is take a column name from one XML attribute and its value from a child element. Here is an example of what I mean:

    <document>
      <data ref="reference.foo">
        <value>bar</value>
      </data>
    </document>

From this XML snippet, I want to add a field named reference.foo with the value bar. DataImportHandler includes an XPathEntityProcessor for processing XML documents. I tried it, and it works fine if I give it a known column name (for example, <field column="ref" xpath="/document/data/@ref">), but I could not find any documentation or examples showing how to do what I want, or stating that it cannot be done. So:

  • Can I do this with XPathEntityProcessor? If so, how?
  • If not, can I do it some other way using the DataImportHandler?
  • Or am I left with writing my own import handler?
2 answers

I could not find a way to do this without involving a transformer, but I got it working with a simple ScriptTransformer. It goes something like this:

    ...
    <script>
      function makePair(row) {
        var theKey = row.get("theKey");
        var theValue = row.get("theValue");
        row.put(theKey, theValue);
        row.remove("theKey");
        row.remove("theValue");
        return row;
      }
    </script>
    ...
    <entity name="..." processor="XPathEntityProcessor" transformer="script:makePair" forEach="/document" ...>
      <field column="theKey" xpath="/document/data/@ref" />
      <field column="theValue" xpath="/document/data/value" />
    </entity>
    ...

Hope this helps someone!

Note that if your dynamicField is multi-valued, you need to iterate over the keys, since row.get("theKey") will return a list rather than a single string.
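For the multi-valued case, a minimal sketch of what that iteration might look like follows. The function name makePairs is hypothetical, and the sketch assumes that row.get("theKey") and row.get("theValue") both return list-like objects exposing size() and get(i) (as a java.util.List does inside the script engine), with keys and values lined up by position. If some <data> elements have no <value>, the two lists could drift out of alignment, so treat this as a starting point rather than a tested implementation.

```javascript
// Hypothetical multi-valued variant of makePair.
// Assumption: both columns come back as list-like objects with
// size() and get(i), and entries pair up by position.
function makePairs(row) {
  var keys = row.get("theKey");
  var values = row.get("theValue");
  if (keys != null && values != null) {
    // Stop at the shorter list so a missing value cannot cause an
    // out-of-range lookup.
    var n = Math.min(keys.size(), values.size());
    for (var i = 0; i < n; i++) {
      // Each @ref becomes a field name; the matching <value> becomes its value.
      row.put(keys.get(i), values.get(i));
    }
  }
  // Drop the temporary columns so they are not indexed as fields.
  row.remove("theKey");
  row.remove("theValue");
  return row;
}
```

It would be wired in the same way as the single-valued version, with transformer="script:makePairs" on the entity.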


What you want to do is select the node by its attribute value.

In your example, you would do the following:

 <field column="ref" xpath="/document/data[@ref='reference.foo']"/> 