The files I want to index are already stored on the server (I do not need to crawl them), under /path/to/files/. A sample HTML file:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta name="product_id" content="11"/>
<meta name="assetid" content="10001"/>
<meta name="title" content="title of the article"/>
<meta name="type" content="0xyzb"/>
<meta name="category" content="article category"/>
<meta name="first" content="details of the article"/>
<h4>title of the article</h4>
<p class="link"><a href="#link">How cite the Article</a></p>
<p class="list">
  <span class="listterm">Length: </span>13 to 15 feet<br>
  <span class="listterm">Height to Top of Head: </span>up to 18 feet<br>
  <span class="listterm">Weight: </span>1,200 to 4,300 pounds<br>
  <span class="listterm">Diet: </span>leaves and branches of trees<br>
  <span class="listterm">Number of Young: </span>1<br>
  <span class="listterm">Home: </span>Sahara<br>
</p>
</p>
I added a request handler to the solrconfig.xml file:
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">/path/to/data-config.xml</str>
  </lst>
</requestHandler>
My data-config.xml looks like this:
<dataConfig>
  <dataSource type="FileDataSource" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to html/files/" fileName=".*html"
            recursive="true" rootEntity="false" dataSource="null">
      <field column="plainText" name="text"/>
    </entity>
  </document>
</dataConfig>
I kept the default schema.xml file and added the following fields to it:
<field name="product_id" type="string" indexed="true" stored="true"/>
<field name="assetid" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true"/>
<field name="category" type="string" indexed="true" stored="true"/>
<field name="first" type="text_general" indexed="true" stored="true"/>
<uniqueKey>assetid</uniqueKey>
When I ran a full import after setting this up, the status showed that all the HTML files were fetched. But when I searched in Solr, no results came back. Does anyone have an idea what could be causing this?
As I understand it, all the files are fetched correctly but are not indexed in Solr. Does anyone know how I can index these meta tags and the HTML file contents in Solr?
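For reference, here is an untested sketch of the direction this might need to go, not a confirmed fix. With rootEntity="false", the FileListEntityProcessor entity only lists files and creates no documents itself, so a nested entity has to read each file. The sketch assumes that TikaEntityProcessor (from the DataImportHandler extras, with the Tika jars on the classpath) is available, that Tika exposes each <meta name="..."> tag as a metadata key under the same name, and that the schema has a text field. The inner entity name "html" and the data source name "bin" are made up for illustration.

```xml
<dataConfig>
  <!-- BinFileDataSource hands the raw file bytes to Tika; the name "bin" is hypothetical -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists matching files only, produces no Solr documents (rootEntity="false") -->
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="/path/to html/files/" fileName=".*html"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Inner entity: parses each listed file and yields one document per file -->
      <entity name="html" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" dataSource="bin">
        <!-- meta="true" reads the column from Tika metadata instead of the body;
             assumes the meta tag names come through unchanged -->
        <field column="product_id" name="product_id" meta="true"/>
        <field column="assetid" name="assetid" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="type" name="type" meta="true"/>
        <field column="category" name="category" meta="true"/>
        <field column="first" name="first" meta="true"/>
        <!-- "text" is the extracted body content -->
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Since assetid is the required uniqueKey, any file for which that metadata value does not come through would presumably fail to index, so it may be worth checking the import log for that.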
Your answer will be appreciated.