Removing HTML in SOLR for storage, not indexing

Hi guys, I was able to remove HTML from content when indexing data in SOLR.

But is it possible to strip the HTML from the stored value as well, not just from the indexed terms?

This is my field:

<field name="Content" type="textNoHTML" indexed="true" stored="true"/>

And the field type "textNoHTML" uses solr.HTMLStripCharFilterFactory:

<charFilter class="solr.HTMLStripCharFilterFactory" />

As I said, this works great for indexing, but is it possible to apply a similar filter for storage?

Cheers!

+5
1 answer

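Char filters run only in the analysis chain, so they change what gets indexed; the stored value is always the original input. If you are using DataImportHandler, you can use HTMLStripTransformer, which strips the markup before the document reaches the field.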
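
As a rough sketch of what that could look like in a DataImportHandler config: the transformer is enabled on the entity and switched on per field with stripHTML="true". The data source settings, entity name, query, and column names below are placeholders invented for illustration, not taken from the question; only the "Content" field name comes from the schema above.

```xml
<dataConfig>
  <!-- Hypothetical JDBC source; replace driver and url with your own -->
  <dataSource driver="org.example.Driver" url="jdbc:example://localhost/db"/>
  <document>
    <!-- transformer="HTMLStripTransformer" makes the transformer available to this entity -->
    <entity name="article"
            query="SELECT id, content FROM articles"
            transformer="HTMLStripTransformer">
      <field column="id" name="id"/>
      <!-- stripHTML="true" removes markup before the value is indexed and stored -->
      <field column="content" name="Content" stripHTML="true"/>
    </entity>
  </document>
</dataConfig>
```

Because the markup is removed before the value reaches the schema field, both the indexed terms and the stored text should come out HTML-free.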

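Otherwise, strip the HTML yourself before sending the data to Solr. In .NET, for example, you can use HtmlAgilityPack for this.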
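
If neither DataImportHandler nor client-side stripping fits, another option that may be worth checking is an update processor chain in solrconfig.xml. Solr 4.0 and later ship solr.HTMLStripFieldUpdateProcessorFactory, which rewrites the field value before it is indexed or stored. This is only a sketch: the chain name "strip-html" is made up for illustration, and it assumes the field is called "Content" as in the question.

```xml
<!-- solrconfig.xml: the chain name "strip-html" is a placeholder -->
<updateRequestProcessorChain name="strip-html">
  <!-- Strips HTML from the incoming value of the Content field -->
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="fieldName">Content</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory must come last so the document is actually written -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per request with the update.chain=strip-html parameter, or set as a default on the update handler.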

+3
