How to boost a SOLR document when indexing with /solr/update

Question


To index my site, I have a Ruby script that in turn generates a shell script, which uploads every file in my document root to Solr. The shell script has many lines that look like this:

curl -s \
 "http://localhost:8983/solr/update/extract?literal.id=/about/core-team/&commit=false" \
 -F "myfile=@/extra/www/docroot/about/core-team/index.html"

... and ends with:

curl -s http://localhost:8983/solr/update --data-binary \
'<commit/>' -H 'Content-type:text/xml; charset=utf-8'

This uploads all documents in my document root to Solr. I use Tika and the ExtractingRequestHandler to load documents in various formats (mainly PDF and HTML) into Solr.
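For illustration, the generator step described above might look roughly like the following Ruby sketch. The helper names `doc_id_for` and `curl_line_for` are hypothetical, as is the assumption that every page is an `index.html` under the document root; none of this is taken from the actual script.

```ruby
require 'shellwords'
require 'uri'

DOCROOT  = '/extra/www/docroot' # document root from the example above
SOLR_URL = 'http://localhost:8983/solr/update/extract'

# Hypothetical helper: derive the Solr id (the URL path) from a file path,
# e.g. /extra/www/docroot/about/core-team/index.html -> /about/core-team/
def doc_id_for(path)
  path.sub(DOCROOT, '').sub(/index\.html\z/, '')
end

# Hypothetical helper: build one curl line of the generated shell script.
# The query string is URL-encoded and both arguments are shell-escaped,
# so the output looks different from the hand-written lines above.
def curl_line_for(path)
  query = URI.encode_www_form('literal.id' => doc_id_for(path), 'commit' => 'false')
  url   = "#{SOLR_URL}?#{query}"
  "curl -s #{Shellwords.escape(url)} -F #{Shellwords.escape("myfile=@#{path}")}"
end

puts curl_line_for('/extra/www/docroot/about/core-team/index.html')
```

Escaping with `Shellwords.escape` instead of hard-coded double quotes is a design choice of this sketch: it keeps the generated line valid even if a path contains spaces or shell metacharacters.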

In the Ruby script that generates this shell script, I would like to boost certain documents based on whether their id field (a.k.a. URL) matches certain regular expressions.

Let's say that these are the boosting rules (pseudo-code):

boost = 2 if url =~ /cool/
boost = 3 if url =~ /verycool/
# otherwise we do not specify a boost
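As a runnable version of these rules, here is a minimal Ruby sketch. The names `BOOST_RULES` and `boost_for` are hypothetical. Note that `/cool/` also matches any URL containing "verycool", so the more specific pattern is checked first; this gives the same result as the pseudo-code, where the later assignment wins.

```ruby
# Rules are checked in order and the first match wins. /verycool/ comes
# first because /cool/ would otherwise match "verycool" URLs as well.
BOOST_RULES = [
  [/verycool/, 3],
  [/cool/,     2]
].freeze

# Returns the boost for a URL, or nil when no rule matches
# (in which case no boost should be specified at all).
def boost_for(url)
  rule = BOOST_RULES.find { |pattern, _boost| pattern.match?(url) }
  rule && rule.last
end

boost_for('/verycool/core-team/') # => 3
boost_for('/cool/stuff/')         # => 2
boost_for('/about/core-team/')    # => nil
```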

What is the simplest way to add this boost to my HTTP request?

I tried:

curl -s \
 "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
 -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
 -F boost=3

curl -s \
 "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
 -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
 -F boost.id=3

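Neither made a difference in the ordering of search results. What I want is for the boosted results to come first in search results, regardless of what the user searched for (provided, of course, that the document contains their query).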

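I understand that, if I POST in XML format, I can specify the boost value for either an entire document or a specific field. But if I do that, it is not clear how to specify a file as the document contents. Actually, the tika page provides a partial example: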

curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
--data-binary @tutorial.html -H 'Content-type:text/html'

But again, it is not clear where/how to provide my boost. I tried:

curl \
"http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost=3" \
--data-binary @mydoc.html -H 'Content-type:text/html'

curl \
"http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost.id=3" \
--data-binary @mydoc.html -H 'Content-type:text/html'

Neither of these altered the search results.

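Is there a way to update just the boost attribute of a document (not a specific field) without altering the document contents? If so, I could accomplish my goal in two steps: 1) upload/index the document as I have been doing, 2) specify the boost for certain documents.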


solr apache-tika solr-cell

Dan Tenenbaum 09 . '11 2:24


Pascal Dimassimo · Accepted Answer · 2011-02-09T02:33:06+0000

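To index a document in Solr, you send it to /update. The documents to index go in the body of the POST. In general, you have to use the xml format of Solr. Using that xml, you can add a boost value to a specific field or to an entire document.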
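Since the question's generator is a Ruby script, here is a hedged sketch of how such an XML message with a document-level boost could be built. The helper names `xml_escape` and `solr_add_xml` are hypothetical, and the `id` and `text` field names are assumptions that depend on the schema. The `boost` attribute on `<doc>` follows Solr's XML update format as it existed at the time of this question.

```ruby
# Minimal XML escaping for text placed inside elements or attributes.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

# Hypothetical helper: builds a Solr <add> message for one document.
# When boost is nil, the boost attribute is omitted entirely.
def solr_add_xml(fields, boost = nil)
  boost_attr = boost ? %( boost="#{xml_escape(boost)}") : ''
  field_xml = fields.map do |name, value|
    %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
  end.join
  "<add><doc#{boost_attr}>#{field_xml}</doc></add>"
end

puts solr_add_xml({ 'id' => '/verycool/core-team/', 'text' => 'Core team page' }, 3)
```

The resulting string could be sent to /solr/update with `curl --data-binary` and the `Content-type:text/xml` header, in the same way as the `<commit/>` request in the question. This route does not go through Tika, so the text would have to be extracted before the XML is built.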
