To index my site, I have a Ruby script that in turn generates a shell script, which uploads every file in my document root to Solr. The shell script has many lines that look like this:
curl -s \
"http://localhost:8983/solr/update/extract?literal.id=/about/core-team/&commit=false" \
-F "myfile=@/extra/www/docroot/about/core-team/index.html"
... and ends with:
curl -s http://localhost:8983/solr/update --data-binary \
'<commit/>' -H 'Content-type:text/xml; charset=utf-8'
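For context, here is a minimal sketch of how such a generator might look in Ruby. The names `DOCROOT`, `SOLR_EXTRACT_URL`, `curl_line_for`, and `upload_script` are hypothetical, and deriving the id by stripping the document root and a trailing `index.html` is an assumption based on the example paths above; the real script may differ.

```ruby
require "shellwords"

# Hypothetical constants mirroring the paths and URLs shown above.
DOCROOT = "/extra/www/docroot"
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"
COMMIT_LINE = "curl -s http://localhost:8983/solr/update --data-binary " \
              "'<commit/>' -H 'Content-type:text/xml; charset=utf-8'"

# Build one curl line that uploads a single file through the extract handler.
def curl_line_for(file_path)
  # Assumption: the id is the path below the document root,
  # without a trailing index.html.
  id = file_path.delete_prefix(DOCROOT).sub(/index\.html\z/, "")
  url = "#{SOLR_EXTRACT_URL}?literal.id=#{id}&commit=false"
  form = "myfile=@#{file_path}"
  "curl -s #{Shellwords.escape(url)} -F #{Shellwords.escape(form)}"
end

# Build the whole shell script: one upload per file, then a single commit.
def upload_script(file_paths)
  (file_paths.map { |path| curl_line_for(path) } + [COMMIT_LINE]).join("\n") + "\n"
end
```

`Shellwords.escape` backslash-escapes the `?` and `&` in the URL instead of wrapping it in double quotes, so the generated lines look slightly different from the ones above but reach curl as the same arguments.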
This loads every document under my document root into Solr. I use tika and the ExtractingRequestHandler to load documents in various formats (mainly PDF and HTML) into Solr.
In the script that generates this shell script, I would like to boost certain documents based on whether their id field (a.k.a. url) matches certain regular expressions.
Let's say that these are the boost rules (pseudo-code):
boost = 2 if url =~ /cool/
boost = 3 if url =~ /verycool/
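In the Ruby generator, these rules could be sketched roughly as follows. `BOOST_RULES`, `DEFAULT_BOOST`, and `boost_for` are hypothetical names, and falling back to a boost of 1 for unmatched URLs is an assumption. Because `/cool/` also matches any URL containing "verycool", the more specific pattern is listed first.

```ruby
# Hypothetical rule table: first matching pattern wins,
# so the most specific pattern comes first.
BOOST_RULES = [
  [/verycool/, 3],
  [/cool/, 2]
].freeze

# Assumption: documents that match no rule get a neutral boost of 1.
DEFAULT_BOOST = 1

# Return the boost for a document id (url).
def boost_for(url)
  _pattern, boost = BOOST_RULES.find { |pattern, _| url.match?(pattern) }
  boost || DEFAULT_BOOST
end
```

This gives the same result as the pseudo-code, where the later `/verycool/` assignment overrides the earlier `/cool/` one.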
What is the simplest way to add this boost to my HTTP request?
I tried:
curl -s \
"http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
-F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
-F boost=3
curl -s \
"http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
-F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
-F boost.id=3
I know that a boost can be specified when POSTing XML, but these documents go through tika, which is called like this:
curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
--data-binary @tutorial.html -H 'Content-type:text/html'
Following that form, I also tried:
curl \
"http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost=3" \
--data-binary @mydoc.html -H 'Content-type:text/html'
curl \
"http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost.id=3" \
--data-binary @mydoc.html -H 'Content-type:text/html'
How can I set the boost on these documents?