Take a look at this tutorial first.
With Solr, you need an XML document (or CSV) that is posted to Solr. That step is called updating; indexing is the process of building the fields that can be searched. The XML format looks something like this:
<add>
  <doc>
    <field name="id">9885A004</field>
    <field name="name">Canon PowerShot SD500</field>
    <field name="category">camera</field>
    <field name="features">3x optical zoom</field>
    <field name="features">aluminum case</field>
    <field name="weight">6.4</field>
    <field name="price">329.95</field>
  </doc>
</add>
Check here for more details.
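As a rough illustration of the update step, here is a minimal Python sketch that builds an <add> message like the one above and posts it to Solr's XML update handler. The helper names (build_add_xml, post_to_solr), the core name "mycore", and the localhost URL are assumptions for the example, not something from your setup; adjust them to your installation.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from a list of dicts (hypothetical helper).

    A list value produces one <field> element per item, which is how
    multi-valued fields such as "features" are expressed.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_bytes, update_url):
    """POST the XML to a Solr update handler and return the HTTP status."""
    request = urllib.request.Request(
        update_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    payload = build_add_xml([
        {
            "id": "9885A004",
            "name": "Canon PowerShot SD500",
            "features": ["3x optical zoom", "aluminum case"],
            "price": 329.95,
        }
    ])
    print(payload.decode("utf-8"))
    # Assumed URL: a local Solr with a core named "mycore".
    # commit=true asks Solr to make the document searchable right away.
    # post_to_solr(payload, "http://localhost:8983/solr/mycore/update?commit=true")
```

The post is left commented out so the script can be run without a Solr instance; uncomment it once the URL matches your server.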
In your case, though, you could use a crawler (useful if you have different or external sources) that recognizes different document formats. Check whether Nutch fits your needs.
See this presentation for an explanation of Solr, Lucene, and Nutch.