As far as I can tell from the Kentico 10 source code, the crawler used by Kentico SmartSearch is entirely proprietary. It does not use a third-party library.
It loads the content of a page using System.Net.HttpWebRequest. The full response is returned to the SmartSearch indexer as a string. That string then goes through text extraction, and the extracted text is fed to Lucene for indexing.
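To illustrate the fetch, extract, and index flow described above, here is a minimal sketch in Python. It is not Kentico's code (Kentico is .NET and its implementation is not shown here); the names `fetch_page` and `extract_text` are hypothetical, and the extraction rule (drop tags, skip script and style bodies) is an assumption about what "text extraction" roughly means.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def fetch_page(url, timeout=10):
    """Download a page and return the whole response body as one string."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_text(html):
    """Reduce an HTML string to the plain text an indexer would receive."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Calling `extract_text(fetch_page(url))` yields the plain text that would then be handed to the indexer. Every page costs a full HTTP round trip plus parsing, which is why this route is slower than reading fields straight from the database.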
It will not be easy to get Kentico SmartSearch to use an external crawler. We usually stay away from the crawler because it is quite expensive to run compared to the standard index, which retrieves data directly from the database.
Kentico can run some scheduled tasks in a Windows service, but not the search tasks.
Please note that Kentico SmartSearch does not actually crawl the site by following links. It uses the content tree to work out what content it should index. If you want to index other content, for example from a system you are integrating with, you need to implement a custom search service, as described here.
One thing that would work is to have an external process crawl whatever content you want indexed and put the raw HTML in a repository. Then write a custom SmartSearch index that retrieves the data from that repository so Kentico can index it. If you are indexing Kentico-driven content, you can take this a step further by hooking into document events. That should let you crawl pages only when they are updated.
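The repository half of that approach could look roughly like the sketch below. It assumes a SQLite file as the repository; the `pages` table, its columns, and the function names are hypothetical. The Kentico side (the custom index that reads from the repository) is .NET code and is not shown.

```python
import hashlib
import sqlite3


def open_repository(path=":memory:"):
    """Open (or create) the hypothetical raw-HTML repository."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, html TEXT NOT NULL, "
        "sha256 TEXT NOT NULL, dirty INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def store_page(conn, url, html):
    """Save crawled HTML; return True only if the page is new or changed."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT sha256 FROM pages WHERE url = ?", (url,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged, nothing to re-index
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, html, sha256, dirty) "
        "VALUES (?, ?, ?, 1)",
        (url, html, digest),
    )
    conn.commit()
    return True


def pending_pages(conn):
    """Pages the custom index still has to pick up, as (url, html) tuples."""
    return conn.execute(
        "SELECT url, html FROM pages WHERE dirty = 1 ORDER BY url"
    ).fetchall()


def mark_indexed(conn, url):
    """Clear the dirty flag once the index has processed the page."""
    conn.execute("UPDATE pages SET dirty = 0 WHERE url = ?", (url,))
    conn.commit()
```

The content hash keeps unchanged pages from being re-indexed on every crawl. If the crawl is triggered from document events instead of a schedule, the same `store_page` call would run only for the page that was just updated.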
Marnix van valen