Java API: downloading web pages and calculating TF-IDF

I am new to IR methods.

I am looking for a Java API or tool that does the following:

  • Download a given set of URLs
  • Strip the HTML tags
  • Remove stop words
  • Apply stemming
  • Build an inverted index
  • Calculate TF-IDF
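To make the six steps concrete, here is a minimal plain-Java sketch, assuming Java 11+ and no external libraries. The class name `TfIdfSketch`, the tiny stop-word list, the regex tag stripper and the suffix-stripping "stemmer" are hypothetical stand-ins for illustration only; a real pipeline would use a proper HTML parser, a full stop-word list and a real stemmer such as Porter's. TF-IDF is computed here with the common textbook form tf(t, d) × ln(N / df(t)).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.*;

public class TfIdfSketch {

    // Hypothetical, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "is", "in");

    // Step 1: download a page with java.net.http (Java 11+).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Step 2: crude tag stripping with regexes (not a real HTML parser).
    static String stripTags(String html) {
        return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                   .replaceAll("<[^>]+>", " ");
    }

    // Step 4: hypothetical suffix-stripping stemmer, a stand-in for e.g. Porter.
    static String stem(String w) {
        for (String suf : new String[] {"ing", "ed", "es", "s"}) {
            if (w.length() > suf.length() + 2 && w.endsWith(suf)) {
                return w.substring(0, w.length() - suf.length());
            }
        }
        return w;
    }

    // Steps 2-4: strip tags, lowercase, tokenize, drop stop words, stem.
    static List<String> tokens(String html) {
        List<String> out = new ArrayList<>();
        for (String t : stripTags(html).toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                out.add(stem(t));
            }
        }
        return out;
    }

    // Step 5: inverted index, term -> (docId -> term frequency).
    final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();
    int numDocs = 0;

    int addDocument(String html) {
        int docId = numDocs++;
        for (String t : tokens(html)) {
            postings.computeIfAbsent(t, k -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
        return docId;
    }

    // Step 6: tf-idf(t, d) = tf(t, d) * ln(N / df(t)); 0 if the term is absent.
    double tfIdf(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(stem(term.toLowerCase(Locale.ROOT)));
        if (docs == null || !docs.containsKey(docId)) {
            return 0.0;
        }
        return docs.get(docId) * Math.log((double) numDocs / docs.size());
    }

    public static void main(String[] args) {
        TfIdfSketch idx = new TfIdfSketch();
        int d0 = idx.addDocument("<html><p>Cats and dogs</p><p>cats</p></html>");
        int d1 = idx.addDocument("<html><b>Dogs</b> playing</html>");
        System.out.println(idx.tfIdf("cats", d0));
        System.out.println(idx.tfIdf("dogs", d0));
        System.out.println(idx.tfIdf("playing", d1));
    }
}
```

In the demo, "dogs" scores 0.0 in the first document because it occurs in every document (idf = ln(2/2) = 0), while "cats" scores about 1.386 (tf = 2, idf = ln 2). `fetch` is not called in `main` because it needs network access; in practice you would pass `fetch(url)` to `addDocument` for each URL in your set.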

Please let me know how Lucene could help me with this.

Yuvi Relations

2 answers

You could try the Word Vector Tool. It has been a while since its last release, but it works fine for me. It should be able to perform all the steps you describe. However, I have never used its crawler part myself.


[Most of this answer was garbled in the source; the surviving fragments refer to TF-IDF, Lucene, URLs, and Solr.]

