Tika does not yet ship a language profile for Farsi. As of version 1.0, 27 languages are supported:
languages=be,ca,da,de,eo,et,el,en,es,fi,fr,gl,hu,is,it,lt,nl,no,pl,pt,ro,ru,sk,sl,sv,th,uk
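As a result, Farsi text is misidentified: the detector reports li with a distance of 0.41, far above the 0.022 limit below which LanguageIdentifier considers a match reasonably certain.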
(The two-letter ISO 639-1 code for Farsi is fa.)
, Tika , .
:
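First, a Farsi text corpus is needed. One option is the Hamshahri corpus, which is in XML, so its text has to be extracted into a plain-text file.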
Then create the ngram profile with TikaCLI:
java -jar tika-app-1.0.jar --create-profile=fa -eUTF-8 fa-corpus.txt
This produces fa.ngp, the n-gram profile.
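To make the idea of an n-gram profile and a profile distance more concrete, here is a minimal sketch in plain Java. It is not Tika's implementation: the class name NgramProfileSketch, the fixed n-gram length of 3, and the distance formula (Euclidean distance between relative n-gram frequencies) are assumptions chosen for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; hypothetical class, not part of Tika. */
final class NgramProfileSketch {
    /** Assumed n-gram length for this sketch. */
    static final int N = 3;

    /** Counts every overlapping window of N characters in the text. */
    static Map<String, Integer> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + N <= text.length(); i++) {
            counts.merge(text.substring(i, i + N), 1, Integer::sum);
        }
        return counts;
    }

    /** Euclidean distance between the relative n-gram frequencies of two profiles. */
    static double distance(Map<String, Integer> a, Map<String, Integer> b) {
        double totalA = Math.max(1, total(a)); // avoid division by zero for empty profiles
        double totalB = Math.max(1, total(b));
        Set<String> ngrams = new HashSet<>(a.keySet());
        ngrams.addAll(b.keySet());
        double sumOfSquares = 0.0;
        for (String ngram : ngrams) {
            double diff = a.getOrDefault(ngram, 0) / totalA
                        - b.getOrDefault(ngram, 0) / totalB;
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Sums all n-gram counts in a profile. */
    private static int total(Map<String, Integer> profile) {
        int sum = 0;
        for (int count : profile.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> reference = profile("the quick brown fox jumps over the lazy dog");
        Map<String, Integer> sample = profile("the lazy dog sleeps over there");
        // A smaller distance means the sample looks more like the reference text.
        System.out.println(distance(reference, sample));
    }
}
```

In this sketch a text compared with itself has distance 0, and texts with no n-grams in common are the farthest apart. A detector of this kind picks the language whose profile is closest to the input, which is why a text in a language with no profile still gets matched to some other language, only at a large distance.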
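Tika does not pick up the new profile automatically. Instead, LanguageIdentifier.initProfiles() reads tika.language.override.properties, if present, to decide which languages to load. The ngram profile itself must also be available on the classpath.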
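As a sketch of what such an override file might look like, the fragment below repeats the default language list and appends fa. The assumptions here are that the override file replaces the default list rather than extending it, that the name.fa key is optional, and that both this file and fa.ngp are resolved relative to the org.apache.tika.language package on the classpath; check these against the Tika version in use.

```properties
# tika.language.override.properties (sketch; location on the classpath is an assumption)
# Default 27 languages plus the new Farsi profile
languages=be,ca,da,de,eo,et,el,en,es,fi,fr,gl,hu,is,it,lt,nl,no,pl,pt,ro,ru,sk,sl,sv,th,uk,fa
# Human-readable name for the new language (assumed key format)
name.fa=Farsi
```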
Tika, .
:
, .