I tried to run GIZA++ on Windows (built with Cygwin), using these commands:
# Suppose the source language is French and the target language is English
plain2snt.out FrenchCorpus.f EnglishCorpus.e
mkcls -c30 -n20 -pFrenchCorpus.f -VFrenchCorpus.f.vcb.classes opt
mkcls -c30 -n20 -pEnglishCorpus.e -VEnglishCorpus.e.vcb.classes opt
snt2cooc.out FrenchCorpus.f.vcb EnglishCorpus.e.vcb FrenchCorpus.f_EnglishCorpus.e.snt > courpuscooc.cooc
GIZA++ -S FrenchCorpus.f.vcb -T EnglishCorpus.e.vcb -C FrenchCorpus.f_EnglishCorpus.e.snt -m1 100 -m2 30 -mh 30 -m3 30 -m4 30 -m5 30 -p1 0.95 -CoocurrenceFile courpuscooc.cooc -o dictionary
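A long one-line command makes typos like `o.95` in place of `0.95` easy to miss. One way to avoid them is to generate the same steps from a script. The following is a minimal Python sketch. It assumes the GIZA++ tools (`plain2snt.out`, `mkcls`, `snt2cooc.out`, `GIZA++`) are on the PATH and take their arguments exactly as in the commands above. `build_pipeline` and `run` are hypothetical helper names. The script does not choose better parameters; by default it only prints the commands.

```python
import shlex
import subprocess


def build_pipeline(src, tgt, giza_opts=None, cooc="courpuscooc.cooc", out_prefix="dictionary"):
    """Return the question's commands as (argv, stdout_file) pairs.

    File names follow the question (<src>.vcb, <src>_<tgt>.snt); check them
    against the files plain2snt.out actually writes on your system.
    giza_opts maps option names to values, e.g. {"m1": 5}; an empty dict
    leaves every GIZA++ setting at its default.
    """
    src_vcb, tgt_vcb = f"{src}.vcb", f"{tgt}.vcb"
    snt = f"{src}_{tgt}.snt"
    giza = ["GIZA++", "-S", src_vcb, "-T", tgt_vcb, "-C", snt,
            "-CoocurrenceFile", cooc, "-o", out_prefix]
    for name, value in (giza_opts or {}).items():
        giza += [f"-{name}", str(value)]  # always "-name value", never "- name"
    return [
        (["plain2snt.out", src, tgt], None),
        (["mkcls", "-c30", "-n20", f"-p{src}", f"-V{src_vcb}.classes", "opt"], None),
        (["mkcls", "-c30", "-n20", f"-p{tgt}", f"-V{tgt_vcb}.classes", "opt"], None),
        (["snt2cooc.out", src_vcb, tgt_vcb, snt], cooc),  # stdout is redirected to the .cooc file
        (giza, None),
    ]


def run(pipeline, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv, stdout_file in pipeline:
        print(shlex.join(argv) + (f" > {stdout_file}" if stdout_file else ""))
        if dry_run:
            continue
        if stdout_file:
            with open(stdout_file, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    steps = build_pipeline(
        "FrenchCorpus.f", "EnglishCorpus.e",
        giza_opts={"m1": 100, "m2": 30, "mh": 30, "m3": 30, "m4": 30, "m5": 30, "p1": 0.95},
    )
    run(steps)  # prints the commands only; pass dry_run=False to execute them
```

Passing each command as an argument list to `subprocess.run`, instead of one shell string, also keeps every option and its value as separate tokens.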
However, when I evaluated the output files produced by GIZA++, the results were very poor.
My evaluation results:
RECALL = 0.0889
PRECISION = 0.0990
F_MEASURE = 0.0937
AER = 0.9035
Does anybody know the reason? Maybe I forgot some parameters, or should I change some of them?
In more detail:
At first I wanted to train GIZA++ on a large amount of data, then test it on a small set and compare its output with the desired alignment (the gold standard). However, I could not find a single document or useful page on the Internet explaining how to do this.
Can you point me to a useful document?
So instead I ran it on a small corpus (447 sentences) and compared the result with the gold-standard alignment. Do you think this is the right approach?
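One common way to evaluate on a gold-standard set while still training on a large corpus is a three-step process:

1. Append the gold-standard sentence pairs to the training data.
2. Run the whole pipeline once.
3. Score only the alignments of the appended pairs.

The following is a minimal Python sketch of the data-preparation step. It assumes one sentence per line in every file; `append_test_set` and the file names in the usage comment are hypothetical. It returns the 1-based sentence-pair numbers of the test pairs, which can then be matched against the `# Sentence pair (k)` headers of the `*.A3.final` output. This matching assumes GIZA++ keeps the input order and drops no pairs, which is worth checking in its log, for example for very long sentences.

```python
from pathlib import Path


def _read_lines(path):
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for n, line in enumerate(lines, 1):
        if not line.strip():  # reject empty lines so the pair numbering stays unambiguous
            raise ValueError(f"{path}:{n} is empty")
    return lines


def append_test_set(train_src, train_tgt, test_src, test_tgt, out_src, out_tgt):
    """Write train+test corpora and return the 1-based pair numbers of the test pairs."""
    train_s, train_t = _read_lines(train_src), _read_lines(train_tgt)
    test_s, test_t = _read_lines(test_src), _read_lines(test_tgt)
    if len(train_s) != len(train_t) or len(test_s) != len(test_t):
        raise ValueError("source and target files must have the same number of lines")
    Path(out_src).write_text("\n".join(train_s + test_s) + "\n", encoding="utf-8")
    Path(out_tgt).write_text("\n".join(train_t + test_t) + "\n", encoding="utf-8")
    first = len(train_s) + 1
    return range(first, first + len(test_s))


# Usage (hypothetical file names):
# test_ids = append_test_set("big.f", "big.e", "gold.f", "gold.e", "all.f", "all.e")
# Run the pipeline on all.f / all.e, then keep only A3.final pairs whose number is in test_ids.
```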
I also changed my command as follows. This gave my best result so far, but it is still not very good:
GIZA++ -S testlowsf.f.vcb -T testlowde.e.vcb -C testlowsf.f_testlowde.e.snt -m1 5 -m2 0 -mh 5 -m3 5 -m4 0 -CoocurrenceFile inputcooc.cooc -o dictionary -model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 0 -nsmooth 4 -onlyaldumps 1 -p0 0.999 -diagonal yes -final yes
Evaluation results. Here A is the GIZA++ output and G is the gold standard. $A_S$ and $G_S$ are the S (sure) links in A and G, and $A_P$ and $G_P$ are the P (possible) links in A and G:
RECALL = $|A_S \cap G_S| \,/\, |G_S|$ = 0.6295
PRECISION = $|A_P \cap G_P| \,/\, |A|$ = 0.1090
F_MEASURE = $2 \cdot \mathrm{PRECISION} \cdot \mathrm{RECALL} \,/\, (\mathrm{PRECISION} + \mathrm{RECALL})$ = 0.1859
AER = $1 - (|A_S \cap G_S| + |A_P \cap G_P|) \,/\, (|A| + |G_S|)$ = 0.7425
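For comparison, the commonly used definitions (introduced by Och and Ney) treat the aligner output $A$ as a single set of links. GIZA++ itself does not label its links as sure or possible. The gold standard provides sure links $S$ and possible links $P$ with $S \subseteq P$, and the metrics are:

- precision $= |A \cap P| / |A|$
- recall $= |A \cap S| / |S|$
- $\mathrm{AER} = 1 - (|A \cap S| + |A \cap P|) / (|A| + |S|)$

The following is a minimal Python sketch of these definitions. It assumes links are (sentence, source_position, target_position) triples with the same orientation and indexing in both sets. `alignment_scores` is a hypothetical name.

```python
def alignment_scores(predicted, sure, possible):
    """Precision, recall, F-measure and AER for word alignments.

    predicted, sure, possible: sets of (sentence_id, src_pos, tgt_pos) links.
    Sure links are added to the possible set, so S is always a subset of P.
    """
    possible = possible | sure
    a_s = len(predicted & sure)       # |A ∩ S|
    a_p = len(predicted & possible)   # |A ∩ P|
    precision = a_p / len(predicted) if predicted else 0.0
    recall = a_s / len(sure) if sure else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = len(predicted) + len(sure)
    aer = 1 - (a_s + a_p) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure, "aer": aer}


if __name__ == "__main__":
    A = {(1, 1, 1), (1, 3, 2), (1, 3, 3)}  # aligner output
    S = {(1, 1, 1), (1, 2, 2)}             # gold sure links
    P = {(1, 3, 2)}                        # gold possible-only links
    scores = alignment_scores(A, S, P)
    print({k: round(v, 3) for k, v in scores.items()})
    # {'precision': 0.667, 'recall': 0.5, 'f_measure': 0.571, 'aer': 0.4}
```

Because AER counts a hit on a possible link as correct for precision but only sure links for recall, it differs from $1 - F$ whenever the gold file contains P-only links.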
Do you know the reason?