Natural Language Processing - Word Alignment

Question

I am looking for word alignment tools and algorithms.
I am working with an English-Hindi bilingual corpus and am currently using:

DTW (Dynamic Time Warping)
CLA (Competitive Linking Algorithm)
NATools
GIZA++
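
For readers unfamiliar with competitive linking, the idea is to score every source/target word pair in a sentence pair by some association measure and then greedily accept the highest-scoring pairs, linking each word at most once. The Python sketch below is a minimal illustration under stated assumptions: it uses the Dice coefficient as the association score (a swapped-in choice; other measures can be plugged in), and the function names and the romanized English-Hindi toy corpus are hypothetical, made up for this example.

```python
from collections import Counter
from itertools import product


def dice_scores(bitext):
    """Dice coefficient for every (source, target) word pair that co-occurs
    in at least one sentence pair. ASSUMPTION: Dice is used as the
    association measure purely for illustration."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_types, t_types = set(src), set(tgt)
        src_count.update(s_types)
        tgt_count.update(t_types)
        pair_count.update(product(s_types, t_types))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in pair_count.items()}


def competitive_linking(src, tgt, scores):
    """Greedy one-to-one linking: take candidate links in descending score
    order and keep a link only if neither token is already linked."""
    candidates = sorted(
        ((scores.get((s, t), 0.0), i, j)
         for i, s in enumerate(src) for j, t in enumerate(tgt)),
        reverse=True)
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in candidates:
        if score <= 0:
            break  # remaining pairs never co-occurred
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


toy_bitext = [  # hypothetical romanized English-Hindi toy data
    (["big", "house"], ["bada", "ghar"]),
    (["big", "tree"], ["bada", "ped"]),
    (["house"], ["ghar"]),
]
scores = dice_scores(toy_bitext)
print(competitive_linking(["big", "tree"], ["bada", "ped"], scores))
# prints [(0, 0), (1, 1)]
```

Because each word can be linked only once, the method produces strictly one-to-one alignments; that keeps it simple but means it cannot express one-to-many links.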

Could you suggest any other language-independent algorithm or tool that can perform statistical word alignment on a parallel English-Hindi corpus, and how to evaluate its output? Some tools are said to be better suited to certain languages; how true is this, and if so, could you give an example of what works best for Asian languages such as Hindi? Counterexamples of what I should not use for such languages are also welcome.

I have heard a little about the Uplug word aligner. Can someone tell me whether this tool is useful for my purpose?

Thanks..:)

+7

alignment nlp linguistics

boddhisattva Mar 11 '10 at 14:18

4 answers

Uplug is a great tool; I use it to align English and Macedonian texts. It basically builds on GIZA++ but adds so-called clue alignments. The advanced setup combines clue alignments with GIZA++ and performs 3 such iterations. The more clues you provide (POS tags, lemmas, ...), the better the result. But I should mention that you should not expect results fundamentally different from those of plain GIZA++.

In any case, if you plan to study SMT seriously, I suggest you read the publication about Uplug; it will be very useful for you.

+2

msaveski May 14, '10 at 0:08

Moses is a statistical machine translation system that you might want to look at. Its word alignment component is built on GIZA++, but it can be tuned to work better than plain GIZA++ for certain language pairs. Its mailing list and the resources at http://www.statmt.org/ may also be a better place than SO for questions on this topic. One thing you did not mention, and which I consider even more problematic, is obtaining a parallel Hindi ↔ English corpus.
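
One thing Moses layers on top of GIZA++ is symmetrization: GIZA++ is run in both translation directions, and the two one-to-many alignments are merged with a heuristic such as grow-diag-final. The Python sketch below shows only a simplified grow-diag step, written for illustration; it is not Moses's own code. It assumes both directional alignments have already been converted to sets of (source_index, target_index) pairs in the same orientation.

```python
# Offsets of the 8 neighbouring cells in the alignment matrix
# (horizontal, vertical and diagonal).
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]


def grow_diag(e2f, f2e):
    """Simplified grow-diag symmetrization (illustrative sketch).

    Start from the high-precision intersection of the two directional
    alignments, then repeatedly add links from their union that are adjacent
    to an existing link, as long as the new link's source word or target
    word is still unaligned.
    """
    candidates = e2f | f2e          # union: high recall
    alignment = set(e2f & f2e)      # intersection: high precision
    added = True
    while added:
        added = False
        for i, j in sorted(alignment):  # iterate over a snapshot
            for di, dj in NEIGHBOURS:
                new = (i + di, j + dj)
                if new in candidates and new not in alignment:
                    src_aligned = any(a == new[0] for a, _ in alignment)
                    tgt_aligned = any(b == new[1] for _, b in alignment)
                    if not src_aligned or not tgt_aligned:
                        alignment.add(new)
                        added = True
    return alignment
```

The intersection alone is precise but sparse, and the union recovers more links at the cost of noise; growing from the intersection toward the union is a compromise between the two.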

0

ferdystschenko Mar 12 '10 at 19:06

Your question is vague and broad.

Try: http://scholar.google.com/scholar?q=algorithm+language+independent+statistical+word+alignment&hl=en&safe=off&client=firefox-a&hs=hJt&rls=com.ubuntu:en-US:official&um=1&ie=UTF-8&oi=scholart

for a list of articles in this area.

-1

Charles Merriam Mar 12 '10 at 0:30

dmcer · Accepted Answer · 2010-03-18T04:08:24+0000

The Berkeley Aligner is very good. By jointly training IBM-style word alignment models, it can achieve a much lower alignment error rate (AER) than older packages such as GIZA++.
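
Since AER is also a natural answer to the evaluation part of the question, here is its usual definition (Och and Ney): given predicted links A, gold "sure" links S and gold "possible" links P with S ⊆ P, AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The small Python function below is a minimal sketch of that formula; the function name is hypothetical and links are assumed to be sets of (source_index, target_index) pairs.

```python
def aer(predicted, sure, possible):
    """Alignment error rate; all arguments are sets of (i, j) links.
    Lower is better. Raises ZeroDivisionError if both predicted and sure
    are empty."""
    possible = possible | sure  # by definition every sure link is also possible
    hits_sure = len(predicted & sure)
    hits_possible = len(predicted & possible)
    return 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
```

An AER of 0.0 means every predicted link is at least a possible link and every sure link was found.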

It also supports some additional features, such as syntactic distortion (i.e., using parse tree information to improve alignment). For this you need parse trees for only one of the languages in the pair, so you should be fine with Hindi ↔ English, since many good English parsers are freely available.

If you decide not to go with the Berkeley Aligner, you should probably just use GIZA++. For many years it has been essentially the standard word aligner in the machine translation community.
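
GIZA++ training typically starts from IBM Model 1 before moving on to the HMM model and the higher IBM models. As a rough illustration of that starting point only, here is a minimal EM sketch of IBM Model 1 with an added NULL source word. It is not GIZA++'s implementation; the function names and the NULL token string are hypothetical choices for this example.

```python
from collections import defaultdict

NULL = "<null>"  # empty source word that can "generate" unaligned target words


def train_ibm_model1(bitext, iterations=10):
    """Estimate t(f | e) with EM. bitext is a list of (e_tokens, f_tokens)."""
    f_vocab = {f for _, fs in bitext for f in fs}
    # Uniform initialisation over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in bitext:
            es = [NULL] + es
            for f in fs:
                # E-step: spread f's unit count over all candidate source words.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts into probabilities t(f | e).
        t = defaultdict(float, {(f, e): c / total[e]
                                for (f, e), c in count.items()})
    return t


def viterbi_align(es, fs, t):
    """Link each target position j to its most probable source position;
    target words whose best source is NULL are left unaligned."""
    es = [NULL] + es
    links = []
    for j, f in enumerate(fs):
        best = max(range(len(es)), key=lambda i: t.get((f, es[i]), 0.0))
        if best > 0:  # index 0 is NULL
            links.append((best - 1, j))
    return links
```

Model 1 ignores word order entirely, which is why real pipelines only use it to initialise models with distortion and fertility.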
