What tag set is used in the OpenNLP German maxent model?

I am currently using the OpenNLP tools to PoS-tag German sentences, with the maxent model listed on their download site:

  de POS Tagger Maxent model trained on tiger corpus.  de-pos-maxent.bin

This works very well, and I get results like:

  Diese, Community, bietet, Teilnehmern, der, Veranstaltungen, die, Möglichkeit ...
  PDAT, FM, VVFIN, NN, ART, NN, ART, NN ...

With the tagged sentences, I want to do some further processing, for which I need to know the meaning of the individual tags. Unfortunately, searching the OpenNLP wiki for the tag sets is not very helpful, as it only says:

  TODO: Add more tag sets, also for non-english languages

Does anyone know where I can find the tag set used in the German maxent model?

+7
3 answers

It looks like the STTS tag set is being used. It is generally regarded as the most common tag set for German, e.g. in this question or in this Wikipedia entry.

+6

I created an enum containing the German tags (reverse lookup by tag name is possible):

    import java.util.EnumSet;
    import java.util.HashMap;
    import java.util.Map;

    public enum POSGermanTag {
        ADJA("Attributives Adjektiv"),
        ADJD("Adverbiales oder prädikatives Adjektiv"),
        ADV("Adverb"),
        APPR("Präposition; Zirkumposition links"),
        APPRART("Präposition mit Artikel"),
        APPO("Postposition"),
        APZR("Zirkumposition rechts"),
        ART("Bestimmter oder unbestimmter Artikel"),
        CARD("Kardinalzahl"),
        FM("Fremdsprachliches Material"),
        ITJ("Interjektion"),
        KOUI("unterordnende Konjunktion mit zu und Infinitiv"),
        KOUS("unterordnende Konjunktion mit Satz"),
        KON("nebenordnende Konjunktion"),
        KOKOM("Vergleichskonjunktion"),
        NN("normales Nomen"),
        NE("Eigennamen"),
        PDS("substituierendes Demonstrativpronomen"),
        PDAT("attribuierendes Demonstrativpronomen"),
        PIS("substituierendes Indefinitpronomen"),
        PIAT("attribuierendes Indefinitpronomen ohne Determiner"),
        PIDAT("attribuierendes Indefinitpronomen mit Determiner"),
        PPER("irreflexives Personalpronomen"),
        PPOSS("substituierendes Possessivpronomen"),
        PPOSAT("attribuierendes Possessivpronomen"),
        PRELS("substituierendes Relativpronomen"),
        PRELAT("attribuierendes Relativpronomen"),
        PRF("reflexives Personalpronomen"),
        PWS("substituierendes Interrogativpronomen"),
        PWAT("attribuierendes Interrogativpronomen"),
        PWAV("adverbiales Interrogativ- oder Relativpronomen"),
        PAV("Pronominaladverb"),
        PTKZU("zu vor Infinitiv"),
        PTKNEG("Negationspartikel"),
        PTKVZ("abgetrennter Verbzusatz"),
        PTKANT("Antwortpartikel"),
        PTKA("Partikel bei Adjektiv oder Adverb"),
        TRUNC("Kompositions-Erstglied"),
        VVFIN("finites Verb, voll"),
        VVIMP("Imperativ, voll"),
        VVINF("Infinitiv, voll"),
        VVIZU("Infinitiv mit zu, voll"),
        VVPP("Partizip Perfekt, voll"),
        VAFIN("finites Verb, aux"),
        VAIMP("Imperativ, aux"),
        VAINF("Infinitiv, aux"),
        VAPP("Partizip Perfekt, aux"),
        VMFIN("finites Verb, modal"),
        VMINF("Infinitiv, modal"),
        VMPP("Partizip Perfekt, modal"),
        XY("Nichtwort, Sonderzeichen"),
        UNDEFINED("Nicht definiert, z. B. Satzzeichen");

        private final String desc;

        // Reverse-lookup table: tag name -> enum constant.
        private static final Map<String, POSGermanTag> nameToValueMap =
                new HashMap<String, POSGermanTag>();

        static {
            for (POSGermanTag value : EnumSet.allOf(POSGermanTag.class)) {
                nameToValueMap.put(value.name(), value);
            }
        }

        // Returns the constant for the given tag name, or null if the name is unknown.
        public static POSGermanTag forName(String name) {
            return nameToValueMap.get(name);
        }

        private POSGermanTag(String desc) {
            this.desc = desc;
        }

        public String getDesc() {
            return this.desc;
        }
    }

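
A minimal usage sketch for the reverse lookup, applied to the first tokens of the sample sentence from the question. The names `TagGlossDemo`, `Tag`, `forNameOrUndefined` and `gloss` are hypothetical and not part of OpenNLP; the nested enum is only a trimmed stand-in for the full enum so that the snippet compiles on its own. Since `forName` returns null for names it does not know, the sketch assumes you would rather fall back to `UNDEFINED` for such tags (for example punctuation tags like `$.`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo class; not part of OpenNLP or of the enum above. */
final class TagGlossDemo {

    /** Trimmed stand-in for the full enum: only the tags of the sample sentence. */
    enum Tag {
        PDAT("attribuierendes Demonstrativpronomen"),
        FM("Fremdsprachliches Material"),
        VVFIN("finites Verb, voll"),
        NN("normales Nomen"),
        ART("Bestimmter oder unbestimmter Artikel"),
        UNDEFINED("Nicht definiert, z. B. Satzzeichen");

        private final String desc;

        Tag(String desc) {
            this.desc = desc;
        }

        String getDesc() {
            return desc;
        }

        /** Null-safe lookup: unknown or null names fall back to UNDEFINED. */
        static Tag forNameOrUndefined(String name) {
            for (Tag t : values()) {
                if (t.name().equals(name)) {
                    return t;
                }
            }
            return UNDEFINED;
        }
    }

    /** Pairs each token with its raw tag and the description of that tag. */
    static List<String> gloss(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            Tag tag = Tag.forNameOrUndefined(tags[i]);
            lines.add(tokens[i] + "/" + tags[i] + " (" + tag.getDesc() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Tokens and tags copied from the question's sample output.
        String[] tokens = {"Diese", "Community", "bietet", "Teilnehmern"};
        String[] tags = {"PDAT", "FM", "VVFIN", "NN"};
        for (String line : gloss(tokens, tags)) {
            System.out.println(line);
        }
    }
}
```

In real code the token and tag arrays would come from the tagger instead of being hard-coded.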
+8

As far as I understand, the OpenNLP POS tagger for German was trained on the Tiger corpus. This corpus does indeed use the STTS tag set, with a few modifications. I found the following note: A Brief Introduction to the TIGER Treebank
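
If you map the tagger output onto a fixed list of STTS names (such as the enum in the other answer), those modifications matter. Below is a small sketch of a normalization step, under two assumptions that you should check against the model's actual output: that the Tiger corpus writes `PROAV` where plain STTS has `PAV`, and that punctuation is tagged `$.`, `$,` or `$(`, none of which is a valid Java identifier. The class name `TigerTagNormalizer` and the choice of `UNDEFINED` as the fallback are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper; the rename table is an assumption, not an exhaustive list. */
final class TigerTagNormalizer {

    // Assumed Tiger-specific names mapped to their plain STTS counterparts.
    private static final Map<String, String> RENAMES = Map.of("PROAV", "PAV");

    /** Maps a raw tag to a name that is safe to look up in an STTS enum. */
    static String normalize(String rawTag) {
        if (rawTag == null || rawTag.isEmpty()) {
            return "UNDEFINED";
        }
        // Punctuation tags ("$.", "$,", "$(") cannot be enum constant names.
        if (rawTag.startsWith("$")) {
            return "UNDEFINED";
        }
        // Known renames are translated; everything else passes through unchanged.
        return RENAMES.getOrDefault(rawTag, rawTag);
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"NN", "PROAV", "$."}) {
            System.out.println(raw + " -> " + normalize(raw));
        }
    }
}
```

The result of `normalize` can then be passed to a lookup such as `POSGermanTag.forName`, which would otherwise return null for these tags.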

+3