NSLinguisticTagger incorrectly tags words as "OtherWord"

I used NSLinguisticTagger on short sentences and ran into a strange problem with input like "I'm hungry" or "I'm drunk." One would expect "I" to be tagged as a pronoun, "am" as a verb, and "hungry" as an adjective, but this is not the case. Instead, every token is tagged as OtherWord.

Is there something I'm doing wrong?

    NSString *input = @"I am hungry";
    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"]
                                               options:options];
    tagger.string = input;

    [tagger enumerateTagsInRange:NSMakeRange(0, input.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        NSString *token = [input substringWithRange:tokenRange];
        NSString *lemma = [tagger tagAtIndex:tokenRange.location
                                      scheme:NSLinguisticTagSchemeLemma
                                  tokenRange:NULL
                               sentenceRange:NULL];
        NSLog(@"%@ (%@) : %@\n", token, lemma, tag);
    }];

And the result:

    I ((null)) : OtherWord
    am ((null)) : OtherWord
    hungry ((null)) : OtherWord
ios objective-c cocoa nlp linguistics
1 answer

After quite a while in chat, we found the problem:

The sentence does not contain enough information for the tagger to determine its language.

To fix this, you can:

Append a demo sentence in your language of choice after your actual sentence. This should ensure that your preferred language is detected.
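The first option could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `TagsWithHintSentence` and the wording of the hint sentence are made up for this example, and how much extra text the tagger needs to settle on English is not something the answer specifies. The hint is appended to the string, but only the range of the original input is enumerated, so the extra sentence does not show up in the results.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): appends an English
// "hint" sentence so the tagger has more text for language detection,
// then returns the tags for the original input only.
static NSArray<NSString *> *TagsWithHintSentence(NSString *input) {
    // Assumption: a plain English sentence is enough of a hint.
    NSString *hint = @"This is a simple sentence written in plain English.";

    // Assumption: close the input with a period if it has no terminal
    // punctuation, so the hint starts a new sentence.
    NSCharacterSet *terminators =
        [NSCharacterSet characterSetWithCharactersInString:@".!?"];
    BOOL endsSentence = input.length > 0 &&
        [terminators characterIsMember:[input characterAtIndex:input.length - 1]];
    NSString *separator = endsSentence ? @" " : @". ";
    NSString *padded =
        [NSString stringWithFormat:@"%@%@%@", input, separator, hint];

    NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace;
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeNameTypeOrLexicalClass]
                                               options:options];
    tagger.string = padded;

    // Enumerate only the original input, not the appended hint.
    NSMutableArray<NSString *> *tags = [NSMutableArray array];
    [tagger enumerateTagsInRange:NSMakeRange(0, input.length)
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:options
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        [tags addObject:tag ?: @"(nil)"];
    }];
    return tags;
}
```

Calling `TagsWithHintSentence(@"I am hungry")` returns one tag per token of the original sentence; whether the tags are now the expected lexical classes depends on the tagger detecting the hint's language.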

OR

Tell the tagger which language to use: add a line

    [tagger setOrthography:[NSOrthography orthographyWithDominantScript:@"Latn"
                                                            languageMap:@{@"Latn" : @[@"en"]}]
                     range:NSMakeRange(0, input.length)];

before calling enumerateTagsInRange:scheme:options:usingBlock:. This explicitly tells the tagger which language the text is in, in this case English (en), with Latin (Latn) as the dominant script.

If you do not know the language for certain, it can be useful to apply either of these methods only as a fallback when words are tagged as OtherWord, which indicates that the language could not be detected.
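A minimal sketch of that fallback idea, assuming English is the language to fall back to. The helper names (`CollectTags`, `AllTagsAreOtherWord`, `TagsWithEnglishFallback`) are hypothetical, and the rule "every word is tagged OtherWord" is an assumed heuristic for "language not detected", not something the tagger reports directly. Punctuation is omitted so that only word tokens count towards the check.

```objective-c
#import <Foundation/Foundation.h>

static const NSLinguisticTaggerOptions kTagOptions =
    NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation;

// Tags `text` by lexical class; optionally forces English orthography first.
static NSArray<NSString *> *CollectTags(NSString *text, BOOL forceEnglish) {
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeNameTypeOrLexicalClass]
                                               options:kTagOptions];
    tagger.string = text;
    NSRange fullRange = NSMakeRange(0, text.length);

    if (forceEnglish) {
        // Same orthography as in the answer: Latin script, English language.
        NSOrthography *english =
            [NSOrthography orthographyWithDominantScript:@"Latn"
                                             languageMap:@{@"Latn" : @[@"en"]}];
        [tagger setOrthography:english range:fullRange];
    }

    NSMutableArray<NSString *> *tags = [NSMutableArray array];
    [tagger enumerateTagsInRange:fullRange
                          scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass
                         options:kTagOptions
                      usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
        if (tag != nil) {
            [tags addObject:tag];
        }
    }];
    return tags;
}

// Assumed heuristic: detection failed if there is at least one tag and
// every tag is OtherWord.
static BOOL AllTagsAreOtherWord(NSArray<NSString *> *tags) {
    if (tags.count == 0) {
        return NO;
    }
    for (NSString *tag in tags) {
        if (![tag isEqualToString:NSLinguisticTagOtherWord]) {
            return NO;
        }
    }
    return YES;
}

// First lets the tagger detect the language on its own; only if that
// yields nothing but OtherWord, tags again with English forced.
static NSArray<NSString *> *TagsWithEnglishFallback(NSString *input) {
    NSArray<NSString *> *tags = CollectTags(input, NO);
    if (AllTagsAreOtherWord(tags)) {
        tags = CollectTags(input, YES);
    }
    return tags;
}
```

The second pass uses a fresh tagger rather than changing the orthography on the first one, which avoids relying on how the tagger caches earlier results.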
