Stanford NER Toolkit - Lower Case Recognition

I am new to NLP and trying to understand how a named entity recognizer annotates named entities. I am experimenting with the Stanford NER toolkit. When I run NER on standard, more formal datasets that follow the usual conventions for writing names, such as news feeds or news blogs, it annotates entities correctly. However, when I run it on informal datasets such as Twitter, where names are often not capitalized as they should be, it does not annotate them. The classifier I use is a serialized 3-class CRF model. Can someone tell me how to get NER to recognize lowercase entities? Any suggestions on how to hack NER, and where such an improvement should be made, would be much appreciated. Thanks in advance for your help.

5 answers

I'm afraid there is no easy way to get the trained models that we distribute to ignore case information at runtime. So, yes, they will usually only label capitalized names. One could train a caseless model, which would work reasonably well (though not as well as a cased model does on cased text, since case is a strong clue in English, unlike in German, Chinese, Arabic, etc.).


Use the caseless model: replace english.muc.7class.distsim.crf.ser.gz with english.muc.7class.caseless.distsim.crf.ser.gz.

In Python, for example:

from nltk.tag.stanford import NERTagger  # older NLTK releases; newer ones name the class StanfordNERTagger
st = NERTagger('/Users/username/stanford-corenlp-python/stanford-ner-2014-10-26/classifiers/english.muc.7class.caseless.distsim.crf.ser.gz', '/Users/username/stanford-corenlp-python/stanford-ner-2014-10-26/stanford-ner.jar')
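A minimal sketch of how a tagger like the one above might be applied to a lowercase tweet. It assumes the tagger exposes a tag(tokens) method returning (token, label) pairs, which is how NLTK's Stanford wrapper behaves in recent releases. The helper extract_entities and the StubTagger class are hypothetical, introduced only so the example runs without the Stanford jar; whitespace tokenization is a simplification.

```python
def extract_entities(tagger, text):
    """Tag a whitespace-tokenized text and keep only the non-'O' tokens."""
    tokens = text.split()  # simplification: real tweets need a proper tokenizer
    tagged = tagger.tag(tokens)
    return [(token, label) for token, label in tagged if label != "O"]


class StubTagger:
    """Hypothetical stand-in for the real tagger (not part of NLTK)."""

    KNOWN = {"obama": "PERSON", "london": "LOCATION"}

    def tag(self, tokens):
        # Look tokens up case-insensitively, as a caseless model would.
        return [(t, self.KNOWN.get(t.lower(), "O")) for t in tokens]


entities = extract_entities(StubTagger(), "obama lands in london today")
print(entities)
```

With the real caseless model, st would be passed in place of StubTagger().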

. , , 100-200 3-4 gazzeteer . , , , "eli".


, Twitter . - , , , . , Twitter , .

, PArt of Speech tagging ?


, - .

One way to potentially train a lowercase classifier is to run the cased classifier you already have over a large corpus of properly capitalized English text, and then lowercase the tagged output. That gives you a labeled corpus you can use to train the new classifier. The new classifier will not be perfect on Twitter, because tweets differ from ordinary text in more than capitalization, but it is a quick way to bootstrap one.
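The lowercasing step described above could be sketched roughly as follows. This assumes the cased classifier's output is already available as lists of (token, label) pairs, and that the training file uses one tab-separated token and label per line with a blank line between sentences, which is the column layout Stanford NER is commonly trained on. The function name lowercase_tagged_sentences and the sample data are hypothetical.

```python
def lowercase_tagged_sentences(tagged_sentences):
    """Yield 'token<TAB>label' lines with tokens lowercased and labels kept.

    An empty string is yielded after each sentence so that sentences
    are separated by a blank line when the lines are written to a file.
    """
    for sentence in tagged_sentences:
        for token, label in sentence:
            yield f"{token.lower()}\t{label}"
        yield ""


# Hypothetical output of the cased classifier on one sentence.
tagged = [[("Barack", "PERSON"), ("Obama", "PERSON"),
           ("visited", "O"), ("Paris", "LOCATION")]]

training_lines = list(lowercase_tagged_sentences(tagged))
print("\n".join(training_lines))
```

Only the tokens are lowercased; the labels produced by the cased classifier are kept unchanged, so any tagging errors it made carry over into the new training data.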

