Are there NER and RegexNER tags in StanfordCoreNLPServer output?

I use StanfordCoreNLPServer to extract some information from text (e.g. surface areas, street names).

The street is given by a specially trained NER model, and the surface is given by a simple regular expression through RegexNER.

Each of them works fine on its own, but when they are used together only the NER result appears in the output, under the ner tag. Why is there no regexner tag? Is there a way to get the RegexNER result as well?

For information:

  • StanfordCoreNLP v3.6.0

  • URL used:

    'http://127.0.0.1:9000/'
    '?properties={"annotators":"tokenize,ssplit,pos,ner,regexner", '
    '"pos.model":"edu/stanford/nlp/models/pos-tagger/french/french.tagger",'
    '"tokenize.language":"fr",'
    '"ner.model":"ner-model.ser.gz", ' # custom NER model with STREET labels
    '"regexner.mapping":"rules.tsv", ' # SURFACE label
    '"outputFormat": "json"}'
    

    As suggested here, the regexner annotator is placed after ner, but still ...

  • Current output (extract):

    {u'index': 4, u'word': u'dans', u'lemma': u'dans', u'pos': u'P', u'characterOffsetEnd': 12, u'characterOffsetBegin': 8, u'originalText': u'dans', u'ner': u'O'}
    {u'index': 5, u'word': u'la', u'lemma': u'la', u'pos': u'DET', u'characterOffsetEnd': 15, u'characterOffsetBegin': 13, u'originalText': u'la', u'ner': u'O'}
    {u'index': 6, u'word': u'rue', u'lemma': u'rue', u'pos': u'NC', u'characterOffsetEnd': 19, u'characterOffsetBegin': 16, u'originalText': u'rue', u'ner': u'STREET'}
    {u'index': 7, u'word': u'du', u'lemma': u'du', u'pos': u'P', u'characterOffsetEnd': 22, u'characterOffsetBegin': 20, u'originalText': u'du', u'ner': u'STREET'}
    [...]
    {u'index': 43, u'word': u'165', u'lemma': u'165', u'normalizedNER': u'165.0', u'pos': u'DET', u'characterOffsetEnd': 196, u'characterOffsetBegin': 193, u'originalText': u'165', u'ner': u'NUMBER'}
    {u'index': 44, u'word': u'm', u'lemma': u'm', u'pos': u'NC', u'characterOffsetEnd': 198, u'characterOffsetBegin': 197, u'originalText': u'm', u'ner': u'O'}
    {u'index': 45, u'word': u'2', u'lemma': u'2', u'normalizedNER': u'2.0', u'pos': u'ADJ', u'characterOffsetEnd': 199, u'characterOffsetBegin': 198, u'originalText': u'2', u'ner': u'NUMBER'}
    
  • Expected output: I would like the last 3 tokens to be labelled SURFACE, i.e. the regexner result.
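For reference, the request URL listed above can be assembled programmatically instead of by hand-concatenating strings. This is only a minimal sketch using the Python standard library; `build_corenlp_url` is a hypothetical helper name, and the property values are copied from the URL in the question. The text itself would typically be sent separately as the POST body.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs


def build_corenlp_url(base_url, properties):
    """Return base_url with the properties dict JSON-encoded in the query string.

    Hypothetical helper, not part of CoreNLP or any client library.
    """
    query = urlencode({"properties": json.dumps(properties)})
    return f"{base_url}?{query}"


# Same properties as in the URL from the question.
properties = {
    "annotators": "tokenize,ssplit,pos,ner,regexner",
    "pos.model": "edu/stanford/nlp/models/pos-tagger/french/french.tagger",
    "tokenize.language": "fr",
    "ner.model": "ner-model.ser.gz",   # custom NER model with STREET labels
    "regexner.mapping": "rules.tsv",   # SURFACE label
    "outputFormat": "json",
}

url = build_corenlp_url("http://127.0.0.1:9000/", properties)

# Round-trip check: the query string decodes back to the same properties.
decoded = json.loads(parse_qs(urlsplit(url).query)["properties"][0])
print(decoded == properties)
```

Building the JSON with `json.dumps` avoids quoting mistakes (stray commas, unescaped quotes) that are easy to make when the properties are written as one long string.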


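OK, it seems to work the way I want if I put regexner first:

"annotators":"regexner,tokenize,ssplit,pos,ner",

So it looks like there is an ordering problem somewhere in the pipeline?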


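Here is what the RegexNER documentation says about this:

RegexNER will not overwrite an existing entity assignment, unless you give it permission in a third tab-separated column, which contains a comma-separated list of entity types that can be overwritten. Only the non-entity O label can always be overwritten, but you can specify extra entity tags which can always be overwritten as well.

Bachelor of (Arts|Laws|Science|Engineering|Divinity)	DEGREE

Lalor	LOCATION	PERSON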

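In your case, the tokens you expect to be SURFACE have already been tagged NUMBER by NER, so RegexNER leaves them alone. To fix this, list NUMBER as an overwritable type in the third column of the SURFACE rule in your mapping file.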
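To make the overwrite rule concrete, here is a rough simulation in plain Python. It is not CoreNLP code and ignores many details of the real annotator; `parse_rule` and `apply_regexner` are hypothetical helpers, and the SURFACE rule is a guess at what a line in rules.tsv might look like, since the actual file is not shown in the question.

```python
import re


def parse_rule(line):
    """Parse one tab-separated mapping line: pattern, label, optional overwritable types."""
    cols = line.rstrip("\n").split("\t")
    patterns = cols[0].split(" ")  # one regex per token
    label = cols[1]
    overwritable = set(cols[2].split(",")) if len(cols) > 2 and cols[2] else set()
    return patterns, label, overwritable


def apply_regexner(tokens, ner_tags, rules):
    """Simplified RegexNER-style pass over already-tagged tokens; returns new tags."""
    tags = list(ner_tags)
    for patterns, label, overwritable in rules:
        compiled = [re.compile(p) for p in patterns]
        n = len(compiled)
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            if not all(compiled[k].fullmatch(tokens[start + k]) for k in range(n)):
                continue
            # "O" can always be overwritten; any other tag only if the rule lists it.
            if all(tags[i] == "O" or tags[i] in overwritable for i in span):
                for i in span:
                    tags[i] = label
    return tags


# Last three tokens from the output extract in the question.
tokens = ["165", "m", "2"]
ner_tags = ["NUMBER", "O", "NUMBER"]

# Hypothetical SURFACE rules: without and with NUMBER in the third column.
strict = parse_rule("[0-9]+ m 2\tSURFACE")
permissive = parse_rule("[0-9]+ m 2\tSURFACE\tNUMBER")

tags_strict = apply_regexner(tokens, ner_tags, [strict])
tags_permissive = apply_regexner(tokens, ner_tags, [permissive])
print(tags_strict)      # NUMBER tags block the rule
print(tags_permissive)  # NUMBER is listed as overwritable, so the rule applies
```

With only two columns the NUMBER tags block the match, which is consistent with the output shown in the question; adding NUMBER as the third column lets the rule relabel all three tokens.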


Update for coreNLP 3.9.2 server via python:

When using the coreNLP 3.9.2 server through python, regexner can now also be run as part of the ner annotator, according to the documentation. For instance:

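from pycorenlp import StanfordCoreNLP

# Assumes a coreNLP 3.9.2 server is already running on port 9000.
nlp = StanfordCoreNLP('http://localhost:9000')

# regexner is not listed as an annotator; its mapping file is passed to ner.
properties = {"annotators": "tokenize,ssplit,pos,lemma,ner,coref,openie",
              "outputFormat": "json",
              "ner.fine.regexner.mapping": "rules.txt"}

# text is the string to annotate, defined elsewhere.
output = nlp.annotate(text, properties=properties)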

I could not get the regexner annotator to work by calling it directly. I think this is due to how annotator dependencies are loaded and/or the method used to convert the output to JSON (see, for example, this issue).
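Once the JSON comes back, the labelled spans can be pulled out by grouping consecutive tokens that share a ner value. The following is only a sketch: `collect_entities` is a hypothetical helper, and the sample tokens are trimmed copies of the ones in the question's output extract.

```python
from itertools import groupby


def collect_entities(tokens, background="O"):
    """Group consecutive tokens sharing a non-background 'ner' label."""
    entities = []
    for label, group in groupby(tokens, key=lambda t: t["ner"]):
        if label == background:
            continue
        words = [t["word"] for t in group]
        entities.append((label, " ".join(words)))
    return entities


# Tokens copied from the output extract in the question (fields trimmed).
sample_tokens = [
    {"index": 4, "word": "dans", "ner": "O"},
    {"index": 5, "word": "la", "ner": "O"},
    {"index": 6, "word": "rue", "ner": "STREET"},
    {"index": 7, "word": "du", "ner": "STREET"},
]

entities = collect_entities(sample_tokens)
print(entities)
```

With a real response, the token list for each sentence would come from output["sentences"][i]["tokens"]. Note that two distinct entities with the same label that happen to be adjacent would be merged by this simple grouping.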
