How to calculate per-tag precision and recall for POS tagging?

I use some rule-based and statistical POS taggers to tag a corpus (about 5,000 sentences) with parts of speech (POS). Below is a snippet of my gold-standard test set, in which each word is separated from its POS tag by '/'.

No/RB ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./. But/CC while/IN the/DT New/NNP York/NNP Stock/NNP Exchange/NNP did/VBD n't/RB fall/VB apart/RB Friday/NNP as/IN the/DT Dow/NNP Jones/NNP Industrial/NNP Average/NNP plunged/VBD 190.58/CD points/NNS --/: most/JJS of/IN it/PRP in/IN the/DT final/JJ hour/NN --/: it/PRP barely/RB managed/VBD *-2/-NONE- to/TO stay/VB this/DT side/NN of/IN chaos/NN ./. Some/DT ``/`` circuit/NN breakers/NNS ''/'' installed/VBN */-NONE- after/IN the/DT October/NNP 1987/CD crash/NN failed/VBD their/PRP$ first/JJ test/NN ,/, traders/NNS say/VBP 0/-NONE- *T*-1/-NONE- ,/, *-2/-NONE- unable/JJ *-3/-NONE- to/TO cool/VB the/DT selling/NN panic/NN in/IN both/DT stocks/NNS and/CC futures/NNS ./. 

After tagging, the same text looks like this:

 No/DT ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./. But/CC while/IN the/DT New/NNP York/NNP Stock/NNP Exchange/NNP did/VBD n't/RB fall/VB apart/RB Friday/VB as/IN the/DT Dow/NNP Jones/NNP Industrial/NNP Average/JJ plunged/VBN 190.58/CD points/NNS --/: most/RBS of/IN it/PRP in/IN the/DT final/JJ hour/NN --/: it/PRP barely/RB managed/VBD *-2/-NONE- to/TO stay/VB this/DT side/NN of/IN chaos/NNS ./. Some/DT ``/`` circuit/NN breakers/NNS ''/'' installed/VBN */-NONE- after/IN the/DT October/NNP 1987/CD crash/NN failed/VBD their/PRP$ first/JJ test/NN ,/, traders/NNS say/VB 0/-NONE- *T*-1/-NONE- ,/, *-2/-NONE- unable/JJ *-3/-NONE- to/TO cool/VB the/DT selling/VBG panic/NN in/IN both/DT stocks/NNS and/CC futures/NNS ./. 

I need to calculate the accuracy of the tags (tag-wise recall and precision), so I need to find the errors (if any) in the tag assigned to each word/tag pair.

The approach I'm thinking of is to loop through these two text files, save their contents in lists, and then compare the two lists element by element.
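The element-by-element comparison I have in mind would look roughly like this (just a sketch; `gold_line` and `tagged_line` stand for one line read from each file):

```python
def tag_mismatches(gold_line, tagged_line):
    """Compare two lines of word/TAG tokens; return (word, gold_tag, guessed_tag) mismatches."""
    # rsplit('/', 1) splits on the last '/', so words that themselves contain '/' survive
    gold = [tok.rsplit('/', 1) for tok in gold_line.split()]
    tagged = [tok.rsplit('/', 1) for tok in tagged_line.split()]
    return [(word, g, t) for (word, g), (_, t) in zip(gold, tagged) if g != t]

# e.g. comparing the first few tokens of my two files:
print(tag_mismatches('No/RB ,/, it/PRP', 'No/DT ,/, it/PRP'))  # [('No', 'RB', 'DT')]
```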

This approach seems very crude to me, so could you suggest a better solution to the above problem?

From the Wikipedia page:

In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items that were not labeled as belonging to the positive class but should have been).

python shell text-processing machine-learning nlp
1 answer

Please note that since each word has exactly one tag, overall recall and precision values do not make sense for this task (they would both simply equal the tagging accuracy). But it does make sense to ask for per-tag measures; for example, you can compute the recall and precision for the DT tag.
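To make that concrete, the per-tag computation for, say, DT reduces to two ratios (the counts below are made up purely for illustration):

```python
# Illustrative counts for the DT tag: the tagger labelled 10 words DT,
# 8 of them correctly, and 12 words in the gold standard are actually DT.
tp, fp, fn = 8, 2, 4

precision = tp / float(tp + fp)  # fraction of DT guesses that were right
recall = tp / float(tp + fn)     # fraction of true DTs that were found

print(precision)  # 0.8
print(recall)     # 0.666...
```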

The most efficient way to do this for all tags at once is essentially what you suggested, although you can save one pass through the data by skipping the list-building step. Read in a line from each file, compare the two lines word by word, and repeat until you reach the end of the files. For each word comparison, you should probably check that the words themselves are equal, as a sanity check, rather than assuming the two files are in sync. For each tag type you keep three running totals: true positives, false positives, and false negatives. If the two tags for the current word match, increment the true-positive total for that tag. If they do not match, increment the false-negative total for the true tag and the false-positive total for the tag your tagger mistakenly chose. At the end, you can calculate recall and precision for each tag using the formulas from the Wikipedia excerpt.

I have not tested this code, and my Python is a bit rusty, but it should give you the idea. I assume the files are already open and that the totals data structure is a dictionary of dictionaries:

from collections import defaultdict

# Running totals per tag; a defaultdict saves pre-initialising an entry
# for every tag in the tag set.
totals = defaultdict(lambda: {'truePositives': 0,
                              'falsePositives': 0,
                              'falseNegatives': 0})

finished = False
while not finished:
    trueLine = testFile.readline()
    if not trueLine:  # end of file
        finished = True
    else:
        trueLine = trueLine.split()  # tokenise by whitespace
        taggedLine = taggedFile.readline()
        if not taggedLine:
            print('Error: files are out of sync.')
            break
        taggedLine = taggedLine.split()
        if len(trueLine) != len(taggedLine):
            print('Error: files are out of sync.')
            break
        for i in range(len(trueLine)):
            # rsplit guards against words that themselves contain '/'
            truePair = trueLine[i].rsplit('/', 1)
            taggedPair = taggedLine[i].rsplit('/', 1)
            if truePair[0] != taggedPair[0]:  # the words should match
                print('Error: files are out of sync.')
            trueTag = truePair[1]
            guessedTag = taggedPair[1]
            if trueTag == guessedTag:
                totals[trueTag]['truePositives'] += 1
            else:
                totals[trueTag]['falseNegatives'] += 1
                totals[guessedTag]['falsePositives'] += 1
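Once those totals have been accumulated, per-tag precision and recall follow directly from the formulas in the Wikipedia excerpt. A sketch with illustrative counts (the key names match the counting loop above):

```python
# Illustrative totals, as the counting loop might leave them.
totals = {
    'DT': {'truePositives': 40, 'falsePositives': 2, 'falseNegatives': 8},
    'NN': {'truePositives': 55, 'falsePositives': 5, 'falseNegatives': 0},
}

def tag_scores(counts):
    """Return (precision, recall) for one tag's counts, guarding against zero denominators."""
    tp = counts['truePositives']
    fp = counts['falsePositives']
    fn = counts['falseNegatives']
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    return precision, recall

for tag in sorted(totals):
    p, r = tag_scores(totals[tag])
    print('%s precision=%.3f recall=%.3f' % (tag, p, r))
```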
