Access annotations in UIMA

Is there a way in UIMA to access the annotations associated with each token, as the CAS debugger GUI does? Of course, you can get all annotations from the index repository, but I want to loop over the tokens and get all the annotations associated with each one.

The reason is simple: I want to keep some annotations and discard others, and working token by token makes that much easier. Any help is appreciated :)

+6
3 answers

After searching and asking the cTAKES (Apache Clinical Text Analysis and Knowledge Extraction System) developers, I found that you can use the uimaFIT library, which can be found at http://code.google.com/p/uimafit/. You can use the following code:

 // MyAnnotation is a placeholder for any type that extends Annotation
 List<MyAnnotation> list = JCasUtil.selectCovered(jcas, MyAnnotation.class, startIndex, endIndex);

This returns every annotation of the given type that lies between the two offsets.
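To make the per-token idea concrete, here is a minimal plain-Java sketch of what a "covered" lookup computes. It does not use UIMA or uimaFIT: `Span`, `CoveredSketch`, `perToken`, and the type names `"Token"`, `"Drug"`, and `"Date"` are hypothetical stand-ins, and the local `selectCovered` only imitates the idea of the uimaFIT call rather than its implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "covered" lookups. Span is a hypothetical stand-in
// for a UIMA annotation (a type name plus begin/end offsets).
final class CoveredSketch {

    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Returns the spans of the wanted type that lie fully inside begin..end.
    static List<Span> selectCovered(List<Span> all, String type, int begin, int end) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            if (s.type.equals(type) && s.begin >= begin && s.end <= end) {
                result.add(s);
            }
        }
        return result;
    }

    // For each token, collects the spans of one wanted type that it covers.
    static Map<Span, List<Span>> perToken(List<Span> all, String wantedType) {
        Map<Span, List<Span>> byToken = new LinkedHashMap<>();
        for (Span token : selectCovered(all, "Token", 0, Integer.MAX_VALUE)) {
            byToken.put(token, selectCovered(all, wantedType, token.begin, token.end));
        }
        return byToken;
    }

    public static void main(String[] args) {
        List<Span> all = List.of(
                new Span("Token", 0, 5),
                new Span("Token", 6, 11),
                new Span("Drug", 0, 5),
                new Span("Date", 6, 11));
        // Keep only the "Drug" spans for each token and ignore the rest.
        System.out.println(perToken(all, "Drug"));
    }
}
```

With real UIMA types, the same loop would presumably iterate over the token annotations and call the uimaFIT method with each token's begin and end offsets, once per annotation type you want to keep.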

Hope that helps

+4

I am a uimaFIT developer.

If you want to find all annotations within the boundaries of another annotation, you may prefer a shorter and faster option.

 // MyAnnotation is a placeholder for any type that extends Annotation
 JCasUtil.selectCovered(MyAnnotation.class, referenceAnnotation);

Keep in mind that it is not advisable to create a "dummy" annotation with the desired offsets and then search within its boundaries, because this immediately allocates memory in the CAS, and that memory is not garbage-collected until the entire CAS is collected.

+8

If you do not want to use uimaFIT, you can create a filtered iterator to walk through only the annotations of interest. Filtered iterators are described in the UIMA reference documentation.

I recently used this approach in some code to find the sentence annotation that covered a regular expression annotation. This was acceptable for our project because all regular expression matches were shorter than the sentences in the document and there was only one match per sentence. Obviously, depending on the indexing rules, your mileage may vary. If you are afraid of running into another shorterAnnotationType, put the inner code in a while loop:

 static ArrayList<annotationsPair> process(Annotation shorterAnnotationType,
         Annotation longerAnnotationType, JCas aJCas) {
     ArrayList<annotationsPair> annotationsList = new ArrayList<annotationsPair>();

     FSIterator it = aJCas.getAnnotationIndex().iterator();
     FSTypeConstraint constraint = aJCas.getConstraintFactory().createTypeConstraint();
     constraint.add(shorterAnnotationType.getType());
     constraint.add(longerAnnotationType.getType());
     it = aJCas.createFilteredIterator(it, constraint);

     Annotation a = null;
     int shorterBegin = -1;
     int shorterEnd = -1;
     it.moveTo((shorterAnnotationType));
     while (it.isValid()) {
         a = (Annotation) it.get();
         if (a.getClass() == shorterAnnotationType.getClass()) {
             shorterBegin = a.getBegin();
             shorterEnd = a.getEnd();
             System.out.println("Target annotation from " + shorterBegin + " to " + shorterEnd);
             // because we assume that the sentence type is longer than the other type,
             // the sentence gets indexed prior
             it.moveToPrevious();
             if (it.isValid()) {
                 Annotation prevAnnotation = (Annotation) it.get();
                 if (prevAnnotation.getClass() == longerAnnotationType.getClass()) {
                     int sentBegin = prevAnnotation.getBegin();
                     int sentEnd = prevAnnotation.getEnd();
                     System.out.println("found annotation [" + prevAnnotation.getCoveredText()
                             + "] location: " + sentBegin + ", " + sentEnd);
                     annotationsPair pair = new annotationsPair(a, prevAnnotation);
                     annotationsList.add(pair);
                 }
                 // return to where you started
                 it.moveToNext(); // will not invalidate iter because just came from next
             }
         }
         it.moveToNext();
     }
     return annotationsList;
 }
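The "sentence gets indexed prior" assumption can be illustrated without UIMA. The sketch below is a plain-Java model, assuming the default annotation index orders by begin offset ascending and then by end offset descending (type priorities are ignored here). `Span`, `IndexOrderSketch`, `INDEX_ORDER`, and `previousOf` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of the ordering assumption. Span is a hypothetical
// stand-in for a UIMA annotation, not a UIMA class.
final class IndexOrderSketch {

    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + ")";
        }
    }

    // Begin ascending, then end descending, so longer spans sort first.
    static final Comparator<Span> INDEX_ORDER =
            Comparator.comparingInt((Span s) -> s.begin)
                    .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed());

    // Returns the element just before target in index order, or null if none.
    static Span previousOf(List<Span> spans, Span target) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(INDEX_ORDER);
        int i = sorted.indexOf(target);
        return i > 0 ? sorted.get(i - 1) : null;
    }

    public static void main(String[] args) {
        Span sentence1 = new Span("Sentence", 0, 20);
        Span match1 = new Span("Match", 0, 5);
        Span sentence2 = new Span("Sentence", 21, 40);
        Span match2 = new Span("Match", 25, 30);
        List<Span> spans = List.of(match2, sentence2, match1, sentence1);

        // With one match per sentence, the previous element is the sentence.
        System.out.println(previousOf(spans, match1));
        System.out.println(previousOf(spans, match2));
    }
}
```

If a sentence contained two matches, the element before the second match would be the first match rather than the sentence, which is why the answer suggests a while loop for that case.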

Hope this helps! Disclaimer: I am new to UIMA.

+3