(I mainly work with the Java library iText, not the .NET library iTextSharp, so please excuse some Java-isms here; everything should be easy to translate.)
To extract the content of a page using iText(Sharp), you use the classes in the parser package to feed that content, after some preprocessing, to a RenderListener of your choice.
In a context in which you are only interested in text, you most commonly use a TextExtractionStrategy, which is derived from RenderListener and adds a single method, getResultantText, to retrieve the aggregated text of the page.
As the original intention of text parsing in iText was to implement this use case, most existing RenderListener samples are TextExtractionStrategy implementations and only make the text available.
Therefore, you will have to implement your own RenderListener, which you already seem to have christened TextWithPositionExtractionStategy.
Just as there are both a SimpleTextExtractionStrategy (which is implemented with some assumptions about the structure of the page content operators) and a LocationTextExtractionStrategy (which does not make those assumptions but is somewhat more complicated), you might want to start with an implementation that makes some assumptions.
Thus, as with SimpleTextExtractionStrategy, your first, simple implementation expects the text rendering events forwarded to your listener to arrive line by line, and within each line from left to right. This way, as soon as you find a horizontal gap or punctuation, you know that your current word is finished and you can process it.
Unlike the text extraction strategies, you do not need a StringBuffer member to collect your result, but instead a list of some "word with position" structure. In addition, you need a member variable that holds the TextRenderInfo events you have already collected for this page but could not yet process finally (a single word may arrive in several separate events).
Once you (i.e. your renderText method) are called for a new TextRenderInfo object, you should proceed as follows (pseudocode):
if (unprocessedTextRenderInfos not empty) { if (isNewLine
In process(unprocessedTextRenderInfos) you extract the information you need from the unprocessedTextRenderInfos: you concatenate the individual text contents into a word and take the coordinates you want. If you only want the start coordinates, you take them from the first of these unprocessed TextRenderInfos; if you need more data, you also use data from the other TextRenderInfos. With this data you fill a "word with position" structure and add it to your result list.
When page processing is finished, you once more have to call process(unprocessedTextRenderInfos) and unprocessedTextRenderInfos.clear(); alternatively, you may do that in the endTextBlock method.
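To make the flow described above a bit more concrete, here is a minimal, library-free Java sketch of it. It is not iText code: Glyph is a hypothetical stand-in for a single-character TextRenderInfo, WordWithPosition and WordCollector are made-up names, and the two thresholds are arbitrary assumptions. In real code the coordinates would come from the TextRenderInfo objects instead, and render would be your renderText method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one character-sized text render event. */
final class Glyph {
    final char ch;
    final float x, endX, y; // baseline start x, baseline end x, baseline y
    Glyph(char ch, float x, float endX, float y) {
        this.ch = ch; this.x = x; this.endX = endX; this.y = y;
    }
}

/** The "word with position" result structure (start coordinates only). */
final class WordWithPosition {
    final String text;
    final float x, y;
    WordWithPosition(String text, float x, float y) {
        this.text = text; this.x = x; this.y = y;
    }
}

final class WordCollector {
    // Assumed thresholds; suitable values depend on font size and document.
    private static final float GAP = 1.0f;
    private static final float LINE_TOLERANCE = 0.5f;

    private final List<WordWithPosition> words = new ArrayList<>();
    private final List<Glyph> pending = new ArrayList<>(); // collected, not yet processed

    /** Plays the role of renderText for one character-sized event. */
    void render(Glyph g) {
        if (!pending.isEmpty()) {
            Glyph last = pending.get(pending.size() - 1);
            boolean newLine = Math.abs(g.y - last.y) > LINE_TOLERANCE;
            boolean gap = g.x - last.endX > GAP;
            if (newLine || gap) {
                process(); // the pending word ended before this event
            }
        }
        if (Character.isLetterOrDigit(g.ch)) {
            pending.add(g);
        } else {
            process(); // white space or punctuation ends the current word
        }
    }

    /** Call once the page (or text block) is complete. */
    void finish() {
        process();
    }

    List<WordWithPosition> getWords() {
        return words;
    }

    /** Builds one word from the pending events and clears them. */
    private void process() {
        if (pending.isEmpty()) {
            return;
        }
        StringBuilder text = new StringBuilder();
        for (Glyph p : pending) {
            text.append(p.ch);
        }
        Glyph first = pending.get(0); // start coordinates come from the first event
        words.add(new WordWithPosition(text.toString(), first.x, first.y));
        pending.clear();
    }
}
```

Like the simple strategy, this sketch relies on events arriving line by line and left to right; it drops the punctuation itself and keeps only start coordinates, so anything more (end coordinates, bounding boxes) would have to be taken from the other pending events in process.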
Having done this, you might feel ready to implement the slightly more complex variant, which does not make the same assumptions about the structure of the page content. ;)