Reading a PDF line by line

How can I read a PDF line by line using iText 5 for .NET? I searched the Internet, but I only found ways to read a PDF's content page by page.

See the code below.

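// Needs these using directives at the top of the file:
using System;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public string ReadPdfFile(object Filename)
{
    StringBuilder strText = new StringBuilder();
    PdfReader reader = null;
    try
    {
        reader = new PdfReader((string)Filename);

        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            // Use a fresh strategy for each page so text does not accumulate across pages.
            ITextExtractionStrategy its = new SimpleTextExtractionStrategy();

            // GetTextFromPage already returns a .NET string; re-encoding it through
            // Encoding.Default would only corrupt characters outside the ANSI code page.
            strText.Append(PdfTextExtractor.GetTextFromPage(reader, page, its));
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
    finally
    {
        // Release the file even if extraction failed part-way through.
        if (reader != null) reader.Close();
    }
    return strText.ToString();
}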
4 answers

Try using LocationTextExtractionStrategy instead of SimpleTextExtractionStrategy; it adds line breaks to the returned text. Then you can call strText.Split('\n') to split the text into a string[] and process each line.
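
A minimal sketch of that suggestion, assuming iTextSharp 5.x is referenced. The class name PdfLineReader and the method name ReadLines are made up for illustration and are not part of iTextSharp; splitting each page separately is my own choice so that the last line of one page does not run into the first line of the next.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
public static class PdfLineReader
{
    // Returns the text lines of every page, in page order.
    public static List<string> ReadLines(string path)
    {
        List<string> lines = new List<string>();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // The location-based strategy orders text chunks by position
                // and separates the lines it detects with '\n'.
                ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(reader, page, strategy);

                // Split this page's text into individual lines.
                lines.AddRange(pageText.Split('\n'));
            }
        }
        finally
        {
            reader.Close();
        }
        return lines;
    }
}
```

You could then loop over the result with foreach (string line in PdfLineReader.ReadLines(fileName)) { ... }. Keep in mind that the lines are whatever the strategy reconstructs from text positions, so multi-column or unusual layouts may not split the way they look on the page.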


You can find PDF2Text Pilot here; it is open-source software released under the BSD license.





If you are building an e-book reader for PDFs, you have two options: display the PDF as it is, so that it looks the same as in other ready-made PDF viewers, or read the text and reformat it yourself.

I prefer the second method: just format the text however you like. When I use an e-book reader, I care only about the content, not about how it was originally laid out.
