Convert PDF file to text in C#

I need to convert a .pdf file to a .txt file (or .doc, but I prefer .txt).

How can I do this in C#?


Ghostscript can do what you need. The following command extracts the text from a PDF file into a .txt file (run it from the command line first to check that it works for you):

```
gswin32c.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "test.pdf" -c quit >"test.txt"
```

See the CodeProject article "Convert PDF to Image Using Ghostscript API" for more information on how to use Ghostscript from C#.
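To run the command above from C# instead of a shell, one option is to start Ghostscript with `System.Diagnostics.Process` and capture its standard output in place of the `>"test.txt"` redirection. This is a minimal sketch, not the approach from the CodeProject article: the names `GhostscriptText`, `BuildArguments` and `ConvertPdfToText` are hypothetical, and it assumes `gswin32c.exe` is on the `PATH` and that your Ghostscript installation still provides `ps2ascii.ps`.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Sketch: run the ps2ascii-based Ghostscript command from C#.
// Assumptions (hypothetical helper): gswin32c.exe is on the PATH, or its full
// path is passed in, and ps2ascii.ps is in Ghostscript's library search path.
public static class GhostscriptText
{
    // Same arguments as the command line, minus the shell redirection,
    // which is replaced by capturing standard output below.
    public static string BuildArguments(string pdfPath)
    {
        return "-q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE " +
               "-c save -f ps2ascii.ps \"" + pdfPath + "\" -c quit";
    }

    // Runs Ghostscript and writes everything it prints to stdout into txtPath.
    public static void ConvertPdfToText(string pdfPath, string txtPath,
                                        string gsExe = "gswin32c.exe")
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExe,
            Arguments = BuildArguments(pdfPath),
            UseShellExecute = false,       // required for output redirection
            RedirectStandardOutput = true, // replaces > "test.txt"
            CreateNoWindow = true
        };

        using (Process gs = Process.Start(startInfo))
        {
            // Read all output before waiting so a full pipe buffer cannot deadlock.
            string text = gs.StandardOutput.ReadToEnd();
            gs.WaitForExit();
            if (gs.ExitCode != 0)
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + gs.ExitCode);
            File.WriteAllText(txtPath, text);
        }
    }
}
```

Usage: `GhostscriptText.ConvertPdfToText("test.pdf", "test.txt");`. With a 64-bit Ghostscript install, pass `"gswin64c.exe"` as the third argument.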


I had the same need, and I used this article to get started: http://www.codeproject.com/KB/string/pdf2text.aspx


As an alternative to Don's solution, I found the following article:

Extract text from PDF in C# (100% .NET)
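A pure .NET extractor ultimately has to read each page's content stream and collect the strings passed to the text-showing operators `Tj` and `TJ`, which are defined in the PDF specification. The toy sketch below illustrates only that last step and is not taken from the linked article. It assumes the stream has already been located and decompressed (for example, FlateDecode removed), treats each byte as one character (so only simple single-byte font encodings come out readable), skips hex strings and the `'` and `"` operators, and starts a new line only at `ET`. `ContentStreamText` and `Extract` are hypothetical names.

```csharp
using System.Text;

// Toy sketch: collect literal-string operands of Tj/TJ from an already
// decompressed page content stream. Hypothetical helper, not a full parser.
public static class ContentStreamText
{
    public static string Extract(string content)
    {
        var output = new StringBuilder();
        var pending = new StringBuilder(); // strings seen since the last operator
        int i = 0;
        while (i < content.Length)
        {
            char c = content[i];
            if (c == '(')
            {
                i = ReadLiteral(content, i, pending);
            }
            else if (c == '%')
            {
                // Comment: skip to the end of the line.
                while (i < content.Length && content[i] != '\n' && content[i] != '\r') i++;
            }
            else if (c == '/')
            {
                // Name such as /F1: skip it so its letters are not taken as an operator.
                i++;
                while (i < content.Length && IsRegular(content[i])) i++;
            }
            else if (c == '<')
            {
                if (i + 1 < content.Length && content[i + 1] == '<')
                {
                    i += 2; // dictionary start, e.g. << /MCID 0 >>
                }
                else
                {
                    // Hex string such as <48656C6C6F>: ignored in this sketch.
                    int close = content.IndexOf('>', i);
                    i = close < 0 ? content.Length : close + 1;
                }
            }
            else if (char.IsLetter(c))
            {
                // Operator token such as BT, Tf, Td, Tj, TJ, ET.
                int start = i;
                while (i < content.Length && IsRegular(content[i])) i++;
                string op = content.Substring(start, i - start);
                if (op == "Tj" || op == "TJ") output.Append(pending.ToString());
                else if (op == "ET") output.Append('\n');
                pending.Clear(); // operands belong to exactly one operator
            }
            else
            {
                i++; // numbers, whitespace, brackets of TJ arrays, etc.
            }
        }
        return output.ToString();
    }

    // Characters that are neither whitespace nor PDF delimiters.
    private static bool IsRegular(char ch) =>
        !char.IsWhiteSpace(ch) && "()<>[]{}/%".IndexOf(ch) < 0;

    // Reads a literal string starting at '(' (balanced parens, backslash escapes)
    // into sb and returns the index just past the closing ')'.
    private static int ReadLiteral(string s, int i, StringBuilder sb)
    {
        int depth = 1;
        i++; // skip the opening '('
        while (i < s.Length)
        {
            char ch = s[i];
            if (ch == '\\' && i + 1 < s.Length)
            {
                char next = s[i + 1];
                if (next >= '0' && next <= '7')
                {
                    // Up to three octal digits encode one byte, e.g. \101 = 'A'.
                    int code = 0, len = 0;
                    while (len < 3 && i + 1 + len < s.Length &&
                           s[i + 1 + len] >= '0' && s[i + 1 + len] <= '7')
                    {
                        code = code * 8 + (s[i + 1 + len] - '0');
                        len++;
                    }
                    sb.Append((char)code);
                    i += 1 + len;
                    continue;
                }
                switch (next)
                {
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    case '\r': case '\n': break;      // backslash + EOL: line continuation
                    default: sb.Append(next); break;  // \( \) \\ and anything else
                }
                i += 2;
                continue;
            }
            if (ch == '(') depth++;
            else if (ch == ')' && --depth == 0) return i + 1;
            sb.Append(ch); // nested parens are part of the string
            i++;
        }
        return i;
    }
}
```

For example, `ContentStreamText.Extract("BT /F1 12 Tf 72 712 Td (Hello) Tj ET")` returns `"Hello\n"`. Real documents also need object and cross-reference parsing, stream decompression, font encodings and `ToUnicode` maps, which is why the other answers point to complete libraries.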


Converting PDF to text is not straightforward, so don't expect anyone to post code here that converts a PDF to text directly. It is best to use a library that does the job for you. PDFBox is a good choice: it is written in Java, but you can use IKVM to convert it to .NET.


The Docotic.Pdf library can extract text from PDF files, with or without formatting.

Here is sample code that extracts formatted text from a PDF file and saves it to another file:

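```csharp
using System.IO;
using BitMiracle.Docotic.Pdf;

public static class PdfTextExtraction // hypothetical wrapper class
{
    public static void ExtractFormattedText(string pdfFile, string textFile)
    {
        // Open the PDF, extract its text with layout preserved, and write it out.
        using (PdfDocument doc = new PdfDocument(pdfFile))
        {
            string text = doc.GetTextWithFormatting();
            File.WriteAllText(textFile, text);
        }
    }
}
```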

In addition, our website has a sample that shows other options for extracting text from PDF files.

Disclaimer: I work for Bit Miracle, the vendor of the library.

This Windows Forms code uses iTextSharp (`PdfReader`, `PdfTextExtractor`) to load the PDF text into a `RichTextBox` and then save it as a plain-text file:

```csharp
using System;
using System.IO;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Form1 is assumed to contain a RichTextBox named richTextBox1.
public partial class Form1 : Form
{
    public void PDF_TEXT()
    {
        richTextBox1.Text = string.Empty;
        ReadPdfFile(@"C:\Myfile.pdf"); // read the PDF file from this location
    }

    public void ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();
        try
        {
            if (File.Exists(fileName))
            {
                // Open a single reader (the original opened two and leaked one).
                PdfReader pdfReader = new PdfReader(fileName);
                try
                {
                    // iTextSharp page numbers are 1-based.
                    for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                    {
                        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                        string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                        text.Append(currentText);
                    }
                }
                finally
                {
                    pdfReader.Close(); // release the file even if extraction fails
                }
            }
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        richTextBox1.Text = text.ToString();
    }

    private void Save_TextFile_Click(object sender, EventArgs e)
    {
        DialogResult messageResult = MessageBox.Show("Save this file into Text?", "Text File", MessageBoxButtons.OKCancel);
        if (messageResult == DialogResult.Cancel)
            return;

        using (SaveFileDialog sfd = new SaveFileDialog())
        {
            sfd.Title = "Save As Textfile";
            sfd.InitialDirectory = @"C:\";
            sfd.Filter = "TextDocuments|*.txt";
            if (sfd.ShowDialog() == DialogResult.OK)
            {
                if (richTextBox1.Text != "")
                {
                    richTextBox1.SaveFile(sfd.FileName, RichTextBoxStreamType.PlainText);
                    richTextBox1.Text = "";
                    MessageBox.Show("Text saved successfully", "Text File");
                }
                else
                {
                    MessageBox.Show("Please upload your PDF", "Text File", MessageBoxButtons.OKCancel, MessageBoxIcon.Asterisk);
                }
            }
        }
    }
}
```
