Search for a specific word in a PDF using iTextSharp

This is my first post on StackOverflow.

I have a PDF file on my system drive. I want to write a C# program that uses iTextSharp.dll to find a specific word in this PDF file. Say I want to search for "StackOverFlow": if the PDF contains the word "StackOverFlow", the program should return true.

Otherwise, it should return false.

I have looked through many articles but have not found a solution yet. :-(

I have tried so far:

    public string ReadPdfFile(string fileName)
    {
        StringBuilder text = new StringBuilder();
        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentText = "2154/MUM/2012 A"; // PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(currentText)));
                text.Append(currentText);
            }
            pdfReader.Close();
        }
        return text.ToString();
    }

Thanks in advance, Sabya Dev

+8
c# pdf itextsharp
1 answer

The following method works fine. It returns a list of the page numbers on which the search text is found.

    public List<int> ReadPdfFile(string fileName, string searchText)
    {
        // Page numbers (1-based) whose extracted text contains searchText
        List<int> pages = new List<int>();
        if (File.Exists(fileName))
        {
            PdfReader pdfReader = new PdfReader(fileName);
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string currentPageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                if (currentPageText.Contains(searchText))
                {
                    pages.Add(page);
                }
            }
            pdfReader.Close();
        }
        return pages;
    }
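
Since the question asks for a true/false result rather than a list of pages, the same loop can be reduced to a boolean check that stops at the first matching page. This is only a minimal sketch, assuming iTextSharp 5.x: the class name PdfSearch, the method names ContainsText and TextContains, and the choice of a case-insensitive comparison are illustrative assumptions, not part of the library or of the method above. Keep in mind that extracted text may not match what is displayed (a scanned page, for example, yields no text), so a false result does not prove the word is absent.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfSearch
{
    // Hypothetical helper: case-insensitive substring test on already extracted text.
    public static bool TextContains(string pageText, string searchText)
    {
        return pageText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Returns true as soon as any page contains searchText, false otherwise
    // (including when the file does not exist).
    public static bool ContainsText(string fileName, string searchText)
    {
        if (!File.Exists(fileName))
        {
            return false;
        }

        PdfReader pdfReader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string pageText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);
                if (TextContains(pageText, searchText))
                {
                    return true; // stop at the first hit
                }
            }
            return false;
        }
        finally
        {
            // Release the file handle even if extraction throws.
            pdfReader.Close();
        }
    }
}
```

It could be called as PdfSearch.ContainsText(path, "StackOverFlow"), where path is the location of your PDF. If an exact, case-sensitive match is required, StringComparison.Ordinal can be used instead.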
+15