PDF page counter using regex

I used a regular expression to count the pages of a PDF file. This is the code I used:

    Regex regex = new Regex(@"/Type\s*/Page[^s]");
    MatchCollection matches = regex.Matches(sr.ReadToEnd());
    return matches.Count;

It works well for PDF versions below 1.6, but not for PDF 1.6 files: for those it returns 0 pages.

1 answer

In your case, you are most likely dealing with PDF 1.6 documents that use object streams (compressed object streams, a feature introduced in PDF 1.5). In such documents the page objects you are looking for are stored inside compressed streams, so your regular expression never sees them.
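To check that compression really is the cause, here is a minimal C# sketch of the "unpack first, then search" idea. It assumes .NET 6 or later (for `ZLibStream` and `Encoding.Latin1`), and the names `PdfPageHeuristic`, `CountPageMatches` and `TryInflate` are made up for this illustration. It tries to zlib-decompress (FlateDecode) every `stream ... endstream` section and runs the question's regex over the raw bytes plus the decompressed text. It ignores `/Length`, other filters, `/DecodeParms` predictors and encryption, so treat it as a diagnostic rather than a page counter.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;

public static class PdfPageHeuristic
{
    // The pattern from the question, unchanged.
    private static readonly Regex PagePattern = new Regex(@"/Type\s*/Page[^s]");

    // "stream" + CRLF or LF, then lazily everything up to an optional EOL and "endstream".
    private static readonly Regex StreamPattern =
        new Regex(@"stream\r?\n(.*?)(?:\r?\n)?endstream", RegexOptions.Singleline);

    public static int CountPageMatches(byte[] pdfBytes)
    {
        // Latin-1 maps every byte 0-255 to exactly one char, so binary data round-trips.
        string raw = Encoding.Latin1.GetString(pdfBytes);
        var searchable = new StringBuilder(raw);

        foreach (Match m in StreamPattern.Matches(raw))
        {
            byte[] data = Encoding.Latin1.GetBytes(m.Groups[1].Value);
            string? inflated = TryInflate(data);
            if (inflated != null)
            {
                // Append the decompressed text so objects hidden in object streams become visible.
                searchable.Append('\n').Append(inflated);
            }
        }

        return PagePattern.Matches(searchable.ToString()).Count;
    }

    // Returns the zlib-decompressed (FlateDecode) content, or null if the data is not valid zlib.
    private static string? TryInflate(byte[] data)
    {
        try
        {
            using var input = new MemoryStream(data);
            using var zlib = new ZLibStream(input, CompressionMode.Decompress);
            using var output = new MemoryStream();
            zlib.CopyTo(output);
            return Encoding.Latin1.GetString(output.ToArray());
        }
        catch (Exception ex) when (ex is InvalidDataException || ex is IOException)
        {
            return null;
        }
    }
}
```

For example, `PdfPageHeuristic.CountPageMatches(File.ReadAllBytes(path))`, with `path` pointing at one of your 1.6 files, should return a non-zero count where the original code returned 0, provided the page objects sit in Flate-compressed object streams.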

There are tools that can decompress such streams and write out an uncompressed copy of the file, which you could then search. Before you go looking for them, though, keep in mind that the result of your code cannot be trusted anyway, because

  • there may be more matches than pages, because the file may contain old, unused page objects (for example, ones superseded by incremental updates) or other false positives;
  • there may be fewer matches than pages, because PDF allows alternative ways to write these /Type entries.
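Both points can be illustrated with a few hand-written, simplified PDF fragments (the `RegexPitfalls` class and its sample strings are hypothetical, made up for this illustration; a real incremental update would also append a new cross-reference section). An object rewritten by an incremental update and a string literal that merely contains the text both produce extra matches, while a name written with a `#xx` escape is missed: `/Pa#67e` is the same name as `/Page`, since 0x67 is `g`.

```csharp
using System.Text.RegularExpressions;

public static class RegexPitfalls
{
    // The pattern from the question, unchanged.
    private static readonly Regex PagePattern = new Regex(@"/Type\s*/Page[^s]");

    public static int Count(string pdfText) => PagePattern.Matches(pdfText).Count;

    // One page, two matches: an incremental update appended a new version of object 3,
    // but the superseded original is still in the file.
    public const string StaleRevision =
        "3 0 obj\n<</Type /Page /Parent 2 0 R>>\nendobj\n" +
        "3 0 obj\n<</Type /Page /Parent 2 0 R /Rotate 90>>\nendobj\n";

    // No page, one match: the text only appears inside a string object.
    public const string InStringLiteral = "4 0 obj\n(/Type /Page)\nendobj\n";

    // One page, no match: the name /Page is written with a #xx escape.
    public const string EscapedName = "3 0 obj\n<</Type /Pa#67e /Parent 2 0 R>>\nendobj\n";
}
```

Here `Count(StaleRevision)` returns 2, `Count(InStringLiteral)` returns 1 and `Count(EscapedName)` returns 0. A trustworthy number comes from the `/Count` entry of the root `/Pages` node of the page tree, as resolved by an actual PDF parser through the cross-reference data, rather than from scanning raw bytes.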