If I had to write a program for this, I would find a PDF rendering library capable of extracting text from PDF files, such as Xpdf, and then count the words. If this were a one-off task, or something that needed to be automated for a non-production-quality job, I would just run the file through pdftotext and then parse the output file with Python, splitting it into words, putting them into a dictionary, and counting the number of occurrences.
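The quick pdftotext-plus-Python route could look roughly like the sketch below. It assumes the `pdftotext` command-line tool is installed and on the PATH, and it reads the extracted text from stdout (passing `-` as the output file) rather than from an intermediate file. The function names `pdf_to_text` and `count_words` are made up for illustration, and the definition of a "word" is an assumption you would want to settle with whoever asked the question.

```python
import re
import subprocess
import sys
from collections import Counter


def pdf_to_text(pdf_path):
    """Run the pdftotext CLI and return the extracted text.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")


def count_words(text):
    """Return a dictionary-like Counter mapping each word to its count."""
    # Assumption: a "word" is a run of ASCII letters, digits, or
    # apostrophes, compared case-insensitively.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = count_words(pdf_to_text(sys.argv[1]))
    # Show the ten most frequent words and the total word count.
    for word, n in counts.most_common(10):
        print(word, n)
    print("total words:", sum(counts.values()))
```

Keeping extraction and counting in separate functions means the counting logic can be checked on plain strings without a PDF at hand, and the extraction step can be swapped for a library later if the task ever needs production quality.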
If I were asking this as an interview question, I would look for a couple of things:
- PDF, , PDF "". , PDF . . , , , . PDF . (pdftotext ).