Assuming the .djvu files contain OCR-ed text, a quick way to get it on Linux is to start djvutxt with subprocess.Popen and capture its output.
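A minimal sketch of that approach, assuming the `djvutxt` tool from DjVuLibre is installed and on `PATH`. The function name `djvu_to_text` and its `djvutxt` parameter are made up for this illustration; they are not taken from calibre:

```python
import shutil
import subprocess


def djvu_to_text(djvu_path, djvutxt="djvutxt"):
    """Return the text layer of a DjVu file as a string.

    `djvutxt` is the name or path of the executable (an assumption:
    it is expected to be the DjVuLibre tool of that name).
    """
    # Fail early with a clear message if the tool cannot be found.
    exe = shutil.which(djvutxt)
    if exe is None:
        raise FileNotFoundError(f"{djvutxt!r} not found; is DjVuLibre installed?")

    # With no output file argument, djvutxt writes the text to stdout.
    proc = subprocess.Popen(
        [exe, djvu_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError(err.decode("utf-8", errors="replace"))
    return out.decode("utf-8", errors="replace")
```

Calling `djvu_to_text("book.djvu")` should then return the extracted text, or an empty string if the file has no text layer.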
The text in a .djvu file is compressed with a special DjVu compression algorithm, bzz, for which there is no simple C interface that you could load as a shared object in Python. The reference implementation is written in C++ and depends on a surrounding framework.
Shameless self-promotion: I contributed the conversion from OCR-ed .djvu files to calibre, and it uses djvutxt in exactly this way. If djvutxt is not available, however, it falls back to my pure-Python decoder implementation (sloooow). So you can use that code if you cannot use djvutxt.
I have not yet released the Python source separately from calibre, but you can find it after downloading and extracting the calibre source:
curl -L http://status.calibre-ebook.com/dist/src | tar xvJ
find . | fgrep djvu
The relevant files are djvu_input.py, djvu.py and djvubzzdec.py.