Getting PHP to read .doc files on Linux

I am trying to read a .doc file into a database so that I can index its contents. Is there an easy way for PHP on Linux to read .doc files? Otherwise, is it possible to convert .doc files to RTF, PDF or some other "open" format that is easy to read?

Note that I am not interested in .docx files.

+5
10 answers

There seems to be a library for accessing Word documents, but I am not sure how to access it from PHP. I think the best solution would be to call the wv command from PHP.

+3

One option is OpenOffice, which can open MS .doc files.

For example, to convert to PDF:

/usr/lib/ooo-2.0/program/soffice.bin -norestore -nofirststart -nologo -headless -invisible   "macro:///Standard.Module1.SaveAsPDF(demo.doc)"
+7

phpLiveDocx is a Zend Framework component for working with DOC and RTF files in PHP on Linux, Windows and Mac. It can also handle PDF from PHP, with no need for MS Word or Open Office!

See the project website:

http://www.phplivedocx.org

+2

You could try antiword or AbiWord. AbiWord, for instance, can convert to RTF or PDF (from the command line, without the GUI).
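As a rough illustration of the antiword route, here is a minimal PHP sketch. It assumes antiword is installed and on the PATH, and that running it with just a file name prints the document text to standard output. The function names antiwordCommand and docToText are hypothetical helpers made up for this example.

```php
<?php
// Hypothetical helper: builds the antiword command line for a .doc path.
function antiwordCommand(string $docPath): string
{
    // escapeshellarg keeps spaces and shell metacharacters in the path harmless.
    return 'antiword ' . escapeshellarg($docPath);
}

// Hypothetical helper: returns the extracted text, or null when
// antiword produced no output (missing binary, unreadable file, ...).
function docToText(string $docPath): ?string
{
    $text = shell_exec(antiwordCommand($docPath));
    return (is_string($text) && $text !== '') ? $text : null;
}
```

The returned string can then be handed to whatever indexing step follows; a null result is worth logging, since it usually means antiword could not be run or could not parse the file.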

+1

unoconv is available on Ubuntu. It relies on OpenOffice. You can call it with exec from PHP.
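A possible way to call unoconv from PHP is sketched below. It assumes unoconv is installed and on the PATH, and that its -f option selects the output format, with the converted file written next to the input. The function names unoconvCommand and convertWithUnoconv are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helper: builds the unoconv command line.
function unoconvCommand(string $docPath, string $format = 'pdf'): string
{
    // Both the format and the path are escaped before reaching the shell.
    return 'unoconv -f ' . escapeshellarg($format) . ' ' . escapeshellarg($docPath);
}

// Hypothetical helper: runs the conversion and reports success
// based on the process exit code (0 means success).
function convertWithUnoconv(string $docPath, string $format = 'pdf'): bool
{
    exec(unoconvCommand($docPath, $format), $outputLines, $exitCode);
    return $exitCode === 0;
}
```

Checking the exit code rather than the output keeps the caller independent of whatever unoconv prints, which may vary between versions.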

+1

Microsoft .DOC.

0

PHP, doc2rtf, . RTF , , RTF , .

In OpenOffice or MS Word, use File > Save As > RTF.

0

DOC , - PHP- .

RTF , , fopen .

RTF, , DOC.

0

You can check out the source code of this article: Reading "clean" text from DOCX and ODT

0

After a few days of searching, here is my best solution: http://wvware.sourceforge.net/

Install the package:

sudo apt-get install wv

Use it in PHP:

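// Swap only a trailing .doc extension for .txt, not a ".doc" elsewhere in the path
$output = preg_replace('/\.doc$/i', '.txt', $filename);
// Escape both paths so spaces or shell metacharacters cannot break the command
shell_exec('/usr/bin/wvText ' . escapeshellarg($filename) . ' ' . escapeshellarg($output));
$text = file_get_contents($output);
// Convert to UTF-8 if needed (assumes ISO-8859-1 input, as utf8_encode did;
// utf8_encode itself is deprecated since PHP 8.2)
if (!mb_detect_encoding($text, 'UTF-8', true)) {
    $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}
unlink($output);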
0