Reading doc and docx files using C# without installing MS Office on the server

I am working on a project (asp.net, C#, vb 2010, .net 4) and I need to read DOC and DOCX files that were previously uploaded (the upload part is already done). The hard part is that MS Office is not installed on the server, so I cannot use it.

Is there any public library that I can include in my project without having to install anything? Both documents are very simple:

NUMBER TAB STRING NUMBER TAB STRING NUMBER TAB STRING ... 

I need to extract the number and the string from each line (paragraph).

Can anyone help with this? I have to reiterate that I am limited in such a way that I cannot install anything on the server.

+6
3 answers

For DOC, you can use the open source NPOI library.

For DOCX, I suggest the Open XML API.
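To illustrate the DOCX suggestion, here is a minimal sketch that assumes the Open XML SDK (the `DocumentFormat.OpenXml` assembly) is referenced by the project. The class and method names (`DocxParagraphReader`, `ReadParagraphs`) are made up for this example. It walks each paragraph and emits a tab character for every `TabChar` element, because a tab is stored as its own element and not inside the text runs. It only looks at top-level paragraphs in the body, which should be enough for documents as simple as the ones described.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class DocxParagraphReader
{
    // Returns one string per top-level paragraph, with tabs preserved as '\t'.
    public static List<string> ReadParagraphs(string path)
    {
        var result = new List<string>();

        // Open read-only (second argument: isEditable = false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            foreach (Paragraph paragraph in body.Elements<Paragraph>())
            {
                var sb = new StringBuilder();

                foreach (OpenXmlElement element in paragraph.Descendants())
                {
                    Text text = element as Text;
                    if (text != null)
                    {
                        sb.Append(text.Text);   // literal run text
                    }
                    else if (element is TabChar)
                    {
                        sb.Append('\t');        // <w:tab/> inside a run
                    }
                }

                result.Add(sb.ToString());
            }
        }

        return result;
    }
}
```

Calling `DocxParagraphReader.ReadParagraphs(path)` on one of the uploaded files should then give one `NUMBER TAB STRING` entry per paragraph, assuming the documents really are that simple.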

+4

You can use Code7248.word_reader.dll.

The following sample code shows how to use it.

Add a reference to this DLL in your project and copy the code below.

    using System;
    using System.Collections.Generic;
    using System.Text;
    // Extra namespace provided by Code7248.word_reader.dll
    using Code7248.word_reader;

    namespace testWordRead
    {
        class Program
        {
            // Extracts the plain text of the document and prints it.
            private void readFileContent(string path)
            {
                TextExtractor extractor = new TextExtractor(path);
                string text = extractor.ExtractText();
                Console.WriteLine(text);
            }

            static void Main(string[] args)
            {
                Program cs = new Program();
                // Verbatim string, so the backslashes are not read as escape sequences.
                string path = @"D:\Test\testdoc1.docx";
                cs.readFileContent(path);
                Console.ReadLine();
            }
        }
    }
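
Once the text has been extracted, the question still asks for the number and the string from each line. Below is a rough sketch of that step. It assumes the extracted text keeps one paragraph per line and keeps the tab between the two fields, which I have not verified for this DLL, and it assumes the number is an integer. `NumberedLineParser` is a name invented for this example. It sticks to C# that compiles on .NET 4.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for the "NUMBER TAB STRING" layout from the question.
public static class NumberedLineParser
{
    // Returns (number, string) pairs; lines that do not match the layout are skipped.
    public static List<KeyValuePair<int, string>> Parse(string text)
    {
        var entries = new List<KeyValuePair<int, string>>();
        if (string.IsNullOrEmpty(text))
        {
            return entries;
        }

        // Accept Windows, Unix and old Mac line endings; drop empty lines.
        string[] lines = text.Split(
            new[] { "\r\n", "\n", "\r" },
            StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int tabIndex = line.IndexOf('\t');
            if (tabIndex <= 0)
            {
                continue; // no tab, or nothing before it
            }

            string numberPart = line.Substring(0, tabIndex).Trim();
            int number;
            if (!int.TryParse(numberPart, NumberStyles.Integer,
                              CultureInfo.InvariantCulture, out number))
            {
                continue; // first field is not an integer
            }

            string value = line.Substring(tabIndex + 1).Trim();
            entries.Add(new KeyValuePair<int, string>(number, value));
        }

        return entries;
    }
}
```

For example, `NumberedLineParser.Parse(text)` could be called on the string returned by `ExtractText()`, and each resulting pair exposes the number as `Key` and the string as `Value`.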
+2

Update: NPOI now supports docx. Try the latest version (the NPOI 2.0 beta).
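
For what it is worth, reading paragraphs with NPOI's docx support might look roughly like the sketch below. It assumes an NPOI 2.x build that includes the `NPOI.XWPF.UserModel` namespace is referenced. `NpoiDocxReader` is a name invented for this example, and I have not checked how the beta renders tab characters in `ParagraphText`.

```csharp
using System.Collections.Generic;
using System.IO;
using NPOI.XWPF.UserModel;

// Hypothetical helper, not part of NPOI.
public static class NpoiDocxReader
{
    // Returns the text of each body paragraph in the .docx file.
    public static List<string> ReadParagraphs(string path)
    {
        var result = new List<string>();

        using (FileStream stream = File.OpenRead(path))
        {
            XWPFDocument document = new XWPFDocument(stream);

            foreach (XWPFParagraph paragraph in document.Paragraphs)
            {
                result.Add(paragraph.ParagraphText);
            }
        }

        return result;
    }
}
```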

+1
