Index Word / PDF Documents from a File System on SQL Server

I am trying to find a simple solution to my problem, because every one I have found so far is too complicated!

The situation is that we use a proprietary application to manage most aspects of our business. It is backed by a SQL Server 2005 database, which is quite large. The application also lets you attach Word and PDF documents to records, a feature we use widely; the documents are stored in the file system on the server, and their file names are referenced in the database. Unfortunately, the search tools in the application are poor, so I'm trying to build my own.

So far, I have a neat ASP.NET page with a search box that lets users enter search words and filter their results by other criteria, such as department, date, etc. The stored procedure I wrote in the database looks for the search words in several different fields. What I was really aiming for is a Google-style "one search for all", where the user does not need to say where they expect the word to be found; they simply get hits wherever it appears in the database data. And it works.

What I want to add now is the ability to include in the search the text of the documents that are “attached” to records. All of them are .doc or .pdf files, although if I could not search the .pdf files, it would not be the end of the world.

In my ideal world, I would find some software that indexes the folder containing the documents (currently about 100,000) and populates a table in my existing database with this index, so that I could then include this table in my search. I would like it simply to contain a record for each unique word it has indexed, plus a link table that ties each word to the documents in the file system that contain it.
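The word table and link table described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for SQL Server, the table and function names (word, document, word_document, build_index, search) are hypothetical, and it assumes the text has already been extracted from the .doc/.pdf files by some other tool, which is the hard part and is not shown.

```python
import re
import sqlite3


def build_index(conn, documents):
    """Index documents given as {file path: already-extracted plain text}."""
    # Hypothetical schema: one row per unique word, one row per file,
    # and a link table connecting each word to the files that contain it.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS word (
            id INTEGER PRIMARY KEY, term TEXT UNIQUE NOT NULL);
        CREATE TABLE IF NOT EXISTS document (
            id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
        CREATE TABLE IF NOT EXISTS word_document (
            word_id INTEGER NOT NULL REFERENCES word(id),
            document_id INTEGER NOT NULL REFERENCES document(id),
            PRIMARY KEY (word_id, document_id));
    """)
    for path, text in documents.items():
        conn.execute("INSERT OR IGNORE INTO document (path) VALUES (?)", (path,))
        doc_id = conn.execute(
            "SELECT id FROM document WHERE path = ?", (path,)).fetchone()[0]
        # Naive tokenizer: lower-cased runs of letters and digits.
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            conn.execute("INSERT OR IGNORE INTO word (term) VALUES (?)", (term,))
            word_id = conn.execute(
                "SELECT id FROM word WHERE term = ?", (term,)).fetchone()[0]
            conn.execute(
                "INSERT OR IGNORE INTO word_document VALUES (?, ?)",
                (word_id, doc_id))
    conn.commit()


def search(conn, term):
    """Return the paths of all indexed files containing the given word."""
    rows = conn.execute(
        """SELECT d.path
           FROM word w
           JOIN word_document wd ON wd.word_id = w.id
           JOIN document d ON d.id = wd.document_id
           WHERE w.term = ?
           ORDER BY d.path""",
        (term.lower(),))
    return [row[0] for row in rows]
```

Because the link table lives in the same database as the application data, a lookup like search() could in principle be joined into the existing search procedure rather than run as a separate query.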

Given that this seems like an odd requirement, and that no software does it, or anything close to it, as far as I can tell, what solution would you recommend? The server already runs dtSearch, which indexes the very files I'm interested in. However, even if I could work through the documentation and figure out how to search that index from my own web page (which I have started to do and am finding hard going), it would have to be a separate search from the one against the SQL database. I could not return the results from the file index and the database together in a single result set.

So, starting from the end goal of having the indexed words stored in a database so that I can run a full-text search across them, what would anyone suggest?

1 answer

SQL Server has full-text search (http://msdn.microsoft.com/en-us/library/ms142571.aspx); it supports PDF and Word files (although with some wrinkles: installation can be a bit tricky). The link is for SQL Server 2008, but the feature was already present in SQL Server 2000.

So, super simplified: your solution would require you to load the documents into SQL Server and amend your stored procedure to query them using the built-in full-text search features.

Keeping the file system and the document database synchronized can be a problem, but other than that, I think the solution should be pretty simple.
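One possible way to approach the synchronization concern is to record each file's modification time when it is loaded and then rescan the folder for anything new or changed. The sketch below is only an assumption about how that could look: it uses Python with sqlite3 standing in for SQL Server, and the loaded_file table and both function names are hypothetical. It does not handle files that have been deleted from the folder.

```python
import os
import sqlite3


def find_stale_files(conn, folder):
    """Return (path, mtime) pairs for .doc/.pdf files that are new or changed."""
    # Hypothetical bookkeeping table: one row per file already loaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_file ("
        "path TEXT PRIMARY KEY, mtime REAL NOT NULL)")
    known = dict(conn.execute("SELECT path, mtime FROM loaded_file"))
    stale = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.lower().endswith((".doc", ".pdf")):
                continue  # only the attachment types mentioned in the question
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            # A file is stale if it was never loaded or its timestamp differs.
            if known.get(path) != mtime:
                stale.append((path, mtime))
    return sorted(stale)


def mark_loaded(conn, stale):
    """Record that the given files have been (re)loaded into the database."""
    conn.executemany(
        "INSERT OR REPLACE INTO loaded_file (path, mtime) VALUES (?, ?)", stale)
    conn.commit()
```

A scheduled job could call find_stale_files(), reload just those documents into the table that full-text search indexes, and then call mark_loaded().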
