Here's the problem I have: I have a set of log files that can grow pretty quickly. The logs are split into one file per day, and those files can easily grow to a gigantic size. To keep the size down, records older than 30 days are deleted.
The problem comes when I want to search these files for a specific string. Right now a Boyer-Moore search is unreasonably slow. I know that applications like dtSearch can provide fast searches using indexing, but I'm not sure how to implement that without taking up twice as much space as the logs already take.
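On the space concern: an inverted index stores, for each term, a sorted "posting list" of where that term occurs. A standard way to keep posting lists small is to store the gaps between consecutive positions rather than the positions themselves, and to write each gap as a variable-byte integer, so small gaps take a single byte. The sketch below assumes postings are sorted byte offsets into one log file; `encodePostings` and `decodePostings` are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helpers: compress a sorted posting list (byte offsets of lines
// containing one term) by storing gaps between offsets as variable-byte ints.
// Each byte holds 7 bits of the gap; the high bit means "more bytes follow".
std::vector<std::uint8_t> encodePostings(const std::vector<std::uint64_t>& offsets) {
    std::vector<std::uint8_t> out;
    std::uint64_t prev = 0;
    for (std::uint64_t off : offsets) {
        assert(off >= prev && "offsets must be sorted ascending");
        std::uint64_t gap = off - prev;
        prev = off;
        while (gap >= 0x80) {
            out.push_back(static_cast<std::uint8_t>((gap & 0x7F) | 0x80));
            gap >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(gap));
    }
    return out;
}

// Reverses encodePostings: rebuilds each gap, then adds it to the running offset.
std::vector<std::uint64_t> decodePostings(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint64_t> offsets;
    std::uint64_t prev = 0, gap = 0;
    int shift = 0;
    for (std::uint8_t b : bytes) {
        gap |= static_cast<std::uint64_t>(b & 0x7F) << shift;
        if (b & 0x80) {
            shift += 7;  // continuation bit set: more bytes of this gap follow
        } else {
            prev += gap;
            offsets.push_back(prev);
            gap = 0;
            shift = 0;
        }
    }
    return offsets;
}
```

For example, the offsets {1000, 1040, 1100, 250000} encode to 7 bytes instead of 32 bytes as raw 64-bit values. How much this saves overall depends on how often terms repeat in the logs.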
Are there any resources I could look at that might help? I'm really looking for a standard algorithm that explains what I have to do to build an index and use it to search.
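The standard structure behind indexed full-text search is the inverted index: split each record into tokens, and map each token to the set of records that contain it. A query is then answered by looking up each query token and intersecting the resulting sets, without reading the logs at all. The sketch below is a minimal in-memory version, assuming tokens are runs of ASCII letters and digits compared case-insensitively, and that records are identified by line number within a single file; the class `LogIndex` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical minimal inverted index: maps each lowercase token to the set of
// line numbers (within one log file) that contain it.
class LogIndex {
public:
    void addLine(std::size_t lineNo, const std::string& line) {
        for (const std::string& tok : tokenize(line)) postings_[tok].insert(lineNo);
    }

    // Returns the lines that contain every token of the query (AND semantics).
    std::vector<std::size_t> search(const std::string& query) const {
        std::vector<std::string> toks = tokenize(query);
        if (toks.empty()) return {};
        std::set<std::size_t> result;
        bool first = true;
        for (const std::string& tok : toks) {
            auto it = postings_.find(tok);
            if (it == postings_.end()) return {};  // a missing token means no line matches
            if (first) {
                result = it->second;
                first = false;
                continue;
            }
            std::set<std::size_t> kept;  // intersect the running result with this token's lines
            for (std::size_t n : result)
                if (it->second.count(n)) kept.insert(n);
            result.swap(kept);
        }
        return std::vector<std::size_t>(result.begin(), result.end());
    }

private:
    // Splits on any non-alphanumeric character and lowercases the pieces.
    static std::vector<std::string> tokenize(const std::string& text) {
        std::vector<std::string> toks;
        std::string cur;
        for (unsigned char c : text) {
            if (std::isalnum(c)) {
                cur.push_back(static_cast<char>(std::tolower(c)));
            } else if (!cur.empty()) {
                toks.push_back(cur);
                cur.clear();
            }
        }
        if (!cur.empty()) toks.push_back(cur);
        return toks;
    }

    std::map<std::string, std::set<std::size_t>> postings_;
};
```

This only matches whole tokens, so searching for part of an identifier finds nothing. A common way to support arbitrary substrings is an n-gram (e.g. trigram) index that narrows the search to candidate lines, which are then verified with an ordinary string search. A real version would also store (file, byte offset) pairs on disk rather than line numbers in memory.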
Edit:
Grep will not work, as this search needs to be integrated into a cross-platform application. I can't just pull it in, or any other external program.
The way this works is that there is a web interface with a log browser, served by a custom C++ web server backend. This server needs to search the logs in a reasonable amount of time. Currently, searching through several logs takes a long time.
Edit 2: Some of these suggestions are great, but I have to repeat that I cannot integrate another application; that is part of the contract. To answer some questions: the data in the logs ranges from messages received in a particular healthcare format to messages related to them. I'm looking to rely on an index because, although rebuilding the index may take up to a minute, the search currently takes a very long time (I've seen it take up to 2.5 minutes). Also, a lot of data is discarded before it is ever logged: unless certain debug logging options are enabled, more than half of the log messages are ignored.
The search basically works like this: in a web form, the user is presented with a list of the most recent messages (streamed from disk as they scroll through them, yay for AJAX). Typically they want to find messages containing some piece of information, perhaps a patient identifier or some string they sent, so they enter that string into the search. The search request is sent asynchronously, and the custom web server linearly scans the logs 1 MB at a time for results. This process can take a very long time once the logs get large, and that is what I'm trying to optimize.
algorithm search full-text-search scalability
ReaperUnreal 02 Oct '08 at 18:16