Are there any open source tools or commercial tools available that allow you to index a piece of database content and can be queried with Java?
The background is a large MySQL database table with several hundred thousand records, containing several VARCHAR columns. In these columns, people would like to search for pieces of content, so a full-text index (based on word boundaries) would not help.
EDIT : [Added to clarify why these first sentences do not solve the problem:]
This is why MySQL, built in a full-text index, will not do the job, and neither Lucene nor Sphinx, all of which were not suggested in the answers. I already looked at both, but as far as I can tell, they are based on indexing words, excluding stop words and doing all kinds of reasonable things for real full-text search. However, this does not fit, because I can search for a search term, for example, "oison", which should match "Roisonic Street", as well as "Poison-Ivy". The main difference here is that the search query is just a fragment of the contents of the column , which does not have to be limited to any special characters or space.
EDIT2 : [Additional information added:] The requested function, which should be implemented on the basis of this, is a very free search for descriptions of elements in the goods management system. Users often do not know the correct position number, but only part of the element name. Unfortunately, the quality of these descriptions is rather poor; they come from an inherited system and cannot be easily changed. If, for example, people were looking for a sledgehammer, they would go into the sled. With a word / token-based index, it did not find matches that are stored as a “sledgehammer”, but only those who listen to the “sledgehammer”. There are all kinds of weird deviations that need to be addressed, making the token-based approach impractical.
Currently, we can only make a LIKE '%searchterm%' query LIKE '%searchterm%' , effectively disabling any index usage and consuming a lot of resources and time.
Ideally, any such tool will create an index that would allow me to get results for such queries very quickly, so that I can implement a searchlight-like search only by receiving "real" data from the MySQL table through the primary key when the user selects the result record.
If possible, the index should be updatable (without the need for a complete overhaul), as the data can change and should be immediately searchable by other clients.
I would be happy to receive recommendations and / or receive reports.
EDIT3: commercial solutions found that “just works” Despite the fact that I have many good answers to this question, I would like to point out here that in the end we went with a commercial product called “QuickFind” that was manufactured and sold by the German company "HMB Datentechnik". Please note that I am not affiliated with them in any way, because it may seem like this when I continue and tell you what their product can do. Unfortunately, their site looks pretty bad and only in German, but the product itself is really wonderful. I currently have a trial version - you will have to contact them without downloading - and I am very impressed.
Since the Internet is not complete documentation, I will try to describe my experience so far.
What they do is create a custom index file based on the contents of the database. They can integrate through ODBC, but from what I said, clients rarely do this. Instead - and this is what we are likely to do - you create an export of text (such as CSV) from your main database and pass it to your index. This allows you to be completely independent of the actual structure of the table (or any SQL database in general); in fact, we export data combined from several tables. Indexes can be incrementally updated later on the fly.
Based on the fact that their server (only 250 KB or so, acting as a console application or Windows service) is used to listen for requests on the TCP port. The protocol is text-based and looks a bit "old", but it is simple and works. Basically you just pass in which of the available indexes you want to query, and search terms (fragments), separated by spaces. Three available output formats are available: an array of HTML / JavaScript, XML, or CSV. I am currently working on a Java shell for a somewhat "obsolete" wire protocol. But the results are fantastic: I currently have an approximate data set of approximately 500,000 records with 8 columns indexed, and my test application starts searching all 8 columns for the contents of the JTextField every time I press a key while editing, and can update the display of the results (JTable) in real time! This happens without accessing the MySQL instance from which the data was originally obtained. Based on the columns you will return, you can ask for the “original” record by querying MySQL with the primary key of this row (this, of course, should be included in the QuickFind index).
The index is about 30-40% of the size of the text export version of the data. Indexing was mainly related to disk I / O; my 500,000 records took about a minute or two to be processed.
It’s hard to describe it, because it was even hard for me to believe when I saw a demo version of my own product. They presented a database of addresses of 10 million lines and searched for fragments of names, addresses and phone numbers, and when the "Search" button was clicked, the results returned in a second - everything was done on a laptop! From what I was told, they often integrate with SAP or CRM systems to improve search time when call center agents simply understand fragments of the caller’s names or addresses.
Somehow, I probably won't describe it much better. If you need something like this, you should definitely check it out. Google Translate does a pretty good job translating its site from German to English, so this may be a good start.