Parsing Search Engine Keywords

Here is what I want to do:

I need to create a search engine parser that uses the following operators:

  • Apples AND Oranges (AND operator)
  • Apples OR Oranges (OR operator)
  • Apples AND NOT Oranges (AND NOT operator)
  • "Apples" (quote operator)
  • Apples AND ( Oranges OR Pears ) (Bracket Operator)
  • Appl* (Star operator)

Using a few preg_replace calls, I managed to convert the string into an array, which I then parse to build a MySQL query. But I don't like this approach, and it is very fragile!

I have searched the web for a script that does this, with no luck!

Can someone please help me implement this?

thanks

operators php search-engine text-parsing
5 answers

OK, this is going to be a long answer.

I think you need a parser generator: a piece of software that generates code for analyzing text according to a given grammar. The generated parsers usually have two main components, a lexer and a parser. The lexer identifies the tokens (words), and the parser checks that the tokens appear in an order your grammar allows.

In the lexer, you would declare the following tokens:

    TOKENS   ::= (AND, OR, NOT, WORD, WORDSTAR, LPAREN, RPAREN, QUOTE)
    WORD     ::= '/\w+/'
    WORDSTAR ::= '/\w+\*/'
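A lexer for these tokens is small enough to write by hand in PHP. The sketch below is hypothetical (it is not what a parser generator would emit): it assumes operators are written in uppercase, that the star is attached directly to its word (Appl*, not Appl *), and it silently skips any character that does not belong to a token.

```php
<?php
// Hypothetical hand-written lexer for the TOKENS declared above.
// Returns a list of [type, value] pairs; characters that fit no token are skipped.
function tokenize(string $query): array
{
    $tokens = [];
    // A word with an optional trailing '*', or one of the single-character tokens ( ) "
    preg_match_all('/\w+\*?|[()"]/', $query, $matches);
    foreach ($matches[0] as $lexeme) {
        if ($lexeme === '(') {
            $tokens[] = ['LPAREN', $lexeme];
        } elseif ($lexeme === ')') {
            $tokens[] = ['RPAREN', $lexeme];
        } elseif ($lexeme === '"') {
            $tokens[] = ['QUOTE', $lexeme];
        } elseif (in_array($lexeme, ['AND', 'OR', 'NOT'], true)) {
            // Operators are recognized only in uppercase (assumption).
            $tokens[] = [$lexeme, $lexeme];
        } elseif (substr($lexeme, -1) === '*') {
            // Store the prefix without its trailing star.
            $tokens[] = ['WORDSTAR', substr($lexeme, 0, -1)];
        } else {
            $tokens[] = ['WORD', $lexeme];
        }
    }
    return $tokens;
}
```

For example, tokenize('Apples AND ( Oranges OR Pears* )') yields WORD, AND, LPAREN, WORD, OR, WORDSTAR, RPAREN.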

The grammar should be defined as follows:

    QUERY  ::= word
    QUERY  ::= wordstar
    QUERY  ::= lparen QUERY rparen
    QUERY  ::= QUERY and QUERY
    QUERY  ::= QUERY or QUERY
    QUERY  ::= QUERY and not QUERY
    QUERY  ::= quote MQUERY quote
    MQUERY ::= word MQUERY
    MQUERY ::= word
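This grammar does not say whether AND or OR binds tighter, and left-recursive rules like QUERY ::= QUERY and QUERY cannot be used directly by a hand-written recursive-descent parser, so the sketch below restructures it. The hypothetical QueryParser class assumes AND / AND NOT bind tighter than OR and that all operators are left-associative. It takes a list of [type, value] token pairs named as in the TOKENS line and returns a nested-array AST.

```php
<?php
// Recursive-descent parser for the search grammar (a sketch, not generator output).
// Input: list of [type, value] token pairs. Output: nested arrays forming an AST:
// ['word', w], ['prefix', w], ['phrase', [w, ...]], ['and'|'or'|'andnot', left, right]
final class QueryParser
{
    private array $tokens;
    private int $pos = 0;

    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    public function parse(): array
    {
        $ast = $this->orExpr();
        if ($this->pos !== count($this->tokens)) {
            throw new RuntimeException("Unexpected token at position {$this->pos}");
        }
        return $ast;
    }

    // Type of the current token, or null at end of input.
    private function peek(): ?string
    {
        return $this->tokens[$this->pos][0] ?? null;
    }

    // Consume a token of the given type and return its value.
    private function expect(string $type): string
    {
        if ($this->peek() !== $type) {
            throw new RuntimeException("Expected $type at position {$this->pos}");
        }
        return $this->tokens[$this->pos++][1];
    }

    // or_expr ::= and_expr { OR and_expr }
    private function orExpr(): array
    {
        $node = $this->andExpr();
        while ($this->peek() === 'OR') {
            $this->pos++;
            $node = ['or', $node, $this->andExpr()];
        }
        return $node;
    }

    // and_expr ::= primary { AND [NOT] primary }
    private function andExpr(): array
    {
        $node = $this->primary();
        while ($this->peek() === 'AND') {
            $this->pos++;
            $type = 'and';
            if ($this->peek() === 'NOT') {
                $this->pos++;
                $type = 'andnot';
            }
            $node = [$type, $node, $this->primary()];
        }
        return $node;
    }

    // primary ::= WORD | WORDSTAR | LPAREN or_expr RPAREN | QUOTE WORD { WORD } QUOTE
    private function primary(): array
    {
        switch ($this->peek()) {
            case 'WORD':
                return ['word', $this->expect('WORD')];
            case 'WORDSTAR':
                return ['prefix', $this->expect('WORDSTAR')];
            case 'LPAREN':
                $this->pos++;
                $node = $this->orExpr();
                $this->expect('RPAREN');
                return $node;
            case 'QUOTE':
                $this->pos++;
                $words = [$this->expect('WORD')];
                while ($this->peek() === 'WORD') {
                    $words[] = $this->expect('WORD');
                }
                $this->expect('QUOTE');
                return ['phrase', $words];
            default:
                throw new RuntimeException("Unexpected token at position {$this->pos}");
        }
    }
}
```

Syntax errors such as an unbalanced bracket surface as a RuntimeException, which is more predictable than a chain of preg_replace calls.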

This grammar defines a language with all the operators you need. Depending on the parser generator you use, you can attach a function to each rule to process it; that way you can convert your text query into an SQL WHERE clause.
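As a hedged sketch of that last step, each AST node type can map to one SQL fragment. The AST shape (['word', ...], ['prefix', ...], ['phrase', [...]], ['and'|'or'|'andnot', left, right]), the column name field and the space-padding trick are all assumptions here. User input goes into bound parameters rather than into the SQL string, and words only match when separated by single spaces (punctuation is not handled).

```php
<?php
// Sketch: compile a nested-array AST into a parameterized MySQL WHERE clause.
// Hypothetical assumptions: one text column named `field`, words separated by
// single spaces, and MySQL's default backslash LIKE escape character.

// Escape the LIKE metacharacters \ % _ so user input is matched literally.
function escapeLike(string $s): string
{
    return addcslashes($s, '\\%_');
}

// Appends bound values to $params (in placeholder order) and returns the SQL fragment.
function toWhere(array $node, array &$params): string
{
    // Padding the column with spaces lets '% word %' match whole words only,
    // including the first and last word of the text.
    $col = "CONCAT(' ', `field`, ' ')";
    switch ($node[0]) {
        case 'word':    // Apples
            $params[] = '% ' . escapeLike($node[1]) . ' %';
            return "$col LIKE ?";
        case 'prefix':  // Appl*
            $params[] = '% ' . escapeLike($node[1]) . '%';
            return "$col LIKE ?";
        case 'phrase':  // "apples oranges"
            $params[] = '% ' . escapeLike(implode(' ', $node[1])) . ' %';
            return "$col LIKE ?";
        case 'and':
        case 'or':
        case 'andnot':
            // Compile left before right so $params follows placeholder order.
            $left = toWhere($node[1], $params);
            $right = toWhere($node[2], $params);
            $op = ['and' => 'AND', 'or' => 'OR', 'andnot' => 'AND NOT'][$node[0]];
            return "($left $op $right)";
        default:
            throw new InvalidArgumentException('Unknown node type: ' . $node[0]);
    }
}
```

The returned string would be appended to 'SELECT * FROM `table` WHERE ' and passed to PDO::prepare(), with the collected $params passed to PDOStatement::execute(). LIKE with a leading % cannot use an ordinary index, so this scales poorly on large tables.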

I don't work with PHP, but a quick search for a PHP parser generator turned up PHP_ParserGenerator.

Keep in mind that as your database grows, these queries can become a performance problem for a relational database.

You could try a full-text search engine, which supports these operators and many other text-search features. Here's how it works with IndexTank:

First you add (or "index", in search jargon) all your DB entries (or documents) to IndexTank:

    $api = new ApiClient(...);
    $index = $api->get_index('my_index');
    foreach ($dbRows as $row) {
        $index->add_document($row->id, array('text' => $row->text));
    }

After that you can search the index using all the operators you want:

    $index = $api->get_index('my_index');
    $search_result = $index->search('Apples AND Oranges');
    $search_result = $index->search('Apples OR Oranges');
    $search_result = $index->search('Apples AND NOT Oranges');
    $search_result = $index->search('"apples oranges"');
    $search_result = $index->search('Apples AND ( Oranges OR Pears )');
    $search_result = $index->search('Appl*');

I hope I answered your question.


Have you looked at ANTLR?


You could roll your own solution (IMPORTANT: the $search string must be sanitized first, or you will be hacked)...

    // IMPORTANT: $search must already be escaped before it is concatenated into SQL
    if (substr($search, 0, 1) == '*' && substr($search, -1) == '*') {
        // *ppl*
        $query = "SELECT * FROM `table` WHERE `field` LIKE '%" . str_replace('*', '', $search) . "%'";
    } elseif (substr($search, -1) == '*') {
        // Appl*
        $query = "SELECT * FROM `table` WHERE `field` LIKE '" . str_replace('*', '', $search) . "%'";
    } elseif (substr($search, 0, 1) == '*') {
        // *Appl
        $query = "SELECT * FROM `table` WHERE `field` LIKE '%" . str_replace('*', '', $search) . "'";
    } elseif (substr_count($search, '"') == 2) {
        // "Apples" ... just remove the quotes
        $query = 'SELECT * FROM `table` WHERE `field` = "' . str_replace('"', '', $search) . '"';
    } elseif (strpos($search, ')') !== false || strpos($search, '(') !== false) {
        // uh ... something more complex here
        $query = '#idunno';
    } else {
        // the rest
        $query = 'SELECT * FROM `table` WHERE `field` = "' . $search . '"';
        // strtr tries the longest key first (so ' AND NOT ' wins over ' AND ')
        // and never re-scans text it has already replaced
        $query = strtr($query, array(
            ' AND NOT ' => '" AND `field` != "',
            ' AND '     => '" AND `field` = "',
            ' OR '      => '" OR `field` = "',
        ));
    }
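Most of the injection risk in the code above goes away if the wildcard value is bound as a parameter instead of being concatenated into the SQL. As a hedged sketch (the helper name starToLike is hypothetical, PDO is just one option, and MySQL's default backslash LIKE escape is assumed), the three star branches could collapse into one:

```php
<?php
// Sketch: a single helper that could replace the three '*' branches.
// It escapes LIKE metacharacters (\ % _) in the user's input, then maps each '*' to '%'.
function starToLike(string $search): string
{
    return str_replace('*', '%', addcslashes($search, '\\%_'));
}

// Usage with a prepared statement ($pdo is assumed to be an existing PDO connection):
// $stmt = $pdo->prepare('SELECT * FROM `table` WHERE `field` LIKE ?');
// $stmt->execute([starToLike($search)]);
```

A literal % typed by the user is escaped, so only * acts as a wildcard.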

Try the following: http://www.isearchthenet.com/isearch/index.php

From the readme file:

  • By default, searches use "may contain" words: a page matches if it contains any of the words entered.
  • You can require a specific word by prefixing it with a plus sign (+). Only pages containing this word are displayed.
  • You can exclude pages containing a specific word by prefixing it with a minus sign (-). No page containing this word will appear in the search results.
  • You can search for a specific phrase by enclosing it in double quotation marks ("). Only pages that contain this exact phrase will be displayed.
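The four rules above boil down to sorting each term into one of three buckets. A minimal sketch, assuming whitespace-separated terms; this is not isearch's actual code, and parsePlusMinus is a hypothetical name:

```php
<?php
// Sketch of the +word / -word / "phrase" syntax described in the readme.
// Splits a query into required (+), excluded (-) and optional terms;
// a quoted phrase is kept together as a single term.
function parsePlusMinus(string $query): array
{
    $result = ['required' => [], 'excluded' => [], 'optional' => []];
    // Optional sign, then either a "quoted phrase" (group 2) or a run of non-space characters (group 3).
    preg_match_all('/([+-]?)(?:"([^"]*)"|(\S+))/', $query, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $term = ($m[3] ?? '') !== '' ? $m[3] : $m[2];
        if ($term === '') {
            continue; // skip empty quotes ""
        }
        if ($m[1] === '+') {
            $result['required'][] = $term;
        } elseif ($m[1] === '-') {
            $result['excluded'][] = $term;
        } else {
            $result['optional'][] = $term;
        }
    }
    return $result;
}
```

Each bucket can then become SQL: required terms joined with AND, excluded terms with AND NOT, and optional terms with OR.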

It is easy to install and use. Also check out http://sphinxsearch.com/, a very powerful engine, but not for beginners.

