im is working on a PHP + MySQL application that will bypass the hard drive / shared drive and index all files and directories into the database to provide a full-text search on it. So far, they’re fine, but I’m stuck on the question if I had studied correctly how to store data in a database.
In the figure below you can see the details diagram of my database. The idea is that I save the domain (which represents the part of the disk that I want to index), then there are several links (which represent the files and folder (with the contents, filepath, etc.), then I have a table for storage soles (uniq) of keywords that I find in a file name or folder or content.
And finally, I have 16 linkkeyword tables to keep the relationship between links and keywords . I have 16 of them, because I thought it would be nice to do something like a hash table, because I expected a lot of relationships between the link ↔ keyword . (so far for 15k links and 400k keywords I have about 2.5 million linkkeyword entries). Therefore, in order not to store so much data in one table (and then look for them higher), I thought that this hastable could be faster. It works like I wana to search for a word, I compute it md5 and look at the first character of md5, and then I know which linkkeyword table to use. Thus, in each linkkeyword table there are only about 150 ~ 200 thousand records (against 2.5 million)

So they are curious if this approach can be useful or it is better to store all the linkkeyword information in a separate table and mysql will take care of this (and how much link ↔ keyword can work?)
So far, this has been a great solution for me, but I got really crashed when I tried to implement a regex search. Thus, the user can use, for example. "tem *", which can lead to a temporary, temporary, temple, etc. In normal mode, when searching for a word, I will settle for the md5 hash, and then I know which linkkeyword table I need to look at, but for regular expression I need to get all the keywords from the table keywords (which corresponds to the regular expression), and then process them one at a time.
Im also attaching a piece of code for a simple keyword search
private function searchKeywords($selectedDomains) { $searchValues = $this->searchValue; $this->resultData = array(); foreach (explode(" ", $searchValues) as $keywordName) { $keywordName = strtolower($keywordName); $keywordMd5 = md5($keywordName); $selection = $this->database->table('link'); $results = $selection->where('domain.id', $selectedDomains)->where('domain.searchable = ?', '1')->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName) ->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword'); foreach ($results as $result) { $keyExists = array_key_exists($result->linkId, $this->resultData); if ($keyExists) { $this->resultData[$result->linkId]->updateWeight($result->weight); $this->resultData[$result->linkId]->addKeyword($result->keyword); } else { $domain = $result->ref('domain'); $linkClass = new search\linkClass($result, $domain); $linkClass->updateWeight($result->weight); $linkClass->addKeyword($result->keyword); $this->resultData[$result->linkId] = $linkClass; } } } }
and regular expression search function
private function searchRegexp($selectedDomains) { //get stored search value $searchValues = $this->searchValue; //replace astering and exclamation mark (counted as characters for regular expression) and replace them by their mysql equivalent $searchValues = str_replace("*", "%", $searchValues); $searchValues = str_replace("!", "_", $searchValues); // empty result array to prevent previous results to interfere $this->resultData = array(); //searched phrase can be multiple keywords, so split it by space and get results for each keyword foreach (explode(" ", $searchValues) as $keywordName) { //set default link result weight to -1 (default value) $weight = -1; //select all keywords, which match searched keyword (or its regular expression) $keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName); foreach ($keywords as $keyword) { //count keyword md5 sum to determine which table should be use to match it links $md5 = md5($keyword->keyword); //get all link ids from linkkeyword relation table $keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1'); //loop found links foreach ($keywordJoinLink as $link) { //store link weight, for later result sort $weight = $link->weight; //get link ID $linkId = $link->linkId; //check if link already exists in results, to prevent duplicity $keyExists = array_key_exists($linkId, $this->resultData); //if link already exists in result set, just update its weight and insert matching keyword for later keyword tag specification if ($keyExists) { $this->resultData[$linkId]->updateWeight($weight); $this->resultData[$linkId]->addKeyword($keyword->keyword); //if link isnt in result yet, insert it } else { //get link reference $linkData = $link->ref('link', 'linkId'); //get information about domain, to which link belongs (location, flagPath,...) $domainData = $linkData->ref('domain', 'domainId'); //if is domain searchable and was selected before search, add link to result set. Otherwise ignore it if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) { //create new link instance $linkClass = new search\linkClass($linkData, $domainData); //insert matching keyword to links keyword set $linkClass->addKeyword($keyword->keyword); //set links weight $linkClass->updateWeight($weight); //insert link into result set $this->resultData[$linkId] = $linkClass; } } } } } }