Should I keep the keyword link & # 8596; in "hastable" as a structure in mysql?

im is working on a PHP + MySQL application that will bypass the hard drive / shared drive and index all files and directories into the database to provide a full-text search on it. So far, they’re fine, but I’m stuck on the question if I had studied correctly how to store data in a database.

In the figure below you can see the details diagram of my database. The idea is that I save the domain (which represents the part of the disk that I want to index), then there are several links (which represent the files and folder (with the contents, filepath, etc.), then I have a table for storage soles (uniq) of keywords that I find in a file name or folder or content.

And finally, I have 16 linkkeyword tables to keep the relationship between links and keywords . I have 16 of them, because I thought it would be nice to do something like a hash table, because I expected a lot of relationships between the linkkeyword . (so far for 15k links and 400k keywords I have about 2.5 million linkkeyword entries). Therefore, in order not to store so much data in one table (and then look for them higher), I thought that this hastable could be faster. It works like I wana to search for a word, I compute it md5 and look at the first character of md5, and then I know which linkkeyword table to use. Thus, in each linkkeyword table there are only about 150 ~ 200 thousand records (against 2.5 million)

Database schema

So they are curious if this approach can be useful or it is better to store all the linkkeyword information in a separate table and mysql will take care of this (and how much linkkeyword can work?)

So far, this has been a great solution for me, but I got really crashed when I tried to implement a regex search. Thus, the user can use, for example. "tem *", which can lead to a temporary, temporary, temple, etc. In normal mode, when searching for a word, I will settle for the md5 hash, and then I know which linkkeyword table I need to look at, but for regular expression I need to get all the keywords from the table keywords (which corresponds to the regular expression), and then process them one at a time.

Im also attaching a piece of code for a simple keyword search

private function searchKeywords($selectedDomains) { $searchValues = $this->searchValue; $this->resultData = array(); foreach (explode(" ", $searchValues) as $keywordName) { $keywordName = strtolower($keywordName); $keywordMd5 = md5($keywordName); $selection = $this->database->table('link'); $results = $selection->where('domain.id', $selectedDomains)->where('domain.searchable = ?', '1')->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName) ->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword'); foreach ($results as $result) { $keyExists = array_key_exists($result->linkId, $this->resultData); if ($keyExists) { $this->resultData[$result->linkId]->updateWeight($result->weight); $this->resultData[$result->linkId]->addKeyword($result->keyword); } else { $domain = $result->ref('domain'); $linkClass = new search\linkClass($result, $domain); $linkClass->updateWeight($result->weight); $linkClass->addKeyword($result->keyword); $this->resultData[$result->linkId] = $linkClass; } } } } 

and regular expression search function

 private function searchRegexp($selectedDomains) { //get stored search value $searchValues = $this->searchValue; //replace astering and exclamation mark (counted as characters for regular expression) and replace them by their mysql equivalent $searchValues = str_replace("*", "%", $searchValues); $searchValues = str_replace("!", "_", $searchValues); // empty result array to prevent previous results to interfere $this->resultData = array(); //searched phrase can be multiple keywords, so split it by space and get results for each keyword foreach (explode(" ", $searchValues) as $keywordName) { //set default link result weight to -1 (default value) $weight = -1; //select all keywords, which match searched keyword (or its regular expression) $keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName); foreach ($keywords as $keyword) { //count keyword md5 sum to determine which table should be use to match it links $md5 = md5($keyword->keyword); //get all link ids from linkkeyword relation table $keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1'); //loop found links foreach ($keywordJoinLink as $link) { //store link weight, for later result sort $weight = $link->weight; //get link ID $linkId = $link->linkId; //check if link already exists in results, to prevent duplicity $keyExists = array_key_exists($linkId, $this->resultData); //if link already exists in result set, just update its weight and insert matching keyword for later keyword tag specification if ($keyExists) { $this->resultData[$linkId]->updateWeight($weight); $this->resultData[$linkId]->addKeyword($keyword->keyword); //if link isnt in result yet, insert it } else { //get link reference $linkData = $link->ref('link', 'linkId'); //get information about domain, to which link belongs (location, flagPath,...) $domainData = $linkData->ref('domain', 'domainId'); //if is domain searchable and was selected before search, add link to result set. Otherwise ignore it if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) { //create new link instance $linkClass = new search\linkClass($linkData, $domainData); //insert matching keyword to links keyword set $linkClass->addKeyword($keyword->keyword); //set links weight $linkClass->updateWeight($weight); //insert link into result set $this->resultData[$linkId] = $linkClass; } } } } } } 
+6
source share
1 answer

Your question is mainly one of the opinions, so you can include criteria that will allow us to more “objectively” answer “worth it”.

It looks like you have reinvented the concept of outlining a database (although without distributing your data to multiple servers).

I assume that you are trying to optimize your search time; if that happens, I would suggest that 2.5 million records on modern hardware are not a particularly big performance issue if your queries can use an index. If you cannot use an index (for example, because you are performing a regular expression search), the fragments will probably not help at all.

My general recommendation with tuning database performance is to start with the simplest possible relational solution, continue tuning until it violates your performance goals, and then add more hardware, and only after you do it done, you have to go for "exotic" solutions, such as shards.

This does not mean using prayer as a strategy. For a high-performance application, I usually create a test database where I can experiment with solutions. In your case, I would build a database with your schema without sharding tables and then populate it with test data (either write my own population routines or use a tool like DBMonster ). As a rule, I would like to at least double the size that I expect in production. You can then run and tune queries to prove, one way or another, how good your schema is. It sounds like a lot of work, but it is much less than your scalding solution, which is likely to bring.

There are (like @danFromGermany comments) solutions optimized for text serach, and you can use MySQL full-text search rather than regular expressions.

+4
source

All Articles