PHP & MySQL: the most efficient method for checking a large array against a database

I have a large set of data stored in a multidimensional array. An example structure is as follows:

    Array
    (
        [1] => Array
            (
                [0] => motomummy.com
                [1] => 1921
                [2] => 473
            )

        [4] => Array
            (
                [0] => kneedraggers.com
                [1] => 3051
                [2] => 5067
            )

    )

I also have a table in a MySQL database that currently contains about 80K domain names. This list will grow monthly, possibly by 10K or more domain names. The goal is to compare Array[][0] (the domain name) against the MySQL table and return an array with the stored values (the keys do not need to be preserved) that contains only unique values.

Note that I want to compare only the first index, NOT the entire array.

The initial multidimensional array is huge (most likely between 100,000 and 10 million entries). What is the best way to return the data that is not contained in the database?

Right now I load the complete list of domains from the database into an array and then use the following function to compare each value in the original array against that database array. This is obviously terribly slow and inefficient.

    // get result of custom comparison function
    $clean = array_filter($INITIAL_LIST, function ($elem) {
        $wordOkay = true;
        // check every word in "filter from database" list, store it only if not in list
        foreach ($this->domains as $domain) {
            if (stripos($elem[0], $domain) !== false) {
                $wordOkay = false;
                break;
            }
        }
        return $wordOkay;
    });

Some pseudocode or even actual code would be very helpful at this point.

1 answer

Use a DBMS! It was made for such things.

  • Create a temporary table temp {id (filled with the array index); url (filled with the URL)}

  • Fill it with array data

  • Ideally create an index on temp.url

  • Query the database:

     SELECT temp.* FROM `temp` LEFT JOIN `urls` ON urls.url = temp.url WHERE urls.url IS NULL; 

    (urls is the table holding your existing data)
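The steps above could be sketched in PHP with PDO roughly as follows. This is a minimal sketch under several assumptions, not a definitive implementation: the function name findNewDomains, the temporary table name tmp_domains (standing in for temp) and the column size are made up for illustration, the array keys are assumed to be integers, and domains are matched by exact equality as in the query above rather than by the substring check with stripos() from the question.

```php
<?php
/**
 * Hypothetical helper: return the rows of $rows whose domain (index 0)
 * is not present in the existing `urls` table. Keys are not preserved
 * and each domain is returned at most once.
 */
function findNewDomains(PDO $pdo, array $rows): array
{
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Temporary table keyed by the original array index, plus an index on url.
    // The index is created before the transaction starts.
    $pdo->exec('CREATE TEMPORARY TABLE tmp_domains (id INTEGER PRIMARY KEY, url VARCHAR(255))');
    $pdo->exec('CREATE INDEX tmp_domains_url ON tmp_domains (url)');

    // Fill it with the array data, skipping domains that were already seen
    // so the result contains only unique values.
    $insert = $pdo->prepare('INSERT INTO tmp_domains (id, url) VALUES (?, ?)');
    $seen = [];
    $pdo->beginTransaction();
    foreach ($rows as $key => $row) {
        $domain = $row[0];
        if (isset($seen[$domain])) {
            continue;
        }
        $seen[$domain] = true;
        $insert->execute([$key, $domain]);
    }
    $pdo->commit();

    // Anti-join: the join condition goes in ON, the NULL test in WHERE,
    // so only domains without a match in `urls` remain.
    $stmt = $pdo->query(
        'SELECT t.id FROM tmp_domains t
         LEFT JOIN urls u ON u.url = t.url
         WHERE u.url IS NULL
         ORDER BY t.id'
    );
    $ids = $stmt->fetchAll(PDO::FETCH_COLUMN);
    $stmt->closeCursor();

    // Map the surviving ids back to the original rows.
    $clean = [];
    foreach ($ids as $id) {
        $clean[] = $rows[$id];
    }

    $pdo->exec('DROP TABLE tmp_domains');
    return $clean;
}
```

It would be called with an open connection and the initial list, for example findNewDomains($pdo, $INITIAL_LIST), where $pdo is a PDO instance connected to the MySQL database. Since the join looks up each temporary row in urls, an index on urls.url is likely to matter at least as much as the one on the temporary table. For lists in the millions, multi-row INSERT statements or a bulk load would probably be faster than one INSERT per row.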
