Import large CSV into MySQL database

I'm having a really hard time trying to import a large CSV file into MySQL on localhost.

The CSV is about 55 MB and has about 750,000 lines.

For now I've resorted to writing a script that parses the CSV and inserts the rows one by one.

The code looks like this:

    $row = 1;
    if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $num = count($data);
            $row++;
            for ($c = 0; $c < $num; $c++) {
                $arr       = explode('|', $data[$c]);
                $postcode  = mysql_real_escape_string($arr[1]);
                $city_name = mysql_real_escape_string($arr[2]);
                $city_slug = mysql_real_escape_string(toAscii($city_name));
                $prov_name = mysql_real_escape_string($arr[3]);
                $prov_slug = mysql_real_escape_string(toAscii($prov_name));
                $prov_abbr = mysql_real_escape_string($arr[4]);
                $lat       = mysql_real_escape_string($arr[6]);
                $lng       = mysql_real_escape_string($arr[7]);
                mysql_query("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) values ('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng')") or die(mysql_error());
            }
        }
        fclose($handle);
    }

The problem is that it takes forever to run... any solutions would be great.

7 answers

You are reinventing the wheel. Check out the mysqlimport tool that ships with MySQL. It is an efficient tool for importing CSV data files.

mysqlimport is the command line interface for the LOAD DATA LOCAL INFILE SQL statement.

Either way, it should be 10-20x faster than doing the INSERTs row by row.
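
For illustration only, here is a minimal sketch of what the underlying LOAD DATA LOCAL INFILE statement could look like when issued from the question's PHP script. It assumes the file really is pipe-delimited with the fields in the order the script reads them; the column mapping and the skipped fields are assumptions, and the *_slug columns are omitted because toAscii() is a PHP function that would have to be applied in a separate pass:

    <?php
    // Sketch only, not this answer's own code: load the raw columns directly and
    // fill city_slug/prov_slug afterwards with a separate UPDATE pass.
    // Requires local_infile to be enabled on both the client and the server.
    $sql = "LOAD DATA LOCAL INFILE 'postal_codes.csv'
            INTO TABLE cities
            FIELDS TERMINATED BY '|'
            (@skip_0, postcode, city_name, prov_name, prov_abbr, @skip_5, lat, lng)";
    mysql_query($sql) or die(mysql_error());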


Probably your problem is that you have autocommit on (the default), so MySQL starts and commits a new transaction for every insert. You should disable it with SET autocommit=0;. If you can switch to the mysqli library (and you should, if possible), you can use mysqli::autocommit(false) to disable autocommit.

    $mysqli = new mysqli('localhost', 'db_user', 'my_password', 'mysql');
    $mysqli->autocommit(false);
    $stmt = $mysqli->prepare("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) values (?, ?, ?, ?, ?, ?, ?, ?)");
    $row = 1;
    if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $num = count($data);
            $row++;
            for ($c = 0; $c < $num; $c++) {
                $arr = explode('|', $data[$c]);
                // bind_param() takes its arguments by reference, so compute the slugs into variables first
                $city_slug = toAscii($arr[2]);
                $prov_slug = toAscii($arr[3]);
                $stmt->bind_param('ssssssdd', $arr[1], $arr[2], $city_slug, $arr[3], $prov_slug, $arr[4], $arr[6], $arr[7]);
                $stmt->execute();
            }
        }
        fclose($handle);
    }
    $mysqli->commit();

It will be much faster to use LOAD DATA if you can.


Try to do this in one query.

It may be limited by your my.cnf (MySQL configuration), e.g. max_allowed_packet, though.

    <?php
    $row   = 1;
    $query = "insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) values ";
    if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $num = count($data);
            $row++;
            for ($c = 0; $c < $num; $c++) {
                $arr       = explode('|', $data[$c]);
                $postcode  = mysql_real_escape_string($arr[1]);
                $city_name = mysql_real_escape_string($arr[2]);
                $city_slug = mysql_real_escape_string(toAscii($city_name));
                $prov_name = mysql_real_escape_string($arr[3]);
                $prov_slug = mysql_real_escape_string(toAscii($prov_name));
                $prov_abbr = mysql_real_escape_string($arr[4]);
                $lat       = mysql_real_escape_string($arr[6]);
                $lng       = mysql_real_escape_string($arr[7]);
                // append one value tuple per row; the column list appears only once above
                $query .= "('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng'),";
            }
        }
        fclose($handle);
    }
    mysql_query(rtrim($query, ",")) or die(mysql_error());
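
The single combined statement has to fit within the server's max_allowed_packet. A quick way to check the current limit (a small sketch; nothing in this answer performs this check itself):

    <?php
    // max_allowed_packet caps the size of a single statement (in bytes);
    // if the combined INSERT is larger, split it into several smaller batches.
    $res = mysql_query("SHOW VARIABLES LIKE 'max_allowed_packet'");
    $var = mysql_fetch_assoc($res);
    echo "max_allowed_packet = " . $var['Value'] . " bytes\n";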

If that does not work, you can try this (disable autocommit):

 mysql_query("SET autocommit = 0"); $row = 1; if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) { while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) { $num = count($data); $row++; for ($c=0; $c < $num; $c++) { $arr = explode('|', $data[$c]); $postcode = mysql_real_escape_string($arr[1]); $city_name = mysql_real_escape_string($arr[2]); $city_slug = mysql_real_escape_string(toAscii($city_name)); $prov_name = mysql_real_escape_string($arr[3]); $prov_slug = mysql_real_escape_string(toAscii($prov_name)); $prov_abbr = mysql_real_escape_string($arr[4]); $lat = mysql_real_escape_string($arr[6]); $lng = mysql_real_escape_string($arr[7]); mysql_query("insert into cities (`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`) values ('$postcode', '$city_name', '$city_slug', '$prov_name', '$prov_slug', '$prov_abbr', '$lat', '$lng')") or die(mysql_error()); } } fclose($handle); } 

I did something similar with SQL Server (a rough PHP/MySQL sketch of the same idea follows the list):

  • I used the SQL Server BULK INSERT command in conjunction with data tables.
  • The data tables are held in memory and are populated by reading lines from the file.
  • Each data table is built from a chunk of rows, not from the entire file.
  • Keep track of chunk processing by remembering a pointer to the last processed row and a maximum chunk size.
  • While reading the file, exit the loop when the row id > last row + chunk size.
  • Then restart the loop and continue inserting from where you left off.
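
This answer describes SQL Server, so the following is only a hedged adaptation of the same chunking idea to the question's PHP/MySQL setup; the batch size, the pipe-delimited layout, and the toAscii() slug handling are carried over from the question, not from this answer:

    <?php
    // Sketch only: flush rows in fixed-size chunks, one multi-row INSERT per
    // chunk, instead of one INSERT per row or one giant statement.
    $batchSize = 1000;   // assumed chunk size; keep each batch under max_allowed_packet
    $values    = array();
    $columns   = "(`postcode`, `city_name`, `city_slug`, `prov_name`, `prov_slug`, `prov_abbr`, `lat`, `lng`)";

    if (($handle = fopen("postal_codes.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            // as in the question, each line is assumed to hold one pipe-delimited record
            $arr = explode('|', $data[0]);
            $values[] = sprintf("('%s', '%s', '%s', '%s', '%s', '%s', '%s', '%s')",
                mysql_real_escape_string($arr[1]),
                mysql_real_escape_string($arr[2]),
                mysql_real_escape_string(toAscii($arr[2])),
                mysql_real_escape_string($arr[3]),
                mysql_real_escape_string(toAscii($arr[3])),
                mysql_real_escape_string($arr[4]),
                mysql_real_escape_string($arr[6]),
                mysql_real_escape_string($arr[7]));

            if (count($values) >= $batchSize) {
                mysql_query("insert into cities $columns values " . implode(',', $values)) or die(mysql_error());
                $values = array();   // start the next chunk
            }
        }
        if (!empty($values)) {       // flush the final partial chunk
            mysql_query("insert into cities $columns values " . implode(',', $values)) or die(mysql_error());
        }
        fclose($handle);
    }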

Also, when you use LOAD DATA, the import will sometimes stop if there are warnings. You can use the IGNORE keyword:

 LOAD DATA INFILE 'file Path' IGNORE INTO TABLE YOUR_Table 

I had a similar situation where it was not possible to use LOAD DATA. Transactions were at times unacceptable as well, as the data needed to be checked for duplicates. Yet, the following greatly improved the processing time for some of my import data files.

Before the while loop over the CSV lines, set autocommit to 0 and start a transaction (InnoDB only):

    mysql_query('SET autocommit=0;');
    mysql_query('START TRANSACTION;');

After your loop, commit and set autocommit back to 1 (the default):

    mysql_query('COMMIT;');
    mysql_query('SET autocommit=1;');

Replace mysql_query() with whatever database object your code uses. Hope this helps others.

