Need a time-efficient method to import a large CSV file via PHP into multiple MySQL tables

Well, I have serious problems here. I am new to this site and new to importing CSV data via PHP, but I am not new to programming.

I am currently working on creating a customer relationship manager. I need to create a script to import a file that will populate the database with outputs. The main issue here is that the leading data consists of companies and employees of the specified company. In addition, several other tables, such as billing information, are split from other tables.

I have a working script that will allow users to map imported data to specific rows and columns.

function mapData($file) { // Open the Text File $fd = fopen($file, "r"); // Return FALSE if file not found if(!$fd) { return FALSE; } // Get the First Two Lines $first = 0; $data = array(); while(!feof($fd)) { if($first == 0) { $cols = fgetcsv($fd, 4096); $data['cols'] = array(); if(is_array($cols) && count($cols)) { foreach($cols as $col) { if(!$col) { continue; } $data['cols'][] = $col; } } if(empty($data['cols'])) { return array(); } $first++; continue; } else { $data['first'] = fgetcsv($fd, 4096); break; } } fclose($fd); // Return Data return $data; } 

The above script is activated only after CodeIgniter moves the file to the working directory. I already know what the file name is. The file is included and returns a list of columns and the first row. Any empty columns are ignored.

After that, the process proceeds to display the script. As soon as the comparison is performed and "Import" is clicked, this piece of code is loaded.

 function importLeads($file, $map) { // Open the Text File if(!file_exists($file)) { return false; } error_reporting(E_ALL); set_time_limit(240); ini_set("memory_limit", "512M"); $fd = fopen($file, "r"); // Return FALSE if file not found if(!$fd) { return FALSE; } // Traverse Each Line of the File $true = false; $first = 0; while(!feof($fd)) { if($first == 0) { $cols = fgetcsv($fd); $first++; continue; } // Get the columns of each line $row = fgetcsv($fd); // Traverse columns $group = array(); $lead_status = array(); $lead_type = array(); $lead_source = array(); $user = array(); $user_cstm = array(); $user_prof = array(); $acct = array(); $acct_cstm = array(); $acct_prof = array(); $acct_group = array(); if(!$row) { continue; } foreach($row as $num => $val) { if(empty($map[$num])) { continue; } $val = str_replace('"', "&#34;", $val); $val = str_replace("'", "&#39;", $val); switch($map[$num]) { // Company Account case "company_name": $acct['company_name'] = $val; break; case "lead_type": $lead_type['name'] = $val; break; case "lead_source": $lead_source['name'] = $val; break; case "lead_source_description": $lead_source['name'] = $val; break; case "campaign": $campaign['name'] = $val; break; case "mcn": $acct['mcn'] = $val; break; case "usdot": $acct['usdot'] = $val; break; case "sic_codes": $acct_cstm['sic_codes'] = $val; break; case "naics_codes": $acct_cstm['naics_codes'] = $val; break; case "agent_assigned": $acct_cstm['agent_assigned'] = $val; break; case "group_assigned": $group['name'] = $val; break; case "rating": $acct_cstm['rating'] = $val; break; case "main_phone": $acct['phone'] = $val; break; case "billing_phone": $acct_cstm['billing_phone'] = $val; break; case "company_fax": $acct['fax'] = $val; break; case "company_email": $acct['email2'] = $val; break; // Company Location case "primary_address": $acct['address'] = $val; break; case "primary_address2": $acct['address2'] = $val; break; case "primary_city": $acct['city'] = $val; break; case "primary_state": $acct['state'] = $val; break; case "primary_zip": $acct['zip'] = $val; break; case "primary_country": $acct['country'] = $val; break; case "billing_address": $billing['address'] = $val; break; case "billing_address2": $billing['address2'] = $val; break; case "billing_city": $billing['city'] = $val; break; case "billing_state": $billing['state'] = $val; break; case "billing_zip": $billing['zip'] = $val; break; case "billing_country": $billing['country'] = $val; break; case "company_website": $acct_cstm['website'] = $val; break; case "company_revenue": $acct_cstm['revenue'] = $val; break; case "company_about": $acct_prof['aboutus'] = $val; break; // Misc. Company Data case "bols_per_mo": $acct_cstm['approx_bols_per_mo'] = $val; break; case "no_employees": $acct_cstm['no_employees'] = $val; break; case "no_drivers": $acct_prof['drivers'] = $val; break; case "no_trucks": $acct_prof['power_units'] = $val; break; case "no_trailers": $acct_cstm['no_trailers'] = $acct_prof['trailers'] = $val; break; case "no_parcels_day": $acct_cstm['no_parcels_day'] = $val; break; case "no_shipping_locations": $acct_cstm['no_shipping_locations'] = $val; break; case "approves_inbound": $acct_cstm['approves_inbound'] = $val; break; case "what_erp_used": $acct_cstm['what_erp_used'] = $val; break; case "birddog": $acct_cstm['birddog_referral'] = $val; break; case "status_notes": $acct_cstm['status_notes'] = $val; break; case "notes": $acct_cstm['notes'] = $val; break; case "internal_notes": $acct_cstm['notes_internal'] = $val; break; // User Data case "salutation": $user_cstm['salutation'] = $val; break; case "first_name": $user['first_name'] = $billing['first_name'] = $val; break; case "last_name": $user['last_name'] = $billing['last_name'] = $val; break; case "user_title": $user_prof['title'] = $val; break; case "user_about": $user_prof['about'] = $val; break; case "user_email": $user['email'] = $val; break; case "home_phone": $user_prof['phone'] = $val; break; case "mobile_phone": $user_cstm['mobile_phone'] = $val; break; case "direct_phone": $user_cstm['direct_phone'] = $val; break; case "user_fax": $user_prof['fax'] = $val; break; case "user_locale": $user['location'] = $val; break; case "user_website": $user_prof['website_url'] = $val; break; case "user_facebook": $user_prof['fb_url'] = $val; break; case "user_twitter": $user_prof['twitter_url'] = $val; break; case "user_linkedin": $user_prof['linkedin_url'] = $val; break; } } if(empty($acct['company_name']) || empty($user['first_name']) || empty($user['last_name'])) { continue; } $this->db = $this->load->database('crm_db', TRUE); if(isset($lead_type['name']) && ($name = $lead_type['name'])) { $count = $this->db->count_all("lead_types"); $check = $this->db->get_where("lead_types", array("name" => $name)); if($check->num_rows() < 1) { $this->db->insert("lead_types", array("name" => $name, "order" => $count)); $ltype = $this->db->insert_id(); $acct_cstm['lead_type'] = $acct['account_type'] = $user['company_type'] = $ltype; } } if(isset($lead_source['name']) && ($name = $lead_source['name'])) { $count = $this->db->count_all("lead_sources"); $check = $this->db->get_where("lead_sources", array("name" => $name)); if($check->num_rows() < 1) { $this->db->insert("lead_sources", array("name" => $name, "order" => $count)); $acct_cstm['lead_source'] = $this->db->insert_id(); } } if(isset($campaign['name']) && ($name = $campaign['name'])) { $check = $this->db->get_where("campaigns", array("name" => $name)); if($check->num_rows() < 1) { $campaign['id'] = $accounts_cstm['campaign'] = $this->Secure_m->generate_sugar_id(); $campaign['date_entered'] = time(); $campaign['date_modified'] = time(); $campaign['modified_user_id'] = $this->session->userdata('id'); $campaign['created_by'] = $this->session->userdata('id'); $this->db->insert("campaigns", $campaign); } } if(isset($group['name']) && ($name = $group['name'])) { $order = $this->db->count_all("groups"); $check = $this->db->get_where("groups", array("name" => $name)); if($check->num_rows() < 1) { $this->db->insert("groups", array("name" => $name, "order" => $order)); $acct_group['id'] = $this->db->insert_id(); } } $mem = new stdclass; $uid = 0; if(is_array($user) && count($user)) { $where = ""; if(!empty($user['phone'])) { $where .= "prof.phone = '{$user['phone']}' OR "; $where .= "cstm.mobile_phone = '{$user['phone']}' OR "; $where .= "cstm.direct_phone = '{$user['phone']}'"; } if(!empty($user['mobile_phone'])) { if($where) { $where .= " OR "; } $where .= "prof.phone = '{$user['mobile_phone']}' OR "; $where .= "cstm.mobile_phone = '{$user['mobile_phone']}' OR "; $where .= "cstm.direct_phone = '{$user['mobile_phone']}'"; } if(!empty($user['direct_phone'])) { if($where) { $where .= " OR "; } $where .= "prof.phone = '{$user['direct_phone']}' OR "; $where .= "cstm.mobile_phone = '{$user['direct_phone']}' OR "; $where .= "cstm.direct_phone = '{$user['direct_phone']}'"; } $query = $this->db->query($this->Account_m->userQuery($where)); $mem = reset($query->result()); if($where && !empty($mem->id)) { $uid = $mem->id; $new = array(); foreach($user as $k => $v) { if(!empty($mem->$k)) { $new[$k] = $mem->$k; unset($user[$k]); } else { $new[$k] = $v; } } //$this->db->update("leads", $user, array("id" => $uid)); $user = $new; } else { $user['uxtime'] = time(); $user['isclient'] = 0; $user['flag'] = 0; $user['activation_code'] = $this->Secure_m->generate_activate_id(); $uid = $this->Secure_m->generate_activate_id(10); $query = $this->db->get_where("leads", array("id" => $uid), 1); $data = reset($query->result()); while(!empty($data->id)) { $uid = $this->Secure_m->generate_activate_id(10); $query = $this->db->get_where("leads", array("id" => $uid), 1); $data = reset($query->result()); } $user['id'] = $uid; $this->db->insert("leads", $user); } } if($uid && is_array($user_prof) && count($user_prof)) { if(!empty($mem->uid)) { $new = array(); foreach($user_prof as $k => $v) { if(!empty($mem->$k)) { $new[$k] = $mem->$k; unset($user_prof[$k]); } else { $new[$k] = $v; } } //$this->db->update("mprofiles", $user_prof, array("uid" => $uid)); $user_prof = $new; } else { $user_prof['uid'] = $uid; $user_prof['flag'] = 0; $this->db->insert("ldetails", $user_prof); } } if($uid && is_array($user_cstm) && count($user_cstm)) { $query = $this->db->get_where("leads_cstm", array("crm_id" => $cid), 1); $data = reset($query->result()); if(!empty($data->crm_id)) { $new = array(); foreach($user_cstm as $k => $v) { if(!empty($mem->$k)) { $new[$k] = $mem->$k; unset($user_cstm[$k]); } else { $new[$k] = $v; } } //$this->db->update("leads_cstm", $acct_prof, array("fa_user_id" => $cid)); $user_cstm = $new; } else { $user_cstm['crm_id'] = $uid; $user_cstm['date_entered'] = time(); $user_cstm['date_modified'] = time(); $user_cstm['created_by'] = $this->session->userdata('id'); $user_cstm['modified_user_id'] = $this->session->userdata('id'); $this->db->insert("leads_cstm", $user_cstm); } } $cmp = new stdclass; $cid = 0; if(is_array($acct) && count($acct)) { $acct['uid'] = $uid; $acct['main_contact'] = "{$user['first_name']} {$user['last_name']}"; if(!empty($user['email'])) { $acct['email'] = $user['email']; } $acct['isprospect'] = 0; $acct['flag'] = 0; if(!empty($acct['mcn'])) { $where .= "fms.mcn = '{$acct['mcn']}'"; } if(!empty($acct['phone'])) { if($where) { $where .= " OR "; } $where .= "fms.phone = '{$acct['phone']}' OR "; $where .= "crm.billing_phone = '{$acct['phone']}'"; } if(!empty($acct['billing_phone'])) { if($where) { $where .= " OR "; } $where .= "fms.phone = '{$acct['billing_phone']}' OR "; $where .= "crm.billing_phone = '{$acct['billing_phone']}'"; } if(!empty($acct['company_name'])) { if($where) { $where .= " OR "; } $where .= "fms.company_name = '{$acct['company_name']}'"; } $query = $this->db->query($this->Account_m->acctQuery($where)); $cmp = reset($query->result()); if($where && !empty($cmp->id)) { $cid = $cmp->id; $new = array(); foreach($acct as $k => $v) { if(!empty($cmp->$k)) { $new[$k] = $cmp->$k; unset($acct[$k]); } else { $new[$k] = $v; } } //$this->db->update("accounts", $billing, array("cid" => $cid)); $acct = $new; } else { $cid = $this->Secure_m->generate_activate_id(10); $query = $this->db->get_where("leads", array("id" => $uid), 1); $data = reset($query->result()); while(!empty($data->id)) { $cid = $this->Secure_m->generate_activate_id(10); $query = $this->db->get_where("accounts", array("id" => $cid), 1); $data = reset($query->result()); } $acct['id'] = $cid; $this->db->insert("accounts", $acct); } } if($cid && is_array($acct_group) && count($acct_group)) { $grp = $this->db->get_where("accounts_groups", array("cid" => $cid, "gid" => $acct_group['id'])); if(empty($cmp->id)) { $acct_group['cid'] = $cid; $this->db->insert("accounts_groups", $acct_group); } } if($cid && is_array($acct_prof) && count($acct_prof)) { if(!empty($cmp->id)) { $new = array(); foreach($acct_prof as $k => $v) { if(!empty($cmp->$k)) { $new[$k] = $cmp->$k; unset($acct_prof[$k]); } else { $new[$k] = $v; } } //$this->db->update("cprofiles", $acct_prof, array("cid" => $cid)); $acct_prof = $new; } else { $acct_prof['cid'] = $cid; $acct_prof['flag'] = 0; $this->db->insert("adetails", $acct_prof); } } if($cid && is_array($billing) && count($billing)) { $bill = $this->db->get_where("accounts_billing", array("cid" => $cid)); if(!empty($bill->id)) { $new = array(); foreach($acct_prof as $k => $v) { if(!empty($cmp->$k)) { $new[$k] = $cmp->$k; unset($acct_prof[$k]); } else { $new[$k] = $v; } } //$this->db->update("accounts_billing", $billing, array("cid" => $cid)); } else { $billing['cid'] = $cid; $billing['flag'] = 0; $this->db->insert("accounts_billing", $billing); } } if($cid && $uid) { $this->db->update("leads", array("cid" => $cid), array("id" => $uid)); } if($cid && is_array($acct_cstm) && count($acct_cstm)) { $query = $this->db->get_where("accounts_cstm", array("crm_id" => $cid), 1); $data = reset($query->result()); if(!empty($data->crm_id)) { $new = array(); foreach($acct_cstm as $k => $v) { if(!empty($cmp->$k)) { $new[$k] = $cmp->$k; unset($acct_cstm[$k]); } else { $new[$k] = $v; } } //$this->db->update("accounts_cstm", $acct_cstm, array("crm_id" => $cid)); $acct_cstm = $new; } else { $acct_cstm['crm_id'] = $cid; $acct_cstm['date_entered'] = time(); $acct_cstm['date_modified'] = time(); $acct_cstm['created_by'] = $this->session->userdata('id'); $acct_cstm['modified_user_id'] = $this->session->userdata('id'); if(empty($acct_cstm['rating'])) { $acct_cstm['rating'] = 1; } $this->db->insert("accounts_cstm", $acct_cstm); } } $true = TRUE; } fclose($fd); return $true; } 

Now, as far as I can see, the script is working fine. There is nothing wrong with the code itself. The problem is that after about 400-500 lines the script just stops. I am not getting an error, but no code is being processed.

I know this because after that I have code that should return a redirect page through AJAX. However, after my loop in the importLeads function ever loads,

I am not sure how to make this script more efficient ... I am sure it is time, but I do not know how to make it more efficient. I need this script to handle all the information above separately. I have many separate tables, all of which are related to each other, and this import script should set everything differently.

I spoke with my client about this project. This script works when I lower it to 400 lines. He has many such CSV files, which are about 75,000 lines. The one I import is smaller, only about 1,200 lines.

I tried looking for alternative methods, such as importing a MySQL script, but I cannot do this because this script must import data into separate tables, and it must check existing data first. I should also have all the empty fields with the imported information, but this will make it even worse.

If anyone knows a more efficient method, it will be very appreciated. I tried to be as detailed as I could. It should be noted that I use CodeIgniter, but if there is a more efficient way that CodeIgniter does not use, I will take it (I can still put it in the CI model).

+6
import php mysql csv codeigniter
source share
6 answers

I wrote PHP scripts to bulk upload data published using a Stack Overflow data dump. I import millions of lines and it doesn't take so long.

Here are some suggestions:

  • Do not rely on autocommit. The overhead of starting and completing a transaction for each row is huge. Use explicit transactions and commit after every 1000 rows (or more).

  • Use prepared statements. . Since you basically do the same inserts thousands of times, you can prepare each insert before the start of the loop, and then do it during the loop, passing the values ​​as parameters, I don’t know how to do this with the CodeIgniter database library, sort out.

  • Configure MySQL to import. Increase cache buffers, etc. For more information, see INSERT Message Rate .

  • Use LOAD DATA INFILE. If it is possible. This is literally 20 times faster than using INSERT to load data in a row by row. I understand if you cannot, because you need to get the last insert ID and so on. But in most cases, even if you are reading a CSV file, regroup it and write it to several CSV files with a temporary time, loading the data will still be faster than using INSERT.

  • Perform offline work. Do not run lengthy tasks during a web request. Limiting the time of a PHP request will stop working, if not today, and then on Tuesday, when the task will be 10% more. Instead, make the web request request a job, and then return control to the user. You must start the data import as a server process and periodically allow the user to look at the speed of progress. For example, a cheap way to do this is for your import script to output "." to a temporary file, and then the user can request to view the temporary file and continue rebooting in his browser. If you want a fantasy, do something with Ajax.

+6
source share

To efficiently import data into MySQL, you must use LOAD DATA INFILE . This will make a huge difference in performance.

If you need to pre-process your data, do it using the script above, then export it back to CSV / TSV and use LOAD DATA queries for final import into your database.

Your script will not exceed 500 lines because it will most likely reach the PHP runtime limit. You can use the set_time_limit () function to give your script an unlimited period, in which case you need to call set_time_limit(0) at the beginning of your script.

+4
source share

Another point I have to open is the code needed to execute:

currently where you do this:

 foreach($row as $num => $val) { if(empty($map[$num])) { continue; } $val = str_replace('"', "&#34;", $val); $val = str_replace("'", "&#39;", $val); switch($map[$num]) { // Company Account case "company_name": $acct['company_name'] = $val; break; 

Instead, you need to change the switch / case to do this:

1) create a data map of your displayed fields. The data map must have the correct array to which the field belongs, as well as the index of this array. For example:

 $dataMap['company_name'] = array($acct, 'company_name'); $dataMap['lead_type'] = array($lead_type, 'name'); . . . $dataMap['bols_per_mo'] = array($acct_cstm, 'approx_bols_per_mo'); . . . 

And so on

Then 2) Replace your massive switch statement with this simple piece of code:

 foreach($row as $num => $val) { if(empty($map[$num])) { continue; } $val = str_replace('"', "&#34;", $val); $val = str_replace("'", "&#39;", $val); $mappingRecord = $dataMap[ $map[$num] ]; //The first element is the array the data should go in $destinationArray = $mappingRecord[0]; //the second element is the index of the array it should go in $destinationArray[$mappingRecord[1]] = $val; 
+2
source share

load raw csv file data into staging tables using a data entry method that is nice and fast:

 set autocommit = 0; load data infile.. load data infile.. ... commit; 

After loading the data, perform cleaning, mapping and verification of the stored data, etc.

 call cleanse_staging_data(); call map_staging_data(); call validate_staging_data(); 

Once the data has been processed, copy the data from the staging tables to the correct data tables:

 call copy_staging_to_production(); 

or something like that.

+1
source share

Do you click on the PHP script time limit when uploading large files?

Try the following:

 set_time_limit(0); 

to disable the 30 second interval by default. The server limit can be obtained using max_execution_time() . If the server is not long enough for this bulk load job, you need to figure out how to get the local relay server to change it or load it in another way.

+1
source share

I sometimes had to do something like that. There are several problems that you potentially face:

  • PHP script timeout . After a specified period of time, PHP will automatically kill this process. You can disable this in the php.ini file or with set_time_limit(0) . However, some hosts have disabled this method, and some hosts have a separate clock system to kill processes that have been running for a certain period of time so that the server is not down.
  • Memory limit . PHP will allow you to set the maximum memory limit in the same php.ini. If he hits him, it will cause a fatal error and die. You can see this in the error log, but nothing will be displayed in the browser.
  • MySQL Query overhead . As others noted, there is a lot of overhead for every single request. I didn’t have enough lines to justify throwing load data infile into the storm. I also did not need to get the results for each individual query, so I just put them all in one call and released it in one mysql_query (which you can do from your controller in CodeIgniter like this: mysql_query($sql, $this->db->conn_id); from then on, it will throw up if you pass this DB :: query).

Bill Carwin pointed out many good optimization methods when dealing with very large data sets, but if you have problems with ~ 400 rows, I don’t think it will be very useful for you. Check your error logs, fix the problem and then optimize it.

0
source share

All Articles