Import only non-existent data into the database from CSV

I created a script that reads data from a CSV file, checks whether the data already exists in the database, and imports it if it does not. If the data exists (matched by a product-specific code), the rest of the information should be updated from the CSV file.

For example, my CSV file contains a member with the code WTW-2LT, first name Alex and last name Johnson. The script checks whether a member with code WTW-2LT, name Alex and surname Johnson already exists. If so, the contact details and additional data must be updated from the CSV. Other details, such as the subject and lecturer, also need to be checked, and all of this data is on the same CSV line. If the member does not exist, a new record must be created.

Here is my script so far, with most other checks left out to avoid distraction:

$header = null; // $fp is an open handle to the CSV file

while ($row = fgetcsv($fp, null, ";")) {
    if ($header === null) {
        $header = $row;
        continue;
    }

    $record = array_combine($header, $row);

    $member = $this->em->getRepository(Member::class)->findOneBy([
        'code' =>$record['member_code'],
        'name' =>$record['name'],
        'surname' =>$record['surname'],
    ]);

    if (!$member) {
        $member = new Member();
        $member->setCode($record['member_code']);
        $member->setName($record['name']);
        $member->setSurname($record['surname']);
    }    
    $member->setContactNumber($record['phone']);
    $member->setAddress($record['address']);
    $member->setEmail($record['email']);

    $subject = $this->em->getRepository(Subject::class)->findOneBy([
        'code' => $record['subj_code'] // entity field name, matching setCode()
    ]);

    if (!$subject) {
        $subject = new Subject();
        $subject->setCode($record['subj_code']);
    }
    $subject->setTitle($record['subj_title']);
    $subject->setDescription($record['subj_desc']);
    $subject->setLocation($record['subj_loc']);

    $lecturer = $this->em->getRepository(Lecturer::class)->findOneBy([
        'subject' => $subject,
        'name' => $record['lec_name'],
        'code' => $record['lec_code'],
    ]);

    if (!$lecturer) {
        $lecturer = new Lecturer();
        $lecturer->setSubject($subject);
        $lecturer->setName($record['lec_name']);
        $lecturer->setCode($record['lec_code']);
    }
    $lecturer->setEmail($record['lec_email']);
    $lecturer->setContactNumber($record['lec_phone']);

    $member->setLecturer($lecturer);

    $validationErrors = $this->validator->validate($member);
    if (!count($validationErrors)) {
        $this->em->persist($member);
        $this->em->flush();
    } else {
        // ...
    }
}

As you can see, this script has to query the database 3 times to check whether a single CSV line already exists. My files have 2000+ lines, so running 3 queries per line just to check for existing data takes quite a long time.
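One common way to avoid per-line lookups, sketched here under the assumption that the existing members fit in memory: fetch them once, index them by the same three fields the loop matches on, and look each CSV line up in an array. `compositeKey`, `indexByKey` and the sample rows are hypothetical; in the real script the rows would come from one query over the member table (for a small table, e.g. the repository's `findAll()`), not from the hard-coded array.

```php
<?php
// Hypothetical helpers: load the existing members ONCE, index them by the
// same three fields the loop matches on, then look each CSV line up in
// memory instead of calling findOneBy() per line.

/** Join the given fields of a row into one lookup key. */
function compositeKey(array $row, array $fields): string
{
    $parts = [];
    foreach ($fields as $field) {
        $parts[] = (string) $row[$field];
    }
    // "\x1F" (ASCII unit separator) is unlikely to appear inside the data,
    // so different field values cannot produce the same key.
    return implode("\x1F", $parts);
}

/** Build [compositeKey => row] for rows fetched by a single query. */
function indexByKey(array $rows, array $fields): array
{
    $index = [];
    foreach ($rows as $row) {
        $index[compositeKey($row, $fields)] = $row;
    }
    return $index;
}

// $dbRows stands in for the result of one query such as
// "SELECT id, code, name, surname FROM member" (table/columns assumed).
$dbRows = [
    ['id' => 1, 'code' => 'WTW-2LT', 'name' => 'Alex', 'surname' => 'Johnson'],
];
$existing = indexByKey($dbRows, ['code', 'name', 'surname']);

// CSV records use the question's header names.
$record = ['member_code' => 'WTW-2LT', 'name' => 'Alex', 'surname' => 'Johnson'];
$key = compositeKey($record, ['member_code', 'name', 'surname']);
$found = $existing[$key] ?? null; // null means "create a new member"
```

The trade-off is memory: a few thousand rows are cheap to hold, and the same pattern works for subjects and lecturers.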

Unfortunately, I also can't import the rows in batches: if an object does not exist yet, it gets created again for every line that references it until the batch is flushed to the database, and I end up with meaningless duplicate entries.
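The duplicates come from the lookup only seeing flushed rows. A minimal sketch of the usual workaround, assuming objects are identified by their code: keep every object created during the run in an array keyed by that code and check the array before creating a new one. `Subject` here is a stand-in class and `SubjectImporter` is a hypothetical name; in the real script the `flush()` body would persist the pending entities and flush the entity manager.

```php
<?php
// Sketch: remember every object created during the run in an array keyed
// by its natural key (here subj_code), so later CSV lines that reference
// the same subject reuse the pending object instead of creating a second
// one before the batch is flushed. Subject is a stand-in class.

class Subject
{
    public function __construct(public string $code, public string $title = '')
    {
    }
}

final class SubjectImporter
{
    /** @var array<string, Subject> every subject seen so far, by code */
    private array $byCode = [];
    /** @var Subject[] objects created since the last flush */
    private array $pending = [];
    public int $flushes = 0;

    public function __construct(private int $batchSize = 100)
    {
    }

    public function subjectFor(array $record): Subject
    {
        $code = $record['subj_code'];
        if (!isset($this->byCode[$code])) {
            $subject = new Subject($code);
            $this->byCode[$code] = $subject; // known before it is flushed
            $this->pending[] = $subject;
        }
        $subject = $this->byCode[$code];
        $subject->title = $record['subj_title']; // latest CSV line wins
        if (count($this->pending) >= $this->batchSize) {
            $this->flush();
        }
        return $subject;
    }

    public function flush(): void
    {
        // In the real script: persist() the pending entities, then flush().
        $this->pending = [];
        $this->flushes++;
    }

    public function knownCount(): int
    {
        return count($this->byCode);
    }

    public function pendingCount(): int
    {
        return count($this->pending);
    }
}

$importer = new SubjectImporter(batchSize: 2);
$rows = [
    ['subj_code' => 'subj_code1', 'subj_title' => 'Maths'],
    ['subj_code' => 'subj_code1', 'subj_title' => 'Maths'],
    ['subj_code' => 'subj_code2', 'subj_title' => 'Physics'],
];
foreach ($rows as $row) {
    $importer->subjectFor($row);
}
$importer->flush(); // flush whatever is left after the loop
```

If the loop calls `$em->clear()` between batches to free memory, the cached entities become detached, so either skip `clear()` or reload the references after it.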

How can I make this faster (with fewer queries?) without creating duplicates?


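2000+ lines with 3 queries each is not that much, but if you want to cut down the number of queries, this is how I would do it: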

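Keep the number of queries small. Read the CSV in PHP and collect the data per table; this works on plain SQL rather than through the Symfony entity layer. MySQL supports INSERT ... ON DUPLICATE KEY UPDATE. Give the 3 tables (members, subjects, lecturers) unique keys on their codes, then send 3 queries: one for members, one for subjects, one for lecturers. MySQL checks the unique keys itself, so each row is either inserted or, if its key already exists, updated.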

How you send such raw SQL from Symfony depends on your setup; the DBAL connection that Doctrine exposes can execute it.
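A sketch of the ON DUPLICATE KEY UPDATE idea, assuming a hypothetical `member` table with a UNIQUE index on the matching columns (for example `(code, name, surname)`, mirroring the question's lookup). `buildUpsert` is an illustrative helper, not part of Symfony or Doctrine; it only builds the SQL and the parameter list, which a PDO or DBAL connection could then execute. `VALUES(col)` in this position still works but has been deprecated since MySQL 8.0.20 in favour of a row alias.

```php
<?php
// Sketch: one multi-row INSERT ... ON DUPLICATE KEY UPDATE per table (or
// per chunk of rows). Assumes a UNIQUE index on the matching columns;
// MySQL then inserts new rows and updates the listed columns of existing
// ones. Table and column names must be trusted (they are not escaped);
// the values themselves are bound via ? placeholders.

function buildUpsert(string $table, array $columns, array $updateColumns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('at least one row is required');
    }
    $tuple = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column];
        }
    }

    $updates = array_map(
        fn (string $c): string => "$c = VALUES($c)",
        $updateColumns
    );

    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s ON DUPLICATE KEY UPDATE %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $tuple)),
        implode(', ', $updates)
    );
    return [$sql, $params];
}

[$sql, $params] = buildUpsert(
    'member',
    ['code', 'name', 'surname', 'email'],
    ['email'],
    [
        ['code' => 'WTW-2LT', 'name' => 'Alex', 'surname' => 'Johnson', 'email' => 'alex@example.com'],
        ['code' => 'WTW-2LU', 'name' => 'John', 'surname' => 'Doe', 'email' => 'john@example.com'],
    ]
);
// With a PDO connection: $pdo->prepare($sql)->execute($params);
```

For large files, split the rows into chunks, since a prepared statement has a limit on the number of placeholders. Lecturers also need the ids of their subjects, so the subject upsert has to run first.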


I would move the whole import into SQL queries. Database systems are built for exactly this kind of bulk work.

Import the CSV into MySQL with SQL:

LOAD DATA INFILE 'data.csv'
INTO TABLE tmp_import
FIELDS TERMINATED BY ';'
IGNORE 1 LINES

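Here tmp_import is a table with the same columns as the CSV, created beforehand.

data.csv is the path to the file. Plain LOAD DATA INFILE reads it on the database server host; LOAD DATA LOCAL INFILE sends a file from the client instead.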

For example, suppose the CSV looks like this (simplified):

WTW-2LT, Alex, Johnson, subj_code1, ..., lec_name1, ...
WTW-2LT, Alex, Johnson, subj_code1, ..., lec_name2, ...
WTW-2LT, Alex, Johnson, subj_code2, ..., lec_name3, ...
WTW-2LU, John, Doe,     subj_code3, ..., lec_name4, ...

Then the distinct members are:

SELECT member_code, name, surname
FROM tmp_import
GROUP BY member_code, name, surname

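If member_code is unique, GROUP BY member_code alone would suffice in MySQL; other DBSs (and MySQL with ONLY_FULL_GROUP_BY enabled) reject selecting columns that are neither grouped nor aggregated.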

Likewise for subjects and lecturers:

SELECT subj_code, subj_title, member_code
FROM tmp_import
GROUP BY subj_code

SELECT lec_code, lec_name, subj_code
FROM tmp_import
GROUP BY lec_code

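This assumes that subj_code and lec_code are unique. MySQL 5.7.5 and later enable ONLY_FULL_GROUP_BY by default, which rejects these two queries; in that case wrap the non-grouped columns in ANY_VALUE().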
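For comparison, the same "one flat file, three distinct sets" split can be done in PHP in a single pass. This is an illustrative sketch with a hypothetical `splitRecords` helper; field names follow the question's CSV header, and the titles and lecturer codes in the sample data are made up. When a code repeats, the first line wins, a deterministic stand-in for picking one value per group with ANY_VALUE().

```php
<?php
// Sketch: a single pass over the CSV records that produces the distinct
// members, subjects and lecturers, like the GROUP BY queries do in SQL.
// Each set is keyed by its code; when a code repeats, the first line wins.

function splitRecords(array $records): array
{
    $members = $subjects = $lecturers = [];
    foreach ($records as $r) {
        $memberKey = implode("\x1F", [$r['member_code'], $r['name'], $r['surname']]);
        $members[$memberKey] ??= [
            'member_code' => $r['member_code'],
            'name' => $r['name'],
            'surname' => $r['surname'],
        ];
        $subjects[$r['subj_code']] ??= [
            'subj_code' => $r['subj_code'],
            'subj_title' => $r['subj_title'],
        ];
        $lecturers[$r['lec_code']] ??= [
            'lec_code' => $r['lec_code'],
            'lec_name' => $r['lec_name'],
            'subj_code' => $r['subj_code'],
        ];
    }
    return [$members, $subjects, $lecturers];
}

// The sample lines from the answer; titles and lecturer codes are made up.
$header = ['member_code', 'name', 'surname', 'subj_code', 'subj_title', 'lec_code', 'lec_name'];
$lines = [
    ['WTW-2LT', 'Alex', 'Johnson', 'subj_code1', 'Title 1', 'lec_code1', 'lec_name1'],
    ['WTW-2LT', 'Alex', 'Johnson', 'subj_code1', 'Title 1', 'lec_code2', 'lec_name2'],
    ['WTW-2LT', 'Alex', 'Johnson', 'subj_code2', 'Title 2', 'lec_code3', 'lec_name3'],
    ['WTW-2LU', 'John', 'Doe', 'subj_code3', 'Title 3', 'lec_code4', 'lec_name4'],
];
$records = array_map(fn (array $line): array => array_combine($header, $line), $lines);

[$members, $subjects, $lecturers] = splitRecords($records);
```

The four sample lines collapse into 2 members, 3 subjects and 4 lecturers, which is what the three GROUP BY queries return for the same data.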

Store each result in its own table with MySQL's CREATE TABLE ... SELECT syntax:

CREATE TABLE tmp_import_members
SELECT member_code, name, surname
FROM tmp_import
GROUP BY member_code, name, surname

Then insert the members that don't exist yet and update the ones that do:

INSERT INTO members (member_code, name, surname)
SELECT member_code, name, surname
FROM tmp_import_members
WHERE tmp_import_members.member_code NOT IN (
  SELECT member_code FROM members WHERE member_code IS NOT NULL
);

UPDATE members 
JOIN tmp_import_members ON 
  members.member_code = tmp_import_members.member_code
SET 
  members.name = tmp_import_members.name,
  members.surname = tmp_import_members.surname;

Do the same for subjects and lecturers.

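  • Import the CSV into a temporary table
  • Create 3 tables from it: members, subjects, lecturers
  • Run 3 INSERTs and 3 UPDATEs (one pair per table)
  • Drop the temporary tables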

In total: one CSV import plus 3 inserts and 3 updates, regardless of how many lines the CSV has.

On large tables MySQL can be slow with NOT IN; in that case rewrite it as a LEFT JOIN ... WHERE ... IS NULL.


You can combine the SQL queries into one:

SELECT
  (SELECT COUNT(*) FROM member WHERE someCondition) as memberCount, 
  (SELECT COUNT(*) FROM subject WHERE someCondition) as subjectCount,
  (SELECT COUNT(*) FROM lecturer WHERE someCondition) as lecturerCount

That way each CSV line costs one query instead of three. Note that this is native SQL.

For running native SQL with Doctrine, see:

Symfony2 Doctrine: native SQL query
