Import 60,000 nodes

I use the Table Wizard and Migrate modules to import nodes into my Drupal installation.

I need to import about 60,000 questions / answers (they are both nodes) and I thought that would be an easy task.

However, the migration process imports only 4 nodes per minute, so at that rate the full import will take about 10 days (60,000 / 4 = 15,000 minutes).

I was wondering whether I could do this faster by importing directly into MySQL. But I really need to create 60,000 proper nodes, and I think Drupal stores additional information about each node in other tables, so writing to the database directly does not seem safe.

What do you suggest I do? Wait 10 days? Thanks

+4
6 answers

Table migration should be an order of magnitude faster than this.

Do you use pathauto?

If so, try disabling the pathauto module, which often causes big performance problems when importing.

Secondly, if disabling pathauto does not work, disable all the non-essential modules that you may be running - some modules do crazy things. Eliminate other modules as sources of the problem.

Third, is MySQL logging enabled? This can have a big impact on performance. It would not explain a slowdown of the size you are describing, but it's something you need to consider.

Fourth, install Xdebug and check your MySQL log to see exactly what is happening.

What is your PHP memory limit?

Do you have a lot of free disk space?

+7

If you are not doing so already, you should use Drush to migrate the nodes in batches. Running the import from the command line should reduce the time it takes, and you can wrap it in a shell script so that it becomes an automated task you don't have to babysit.

One thing I want to point out is that 4 nodes per minute is very low. I once needed to import some nodes from a CSV file using Migrate. There were 300 nodes, each with a location and 4-5 CCK fields, and the import finished in seconds. So if you are only importing 4 nodes per minute, either your nodes are extremely complex or something fishy is going on.

What are the specifications of the computer you use for this? Where is the import source located?

+1

This is a complex topic, but it is actually very well covered in the Drupal documentation. I do not know exactly what is going on in your case, but that is where to look.

+1

4 nodes per minute is incredibly slow. A migration should not take that long. You can speed things up a bit with Drush, but probably not enough to get a reasonable import time (hours, not days). It will not address your main problem, which is that each node takes far too long to import. The overhead of the Migrate GUI is not that big.

Importing directly into MySQL would certainly be faster, but there is a reason Migrate exists. Drupal's database storage for nodes is complex, so it's best to let Drupal do the work rather than trying to reproduce what it does yourself.

Do you use Migrate hooks for extra processing on each node? I would suggest adding some logging to find out what is taking so long. Test with 10 nodes at a time until you find the source of the lag, before running all 60k.
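
To make the logging suggestion concrete, here is a minimal sketch of a timing wrapper in plain PHP. The function names `mymodule_import_timer()` and `mymodule_import_timed_step()` are hypothetical and are not part of Drupal or Migrate; the idea is just to wrap each suspect step of your per-node processing and see which one eats the time.

```php
<?php
// Hypothetical helpers (not part of Drupal or Migrate): measure how long
// each labelled step of a node import takes.

/**
 * Starts a named timer, or stops it and returns the elapsed seconds.
 */
function mymodule_import_timer($label, $stop = FALSE) {
  static $started = array();
  if (!$stop) {
    $started[$label] = microtime(TRUE);
    return 0.0;
  }
  if (!isset($started[$label])) {
    // Stopping a timer that was never started reports zero.
    return 0.0;
  }
  $elapsed = microtime(TRUE) - $started[$label];
  unset($started[$label]);
  return $elapsed;
}

/**
 * Runs one step through a callback and logs how long it took.
 */
function mymodule_import_timed_step($label, $callback, array $args = array()) {
  mymodule_import_timer($label);
  $result = call_user_func_array($callback, $args);
  $elapsed = mymodule_import_timer($label, TRUE);
  // error_log() writes to PHP's configured error log; inside Drupal you
  // could log through watchdog() instead.
  error_log(sprintf('%s took %.3f s', $label, $elapsed));
  return $result;
}

// Usage: usleep() stands in for the real work (for example a node save).
mymodule_import_timed_step('fake node save', 'usleep', array(50000));
```

Run this against a batch of 10 nodes and compare the logged durations; a single step that dominates the total is the place to dig further.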

0

We had a similar problem with a Drupal 7 installation. We left the import running for an entire weekend, and it imported only 1,000 lines of the file.

The funny thing is that exactly the same import on a pre-production machine took 90 minutes.

We ended up comparing the source code (making sure both machines were on the same git commit), the database schema (identical), the number of nodes on each machine (not identical, but similar) ...

Long story short, the only significant difference between the two machines was the max_execution_time setting in php.ini.

The production machine had max_execution_time = 30, and the pre-production machine had max_execution_time = 3000. It seems that the Migrate module has some mechanism for coping with a "short" max_execution_time, and that mechanism is far from optimal.

Conclusion: set max_execution_time = 3000 or more in php.ini; it helps the Migrate module a lot.
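
If you want to check which limits PHP is actually running with before a long import, a small sketch like the following prints them. It uses only built-in PHP functions. Keep in mind that the command line and the web server can load different php.ini files, so run it in the same environment as the import.

```php
<?php
// Print the limits PHP is actually running with in the current environment.
$settings = array('max_execution_time', 'memory_limit');
foreach ($settings as $name) {
  printf("%s = %s\n", $name, ini_get($name));
}
// The SAPI name tells you whether this is the CLI or a web server build.
printf("SAPI = %s\n", php_sapi_name());
```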

0

I just wanted to add a note saying that disabling pathauto really helps. I had an import of more than 22 thousand lines, and before pathauto was disabled it took more than 12 hours and crashed several times during the import. After disabling pathauto and then starting the import, it took only 62 minutes and did not crash once.

Just a heads up: I created a module that disables the pathauto module before the import starts and re-enables it after the feed finishes. Here is the code from the module in case someone needs this ability:

    // Implements hook_feeds_before_import(): disable pathauto before importing.
    function YOURMODULENAME_feeds_before_import(FeedsSource $source) {
      $modules = array('pathauto');
      $args = array('@module' => $modules[0]);
      drupal_set_message(t('The @module module is about to be disabled.', $args), 'warning');
      module_disable($modules);
      drupal_set_message(t('The @module module should have been disabled.', $args), 'warning');
    }

    // Implements hook_feeds_after_import(): re-enable pathauto when the feed is done.
    function YOURMODULENAME_feeds_after_import(FeedsSource $source) {
      $modules = array('pathauto');
      module_enable($modules);
      drupal_set_message(t('The @module module should be re-enabled now.', array('@module' => $modules[0])), 'warning');
    }

0