Perl Module Instantiation + DBI + Forks "Mysql Server Gone"

Question

I wrote a Perl program that parses CSV records into a database.

The program worked fine, but took a long time, so I decided to fork the main parsing process.

After a battle with fork, it now works well and runs about 4 times faster. The main parsing method is heavily database-bound. For interest, each parsed record involves the following database calls:

1 - A check that the generated base62 value is unique against the base map table
2 - An archive check to see whether the record has changed
3 - The record is inserted into the database

The problem is that I started getting "MySQL server has gone away" errors while the parser was running in forked mode, so after many attempts I arrived at the following MySQL configuration:

    #
    # * Fine Tuning
    #
    key_buffer = 10000M
    max_allowed_packet = 10000M
    thread_stack = 192K
    thread_cache_size = 8
    myisam-recover = BACKUP
    max_connections = 10000
    table_cache = 64
    thread_concurrency = 32
    wait_timeout = 15
    tmp_table_size = 1024M
    query_cache_limit = 2M
    #query_cache_size = 100M
    query_cache_size = 0
    query_cache_type = 0

This seems to have fixed the problems while the parser is running. However, I now get "MySQL server has gone away" when the next module runs after the main parser.

The weird thing is that the problem module involves only a very simple SELECT query on a table that currently holds just 3 records. Run directly as a test (not after the parser), it works fine.

I tried adding a 4-minute pause after the parser module runs, but I get the same error.

I have a basic DBConnection.pm module with this:

    package DBConnection;

    use DBI;
    use PXConfig;

    sub new {
        my $class = shift;

        ## MYSQL Connection
        my $config   = new PXConfig();
        my $host     = $config->val( 'database', 'host' );
        my $database = $config->val( 'database', 'db' );
        my $user     = $config->val( 'database', 'user' );
        my $pw       = $config->val( 'database', 'password' );

        my $dsn = "DBI:mysql:database=$database;host=$host;";

        my $connect2 = DBI->connect( $dsn, $user, $pw, );

        $connect2->{mysql_auto_reconnect} = 1;
        $connect2->{RaiseError}           = 1;
        $connect2->{PrintError}           = 1;
        $connect2->{ShowErrorStatement}   = 1;
        $connect2->{InactiveDestroy}      = 1;

        my $self = { connect => $connect2, };

        bless $self, $class;
        return $self;
    }

Then all modules, including forked parser modules, open a database connection using:

    package Example;

    use DBConnection;

    sub new {
        my $class = shift;

        my $db       = new DBConnection;
        my $connect2 = $db->connect();

        my $self = { connect2 => $connect2, };

        bless $self, $class;
        return $self;
    }

The question is: if Module1.pm calls Module2.pm, which calls Module3.pm, and each of them creates a database connection as shown above (i.e. in the constructor), do they use different database connections or the same one?

What I am wondering is whether, when the script takes 6 hours to finish, the top-level database connection affects the lower-level one (for example by timing out), even though the lower-level module creates its own connection.

It is very frustrating to track down the problem, because I can only reproduce the error after running the very lengthy parsing process.

Sorry for the long question, thanks in advance to everyone who can give me any ideas.

UPDATE 1:

Here is the actual forking part:

    my $fh = Tie::Handle::CSV->new( "$file", header => 1 );

    while ( my $part = <$fh> ) {
        if ( $children == $max_threads ) {
            $pid = wait();
            $children--;
        }
        if ( defined( $pid = fork ) ) {
            if ($pid) {
                $children++;
            }
            else {
                $cfptu = new ThreadedUnit();
                $cfptu->parseThreadedUnit( $part, $group_id, $feed_id );
            }
        }
    }

And then ThreadedUnit:

    package ThreadedUnit;

    use CollisionChecker;
    use ArchiveController;
    use Filters;
    use Try::Tiny;
    use MysqlLogger;

    sub new {
        my $class = shift;

        my $db       = new DBConnection;
        my $connect2 = $db->connect();

        my $self = { connect2 => $connect2, };

        bless $self, $class;
        return $self;
    }

    sub parseThreadedUnit {
        my ( $self, $part, $group_id, $feed_id ) = @_;
        my $connect2 = $self->{connect2};

        ## Parsing stuff

        ## DB Update in try -> catch

        exit();
    }

So, as I understand it, the database connection is made after forking.

But, as I mentioned above, the forked code works just fine. It is the next module that fails. It is launched from a controller module, which simply runs each worker module one at a time (the parser being one of them). The controller module does not create a database connection in its constructor or anywhere else.

UPDATE 2:

I forgot to mention that I don't get any errors in the problem module after the parser, if I parse only a small number of files and not a full queue.

So it is almost as if the intensive forked parsing and database access leaves the database unavailable to normal non-forked processes for some indefinite period right after it finishes.

The only thing I noticed in the MySQL status when the parser finishes is that Threads_connected sits at about 500 and does not decrease for some time.

mysql perl dbi fork

someuser Apr 12 '13 at 8:07

2 answers

user1919238 · Answer 1 · 2013-04-12T08:27:15+0000

It depends on how your program is structured, which is not clear from the question.

If you create a database connection before the fork, each process gets its own copy of the DB connection object, and all of those copies share the same underlying connection to the server. This can cause problems if two processes try to access the database simultaneously over that one connection.

On the other hand, if you create DB connections after forking, each module will have its own connection. This should work, but you might hit a timeout if module x creates a connection, then waits for a process in module y to complete, and only then tries to use the connection.
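The timeout case can be guarded against by checking the handle before each use. A minimal sketch, assuming a hypothetical helper `live_dbh` and a caller-supplied `$connect` code reference that wraps `DBI->connect`; `ping` is a standard DBI handle method, everything else here is illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: return a usable handle, reconnecting when the
# cached one is missing or no longer answers ping().
#   $slot    - reference to the scalar that caches the handle
#   $connect - code ref that opens a new connection (e.g. wraps DBI->connect)
sub live_dbh {
    my ( $slot, $connect ) = @_;
    my $dbh = $$slot;

    # The eval treats a ping() that throws the same as one that returns false.
    my $alive = $dbh && eval { $dbh->ping };

    $$slot = $connect->() unless $alive;
    return $$slot;
}
```

With the `wait_timeout = 15` shown in the question, the server closes any connection that has been idle for more than 15 seconds, so a handle opened before a long-running step would likely be dead by the time it is used.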

In short, here is what you want:

Have no open connections at the point of the fork. Child processes must create their own connections.
Open each connection right before you want to use it. If there is a point in your program where you need to wait, open the connection after the wait has completed.
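The two points above can be sketched as a small fork pool in which the parent holds no database handle and each child does its own work, including connecting, inside a callback. This is only an illustration under stated assumptions: `run_forked` and its parameters are hypothetical names, not part of the question's code.

```perl
use strict;
use warnings;

# Hypothetical helper: run $work->($item) in a forked child for each item,
# keeping at most $max children alive at once.
# Returns the number of children reaped.
sub run_forked {
    my ( $items, $max, $work ) = @_;
    my ( $running, $reaped ) = ( 0, 0 );

    for my $item (@$items) {
        if ( $running >= $max ) {
            wait();    # block until one child exits
            $running--;
            $reaped++;
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ( $pid == 0 ) {
            # Child: the callback opens its own connection here, after the fork.
            $work->($item);
            exit 0;
        }
        $running++;
    }

    # Reap the remaining children so none are left behind as zombies.
    while ( $running > 0 ) {
        wait();
        $running--;
        $reaped++;
    }
    return $reaped;
}
```

A caller would pass a callback that connects inside the child, for example `run_forked( \@rows, 8, sub { my $dbh = DBI->connect( $dsn, $user, $pw, { RaiseError => 1 } ); ...; $dbh->disconnect } )`, where `@rows`, `$dsn`, `$user` and `$pw` are placeholders. Unlike the loop in the question, this version also waits for the last children before returning, so the next module starts only after every child has exited.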

bohica · Answer 2 · 2013-04-12T08:39:36+0000

Read dan1111's answer, but I suspect you are connecting and then forking. When the child exits, its copy of the DBI connection handle goes out of scope and is closed. As dan1111 says, you are better off connecting in the child, for all the reasons he gave. Read about InactiveDestroy and AutoInactiveDestroy in the DBI documentation to help you understand what is going on.
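The attributes mentioned above can be set at connect time. A minimal sketch, assuming hypothetical helpers `fork_safe_attrs` and `connect_fork_safe`; `AutoInactiveDestroy` is a DBI attribute (added in DBI 1.614) that makes a handle inherited by a forked child skip the disconnect when the child destroys its copy, so the parent's connection is left intact:

```perl
use strict;
use warnings;

# Attributes for a handle that may be inherited across fork().
sub fork_safe_attrs {
    return {
        RaiseError          => 1,
        PrintError          => 0,
        AutoInactiveDestroy => 1,    # child copies skip disconnect on DESTROY
    };
}

# Hypothetical wrapper; $dsn, $user and $pw are supplied by the caller.
sub connect_fork_safe {
    my ( $dsn, $user, $pw ) = @_;
    require DBI;    # loaded lazily, only when a connection is requested
    return DBI->connect( $dsn, $user, $pw, fork_safe_attrs() );
}
```

By contrast, setting `InactiveDestroy = 1` unconditionally, as the question's DBConnection does, also stops the process that opened the connection from disconnecting when its handle is destroyed. That may be worth ruling out when looking at the lingering Threads_connected count.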
