I wrote a Perl program that parses CSV records into a database.
The program worked fine but took a long time, so I decided to fork the main parsing process.
After a bit of a battle with fork, it now works well and runs about 4 times faster. The main parsing method is quite database intensive. For interest's sake, each parsed record involves the following db calls:
1 - A check that the generated base62 ID is unique in the base map table
2 - An archive check to see whether the record has changed
3 - The record is inserted into the db
The problem is that I started getting "MySQL server has gone away" errors while the parser was running in forked mode, so after many attempts I came up with the following MySQL configuration:
#
That seems to have fixed the problems while the parser is running. However, I now get "MySQL server has gone away" when the next module runs after the main parser.
The weird thing is that the problem module contains only a very simple SELECT query on a table that currently has just 3 records. Run directly as a test (not after the parser), it works fine.
I tried adding a 4-minute pause after the parser module, but I get the same error.
I have a main DBConnection.pm module with this:

    package DBConnection;

    use DBI;
    use PXConfig;

    sub new {
        my $class = shift;
Then all modules, including forked parser modules, open a database connection using:

    package Example;

    use DBConnection;

    sub new {
        my $class    = shift;
        my $db       = new DBConnection;
        my $connect2 = $db->connect();
        my $self     = {
            connect2 => $connect2,
        };
        bless $self, $class;
        return $self;
    }

My question is: if Module1.pm calls Module2.pm, which calls Module3.pm, and each of them creates a database connection as shown above (i.e. in the constructor), are they using different database connections or the same connection?
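To make the question concrete, here is a minimal sketch of how plain DBI behaves within a single process. It assumes the `connect()` method in DBConnection.pm ends up calling `DBI->connect` (that part of the module is not shown above), and it uses the `DBD::ExampleP` driver that ships with DBI as a stand-in for the real MySQL DSN. Under those assumptions, every `DBI->connect` call returns a separate handle, while `DBI->connect_cached` returns the same handle for identical arguments.

```perl
use strict;
use warnings;
use DBI;
use Scalar::Util qw(refaddr);

# Assumption: DBD::ExampleP (bundled with DBI) stands in for the MySQL DSN.
my @args = ( 'dbi:ExampleP:', undef, undef, { RaiseError => 1 } );

# Two plain connect() calls, as two module constructors would make.
my $dbh1     = DBI->connect(@args);
my $dbh2     = DBI->connect(@args);
my $distinct = refaddr($dbh1) != refaddr($dbh2) ? 1 : 0;

# connect_cached() hands back the same handle for identical arguments.
my $c1     = DBI->connect_cached(@args);
my $c2     = DBI->connect_cached(@args);
my $shared = refaddr($c1) == refaddr($c2) ? 1 : 0;

print "connect: distinct=$distinct, connect_cached: shared=$shared\n";

# $c1 and $c2 are the same handle, so one disconnect covers both.
$_->disconnect for $dbh1, $dbh2, $c1;
```

If the constructors do use plain `connect`, each module instance would hold its own connection, which also means each one counts separately toward the server's connection total.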
What I am wondering is, if the script takes 6 hours to finish, whether the top-level database connection interferes with the lower-level connection, even though the lower-level module creates its own connection.
It is very frustrating trying to track down the problem, because I can only reproduce the error after a very lengthy parsing run.
Sorry for the long question, thanks in advance to everyone who can give me any ideas.
UPDATE 1:
Here is the actual forking part:

    my $fh = Tie::Handle::CSV->new( "$file", header => 1 );

    while ( my $part = <$fh> ) {
        if ( $children == $max_threads ) {
            $pid = wait();
            $children--;
        }
        if ( defined( $pid = fork ) ) {
            if ($pid) {
                $children++;
            }
            else {
                $cfptu = new ThreadedUnit();
                $cfptu->parseThreadedUnit( $part, $group_id, $feed_id );
            }
        }
    }

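For comparison, here is a minimal, self-contained sketch of the same bounded-fork pattern using only core Perl. The names `$max_children` and `@parts` and the empty child body are placeholders, not my real code. The sketch makes two things explicit that the snippet above does not show: the child calls `exit` when it is done, so it cannot fall through and continue the parent's loop, and the parent reaps all remaining children before moving on to the next step.

```perl
use strict;
use warnings;

my $max_children = 3;            # hypothetical limit, like $max_threads
my @parts        = ( 1 .. 10 );  # stand-in for the CSV rows
my $children     = 0;
my $reaped       = 0;

for my $part (@parts) {
    # At the limit: block until one child finishes before forking another.
    if ( $children == $max_children ) {
        wait();
        $children--;
        $reaped++;
    }

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;

    if ($pid) {
        $children++;    # parent: count the new child
    }
    else {
        # Child: the per-record work would go here (placeholder), then
        # exit so the child does not continue the parent's loop.
        exit 0;
    }
}

# Reap the remaining children before the next module starts.
while ( wait() != -1 ) {
    $reaped++;
}

print "reaped $reaped of ", scalar(@parts), " children\n";
```

Whether the missing `exit` and final reaping matter in my case depends on what `parseThreadedUnit` does at the end, which is not shown here.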
And then ThreadedUnit:

    package ThreadedUnit;

    use CollisionChecker;
    use ArchiveController;
    use Filters;
    use Try::Tiny;
    use MysqlLogger;

    sub new {
        my $class    = shift;
        my $db       = new DBConnection;
        my $connect2 = $db->connect();
        my $self     = {
            connect2 => $connect2,
        };
        bless $self, $class;
        return $self;
    }

    sub parseThreadedUnit {
        my ( $self, $part, $group_id, $feed_id ) = @_;
        my $connect2 = $self->{connect2};

So, as I understand it, the database connection is created after forking.
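Reduced to its essentials, the connect-after-fork pattern looks like the sketch below. It again uses `DBD::ExampleP` (bundled with DBI) as an assumed stand-in for MySQL, and the explicit `disconnect` and `exit` are additions for illustration that are not visible in the ThreadedUnit code above. The child opens its own handle only after the fork, checks it with `ping`, releases it, and reports back through its exit status.

```perl
use strict;
use warnings;
use DBI;

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: connect only now, so the handle is never shared with the parent.
    # Assumption: DBD::ExampleP stands in for the real MySQL DSN.
    my $dbh = DBI->connect( 'dbi:ExampleP:', undef, undef, { RaiseError => 1 } );

    my $status = $dbh->ping ? 0 : 1;    # exit status 0 means the handle was live
    $dbh->disconnect;                   # release the connection explicitly
    exit $status;
}

# Parent: wait for the child and read its exit status.
waitpid( $pid, 0 );
my $child_status = $? >> 8;
print "child exit status: $child_status\n";
```

With a real MySQL DSN, each child built this way would open one server connection and close it before exiting.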
But, as I mentioned above, the forked code shown above works just fine. It is the next module that fails. It is launched from a controller module that simply runs through each worker module one at a time (the parser being one of them). The controller module does not create a database connection in its constructor or anywhere else.
UPDATE 2:
I forgot to mention that I don't get any errors in the problem module following the parser if I parse only a small number of files rather than the full queue.
So it's almost as if the intensive forked parsing and database access makes the database unavailable to normal non-forked processes for some undetermined time right after it finishes.
The only thing I noticed in the MySQL status when the parser finishes is that Threads_connected is around 500 and does not decrease for some time.