Is it always safe to combine select(2) and buffered IO for file handles?

I use IO::Select to keep track of a variable number of file handles for reading. The documentation I have come across strongly advises against combining select with <> (readline) when reading from those handles.

My situation:

I will use each file handle only once: when select offers me a handle, I will read it completely and then remove it from the select set. I will receive a hash and a variable number of files. I do not mind if the read blocks for a while.

For more context: I am a client sending information to be processed by my servers. Each file handle is a different server I am talking to. As soon as a server finishes, it sends a hash result back to me. Inside that hash is a number indicating how many files follow.

I want to use readline so that I can integrate with existing project code for transferring Perl objects and files.

Code example:

    my $read_set = IO::Select->new;
    my $count = @agents_to_run;   # array comes in as an argument

    for my $agent (@agents_to_run) {
        my ($sock, $peerhost, $peerport) = server(
            $config_settings{$agent}->{'Host'},
            $config_settings{$agent}->{'Port'},
        );
        $read_set->add($sock);
    }

    while ($count > 0) {
        my @rh_set = $read_set->can_read();
        for my $rh (@rh_set) {
            my %results = <$rh>;
            my $num_files = $results{'numFiles'};
            my @files = ();
            for (my $i = 0; $i < $num_files; $i++) {
                $files[$i] = <$rh>;
            }
            # process results, close fh, decrement count, etc.
        }
    }
2 answers

Using readline (aka <>) is quite wrong for two reasons: it is buffered, and it is blocking.


Buffering is bad

More specifically, buffering with buffers that cannot be inspected is bad.

The system can do all the buffering it wants, because you can peek into its buffers using select.

Perl's IO system must not be allowed to do any buffering, because you cannot peek into its buffers.

Let's look at an example of what might happen using readline in a select loop.

  • "abc\ndef\n" arrives at the descriptor.
  • select notifies you that there is data to read.
  • readline will try to read a chunk from the handle.
  • "abc\ndef\n" will be placed in the Perl buffer for the descriptor.
  • readline will return "abc\n" .

At this point, you call select again, and you want it to know that there is still something to read ( "def\n" ). However, select will indicate that there is nothing to read, since select is a system call, and the data has already been read from the system. That means you have to wait until more comes before you can read "def\n" .

The following program illustrates this:

    use IO::Select qw( );
    use IO::Handle qw( );

    sub producer {
        my ($fh) = @_;
        for (;;) {
            print($fh time(), "\n") or die;
            print($fh time(), "\n") or die;
            sleep(3);
        }
    }

    sub consumer {
        my ($fh) = @_;
        my $sel = IO::Select->new($fh);
        while ($sel->can_read()) {
            my $got = <$fh>;
            last if !defined($got);
            chomp $got;
            print("It took ", (time()-$got), " seconds to get the msg\n");
        }
    }

    pipe(my $rfh, my $wfh) or die;
    $wfh->autoflush(1);
    fork() ? producer($wfh) : consumer($rfh);

Output:

    It took 0 seconds to get the msg
    It took 3 seconds to get the msg
    It took 0 seconds to get the msg
    It took 3 seconds to get the msg
    It took 0 seconds to get the msg
    ...

This can be fixed using unbuffered I/O:

    sub consumer {
        my ($fh) = @_;
        my $sel = IO::Select->new($fh);
        my $buf = '';
        while ($sel->can_read()) {
            sysread($fh, $buf, 64*1024, length($buf)) or last;
            # s/// returns a success flag, not the capture, so take the line from $1.
            while ($buf =~ s/^(.*)\n//) {
                my $got = $1;
                print("It took ", (time()-$got), " seconds to get the msg\n");
            }
        }
    }

Output:

    It took 0 seconds to get the msg
    It took 0 seconds to get the msg
    It took 0 seconds to get the msg
    It took 0 seconds to get the msg
    It took 0 seconds to get the msg
    It took 0 seconds to get the msg
    ...

Blocking is bad

Let's look at an example of what might happen using readline in a select loop.

  • "abcdef" arrives at the handle, with no newline yet.
  • select notifies you that there is data to read.
  • readline will try to read a chunk from the socket.
  • "abcdef" will be placed in the Perl buffer for the handle.
  • readline has not received a newline, so it tries to read another chunk from the socket.
  • No more data is currently available, so it blocks.

This defeats the purpose of using select.

[Demo code to come]
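In the meantime, a minimal sketch of the blocking case, not the answer author's demo. It assumes a pipe and a forked child stand in for the socket and the server; the `producer` and `consumer` helpers and the one-second pause are made up for illustration. The child writes half a line, pauses, then finishes the line, and the parent measures how long readline stays blocked after select has already reported the handle as readable.

```perl
use strict;
use warnings;
use IO::Select ();
use IO::Handle ();
use Time::HiRes qw( time sleep );

# Hypothetical writer: sends half a line, pauses, then finishes it.
sub producer {
    my ($fh) = @_;
    print {$fh} "abc" or die;      # partial line, no newline yet
    sleep(1);
    print {$fh} "def\n" or die;    # the rest of the line
    close($fh);
}

# Returns how long readline blocked after select reported readiness,
# together with the line it eventually returned.
sub consumer {
    my ($fh) = @_;
    my $sel = IO::Select->new($fh);
    $sel->can_read();              # returns once "abc" is available
    my $start = time();
    my $line  = <$fh>;             # blocks until the newline arrives
    return (time() - $start, $line);
}

pipe(my $rfh, my $wfh) or die "pipe: $!";
$wfh->autoflush(1);

my $pid = fork();
die "fork: $!" if !defined($pid);
if (!$pid) {
    close($rfh);
    producer($wfh);
    exit(0);
}
close($wfh);

my ($blocked, $line) = consumer($rfh);
waitpid($pid, 0);
printf("readline blocked for about %.1f seconds to return %s", $blocked, $line);
```

With several handles in the select set, every other handle would be starved for the duration of that pause.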


Solution

You must implement a version of readline that does not block and that uses only buffers you can inspect. The second part is easy, because you can inspect the buffers that you create yourself.

  • Create a buffer for each handle.
  • When data arrives on a handle, read it, but no more. When data is waiting (as we know from select), sysread returns what is available without waiting for more to arrive. This makes sysread ideal for this task.
  • Append the data read to the appropriate buffer.
  • For each complete message in the buffer, extract it and process it.

Adding a handle:

    $select->add($fh);
    $clients{fileno($fh)} = { buf => '', ... };

select loop:

    while (my @ready = $select->can_read) {
        for my $fh (@ready) {
            my $client = $clients{fileno($fh)};
            our $buf; local *buf = \($client->{buf});   # alias $buf to $client->{buf}

            my $rv = sysread($fh, $buf, 64*1024, length($buf));
            if (!$rv) {
                if (!defined($rv)) {
                    ... # Handle error
                }
                elsif (length($buf)) {
                    ... # Handle eof with partial message
                }
                else {
                    ... # Handle eof
                }

                delete $clients{fileno($fh)};
                $select->remove($fh);
                next;
            }

            # s/// returns a success flag, so take the message from $1.
            while ($buf =~ s/^(.*)\n//) {
                my $msg = $1;
                ... # Process message.
            }
        }
    }
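The loop above leaves its handlers elided. A self-contained sketch of the same technique follows, under stated assumptions: two forked children writing to pipes stand in for the servers, messages are newline-terminated, and `spawn_writer` and the message texts are invented for illustration. Each child deliberately splits a line across two writes so the reader sees a partial message first.

```perl
use strict;
use warnings;
use IO::Select ();
use IO::Handle ();
use Time::HiRes qw( sleep );

# Hypothetical stand-in for a server: writes two lines, with the first
# one split across two writes.
sub spawn_writer {
    my ($name) = @_;
    pipe(my $rfh, my $wfh) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" if !defined($pid);
    if (!$pid) {
        close($rfh);
        $wfh->autoflush(1);
        print {$wfh} "$name: fir";
        sleep(0.2);                          # pause in the middle of a line
        print {$wfh} "st\n$name: second\n";
        close($wfh);
        exit(0);
    }
    close($wfh);
    return ($rfh, $pid);
}

my $select = IO::Select->new;
my (%clients, @pids, @messages);

for my $name (qw( a b )) {
    my ($fh, $pid) = spawn_writer($name);
    push @pids, $pid;
    $select->add($fh);
    $clients{fileno($fh)} = { buf => '' };   # one buffer per handle
}

while ($select->count) {
    for my $fh ($select->can_read) {
        my $client = $clients{fileno($fh)};
        # Append whatever is available right now; sysread does not wait for more.
        my $rv = sysread($fh, $client->{buf}, 64*1024, length($client->{buf}));
        if (!$rv) {
            warn "read error: $!\n" if !defined($rv);
            warn "eof with partial message\n"
                if defined($rv) && length($client->{buf});
            delete $clients{fileno($fh)};
            $select->remove($fh);
            close($fh);
            next;
        }
        # Extract every complete line; a trailing partial line stays buffered.
        while ($client->{buf} =~ s/^(.*)\n//) {
            push @messages, $1;
        }
    }
}

waitpid($_, 0) for @pids;
print "$_\n" for @messages;
```

The order in which lines from the two writers interleave is not deterministic, but each writer's own lines arrive intact and in order.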

By the way, this is much easier to do with threads, and the code above does not even handle writers!


After much discussion with @ikegami, we determined that in my extremely specific case, readline is actually not a problem. I am still leaving ikegami's as the accepted answer, because it is by far the best way to handle the general situation, and a wonderful write-up.

Readline (aka <>) is acceptable in my situation because of the following facts:

  • A handle is returned from select only once, and then it is closed and removed
  • I send only one message through each file handle
  • I do not care if the read blocks
  • I account for timeouts and closed handles returned from select (error checking is not included in the sample code above)
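A sketch of that narrow case, with loud assumptions: forked children on pipes stand in for the servers, and the wire format (a `numFiles N` header line followed by N lines) is invented here, as are `spawn_server` and the 10-second timeout. Each handle is read with readline exactly once and then dropped from the set, so data left in Perl's buffer is never handed back to select.

```perl
use strict;
use warnings;
use IO::Select ();

# Hypothetical stand-in for one server: sends a single reply, then closes.
sub spawn_server {
    my ($reply) = @_;
    pipe(my $rfh, my $wfh) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" if !defined($pid);
    if (!$pid) {
        close($rfh);
        print {$wfh} $reply;
        close($wfh);
        exit(0);
    }
    close($wfh);
    return ($rfh, $pid);
}

my $read_set = IO::Select->new;
my @pids;
for my $reply ("numFiles 2\nfile1\nfile2\n", "numFiles 0\n") {
    my ($fh, $pid) = spawn_server($reply);
    $read_set->add($fh);
    push @pids, $pid;
}

my $timeout = 10;   # seconds; an arbitrary value for this sketch
my @results;
while ($read_set->count) {
    my @ready = $read_set->can_read($timeout);
    die "timed out waiting for a server\n" if !@ready;
    for my $rh (@ready) {
        my $header = <$rh>;   # may block; acceptable in this special case
        if (defined($header) && $header =~ /^numFiles (\d+)$/) {
            my $n = $1;
            my @files = grep { defined } map { scalar readline($rh) } 1 .. $n;
            chomp(@files);
            push @results, \@files;
        }
        else {
            warn "handle closed early or sent a malformed header\n";
        }
        # The handle is never offered to select again.
        $read_set->remove($rh);
        close($rh);
    }
}

waitpid($_, 0) for @pids;
printf("%d replies received\n", scalar @results);
```

If a handle ever had to go back into the select set with data still pending, the buffering and blocking problems described in the accepted answer would apply again.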