The PDL command you want is indadd. (Thanks to Chris Marshall, PDL Pumpking, for pointing this out elsewhere.)
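To make the behaviour concrete, here is a minimal sketch of what indadd does with repeated indices. The variable names and the ten-element size are just illustrative assumptions; the call itself has the same shape as the one used in the benchmark in this answer.

```perl
use strict;
use warnings;
use PDL;

# Five updates aimed at three distinct slots; slot 2 is targeted three times.
my $indices = long(2, 2, 5, 2, 7);
my $counts  = zeroes(10);

# indadd adds the first argument to $counts at each index in turn,
# so repeated indices should accumulate rather than overwrite each other.
indadd(1, $indices, $counts);

print "$counts\n";
```

The point of the sketch is that a single call handles the whole batch of indices, which is what lets PDL amortize its per-call overhead.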
PDL is for what I call "vectorized" operations. Compared to C operations, Perl operations are quite slow, so you want to keep the number of PDL method calls to a minimum and have each call do a lot of work. For example, this benchmark lets you specify the number of updates to perform per round (as a command-line parameter). The Perl side has to loop, but the PDL side only makes five or so function calls:
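    use PDL;
    use Benchmark qw/cmpthese/;

    # Number of updates per benchmark round (first command-line argument).
    my $updates_per_round = shift || 1;

    my $N    = 1_000_000;
    my @perl = (0 .. $N - 1);
    my $pdl  = zeroes $N;

    cmpthese(-1, {
        # Pure Perl: one array increment per loop iteration.
        perl => sub {
            $perl[int(rand($N))]++ for (1 .. $updates_per_round);
        },
        # PDL: build all the random indices at once, then a single indadd call.
        pdl => sub {
            my $to_update = long(random($updates_per_round) * $N);
            indadd(1, $to_update, $pdl);
        },
    });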
When I run this with an argument of 1, I get even worse performance than when using set, which is what I expected:

    $ perl script.pl 1
              Rate   pdl  perl
    pdl    21354/s    --  -98%
    perl 1061925/s 4873%    --
That's a lot of ground to make up! But hold on. If we do 100 updates per round, the gap narrows considerably:

    $ perl script.pl 100
            Rate  pdl perl
    pdl  16906/s   -- -18%
    perl 20577/s  22%   --
And with 10,000 updates per round, PDL is four times better than Perl:
    $ perl script.pl 10000
          Rate perl  pdl
    perl 221/s   -- -75%
    pdl  881/s 298%   --
PDL continues to run about 4 times faster than regular Perl for even larger values.
Note that PDL performance may degrade for more complex operations. This is because PDL allocates and tears down large but temporary workspaces for intermediate results. In that case, you may want to consider Inline::Pdlpp. However, that is not a tool for beginners, so don't jump there until you've determined that it really is the best option for you.
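As a rough illustration of where those temporary workspaces come from, consider the sketch below. The data and the arithmetic are hypothetical and not part of the benchmark; the sketch only contrasts a chained expression with PDL's overloaded assignment operators, which update the existing ndarray instead of creating a new one.

```perl
use strict;
use warnings;
use PDL;

my $x = sequence(5);    # [0 1 2 3 4]

# Chained expression: "$x * 2" produces an intermediate ndarray,
# and "+ 1" then produces a second, final one.
my $y = $x * 2 + 1;

# Assignment operators write the result back into $x itself,
# so no new ndarray is created for each step.
$x *= 2;
$x += 1;

print "$y\n$x\n";
```

For five elements the difference is irrelevant, but with millions of elements every intermediate result is a full-size allocation. This is a milder measure than Inline::Pdlpp and may or may not be enough for a given workload.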
Another alternative to all of this is to use Inline::C as follows:
    use PDL;
    use Benchmark qw/cmpthese/;

    # Number of updates per benchmark round (first command-line argument).
    my $updates_per_round = shift || 1;

    my $N    = 1_000_000;
    my @perl = (0 .. $N - 1);
    my $pdl  = zeroes $N;
    # Same data packed as native doubles, so C can treat it as a double array.
    my $inline = pack "d*", @perl;
    my $max_PDL_per_round = 5_000;    # not used by this benchmark

    use Inline 'C';

    cmpthese(-1, {
        perl => sub {
            $perl[int(rand($N))]++ for (1 .. $updates_per_round);
        },
        pdl => sub {
            my $to_update = long(random($updates_per_round) * $N);
            indadd(1, $to_update, $pdl);
        },
        inline => sub {
            do_inline($inline, $updates_per_round, $N);
        },
    });

    __END__

    __C__

    void do_inline(char * packed_data, int N_updates, int N_data) {
        /* Reinterpret the packed Perl string as an array of doubles. */
        double * actual_data = (double *) packed_data;
        int i;
        for (i = 0; i < N_updates; i++) {
            int index = rand() % N_data;
            actual_data[index]++;
        }
    }
For me, the Inline function consistently beats both Perl and PDL. For largish values of $updates_per_round, say 1000, the Inline::C version is about 5 times faster than pure Perl and between 1.2x and 2x faster than PDL. Even when $updates_per_round is just 1, where Perl beats PDL handily, the Inline code is 2.5 times faster than the Perl code.
If this is all you need to accomplish, I recommend using Inline::C.
But if you need to do a lot of manipulation of your data, you are best off sticking with PDL for its power, flexibility, and performance. See below for how you can use vec() with PDL data.
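Because $inline is just a Perl string of packed native doubles, the updated values have to be unpacked before Perl code can use them as numbers. The sketch below shows one way to do that on a small stand-in buffer; the helper name packed_get and the ten-element data are hypothetical and not part of the benchmark.

```perl
use strict;
use warnings;

# Small stand-in for the packed buffer used in the benchmark.
my @values = (0 .. 9);
my $packed = pack "d*", @values;

# Size of one native double in bytes on this platform.
my $bytes_per_double = length pack "d", 0;

# Hypothetical helper: read element $i straight out of the packed string.
sub packed_get {
    my ($buf_ref, $i) = @_;
    return unpack "d",
        substr($$buf_ref, $i * $bytes_per_double, $bytes_per_double);
}

my $fourth = packed_get(\$packed, 3);    # a single element
my @all    = unpack "d*", $packed;       # the whole buffer as a Perl list

print "element 3 = $fourth, total elements = ", scalar(@all), "\n";
```

Unpacking the whole buffer costs time proportional to its length, so for occasional lookups the single-element read is the cheaper route.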