Python performance vs perl

Question

Python performance vs perl

Decision

The following solved all the problems with my Perl code (plus some extra implementation code ... :-)). In conclusion, Perl and Python are both equally awesome.

    use WWW::Curl::Easy;

Thanks to everyone who answered, I really appreciate it.

Edit

It seems that my Perl code spends most of its time doing the HTTP GET, for example:

    my $start_time = gettimeofday;
    $request = HTTP::Request->new('GET', 'http://localhost:8080/data.json');
    $response = $ua->request($request);
    $page = $response->content;
    my $end_time = gettimeofday;
    print "Time taken @{[ $end_time - $start_time ]} seconds.\n";

Result:

 Time taken 74.2324419021606 seconds.

My Python code, in comparison:

    start = time.time()
    r = requests.get('http://localhost:8080/data.json', timeout=120, stream=False)
    maxsize = 100000000
    content = ''
    for chunk in r.iter_content(2048):
        content += chunk
        if len(content) > maxsize:
            r.close()
            raise ValueError('Response too large')
    end = time.time()
    timetaken = end-start
    print timetaken

Result:

 20.3471381664

In both cases, the sorting time is under a second. So, first of all, I apologize for the misleading question; this is another lesson for me to never make assumptions ... :-)

I am not sure what the best thing to do about this is right now. Perhaps someone can suggest a better way to perform the request in Perl?

End of edit

This is just a quick question about the difference in sorting performance between Perl and Python. It is not a question of which language is better or faster. For the record, I first wrote this in Perl, noticed how long the sort took, and then wrote the same thing in Python to see how fast it would be. I just want to know how I can make the Perl code run as fast as the Python code.

Let's say we have the following JSON:

    {
      "3434343424335": {
        "key1": 2322,
        "key2": 88232,
        "key3": 83844,
        "key4": 444454,
        "key5": 34343543,
        "key6": 2323232
      },
      "78237236343434": {
        "key1": 23676722,
        "key2": 856568232,
        "key3": 838723244,
        "key4": 4434544454,
        "key5": 3432323543,
        "key6": 2323232
      }
    }

Suppose we have about 30k-40k such records and want to sort them by one of the sub-keys. Then we want to build a new array of records ordered by that key.

Perl - takes about 27 seconds

    my @list;
    $decoded = decode_json($page);
    foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
        push(@list,{"key"=>$id,"key1"=>$decoded->{$id}{key1}...etc});
    }

Python - takes about 6 seconds

    list = []
    data = json.loads(content)
    data2 = sorted(data, key = lambda x: data[x]['key5'], reverse=True)
    for key in data2:
        tmp= {'id':key,'key1':data[key]['key1'],etc.....}
        list.append(tmp)

For the Perl code, I tried the following sort pragmas:

    use sort '_quicksort'; # use a quicksort algorithm
    use sort '_mergesort'; # use a mergesort algorithm

+5

performance python sorting perl

John Jul 31 '15 at 18:20


3 answers

Something else is at play here; I can run your sort in half a second. Improving on that depends less on the sorting algorithm than on reducing the amount of code run per comparison. A Schwartzian transform gets it down to a third of a second, and a Guttman-Rosler transform gets it down to a quarter of a second:

    #!/usr/bin/perl
    use 5.014;
    use warnings;

    my $decoded = { map( (int rand 1e9, { map( ("key$_", int rand 1e9), 1..6 ) } ), 1..40000 ) };

    use Benchmark 'timethese';

    timethese( -5, {
        'original' => sub {
            my @list;
            foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
                push(@list,{"key"=>$id,%{$decoded->{$id}}});
            }
        },
        'st' => sub {
            my @list;
            foreach my $id (
                map $_->[1],
                sort { $b->[0] <=> $a->[0] }
                map [ $decoded->{$_}{key5}, $_ ],
                keys %{$decoded}
            ) {
                push(@list,{"key"=>$id,%{$decoded->{$id}}});
            }
        },
        'grt' => sub {
            my $maxkeylen=15;
            my @list;
            foreach my $id (
                map substr($_,$maxkeylen),
                sort { $b cmp $a }
                map sprintf('%0*s', $maxkeylen, $decoded->{$_}{key5}) . $_,
                keys %{$decoded}
            ) {
                push(@list,{"key"=>$id,%{$decoded->{$id}}});
            }
        },
    });
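For readers coming from the Python side, the Schwartzian transform is the same idea as decorate-sort-undecorate. The sketch below is only an illustration, written for Python 3 (the question's code is Python 2); `make_data` and `sort_ids_dsu` are hypothetical names, and the random ids and values merely mimic the shape of the question's JSON. Note that Python's `sorted(..., key=...)` already calls the key function once per element rather than once per comparison, so the Python code in the question gets this benefit without the explicit steps.

```python
import random


def make_data(n, seed=0):
    """Build n records shaped like the question's JSON (random, made-up values)."""
    rng = random.Random(seed)
    data = {}
    while len(data) < n:
        record = {f"key{i}": rng.randrange(10**9) for i in range(1, 7)}
        data[str(rng.randrange(10**13))] = record
    return data


def sort_ids_dsu(data, field="key5"):
    """Return ids ordered by data[id][field], largest first."""
    # Decorate: look the sort field up once per id, not once per comparison.
    decorated = [(record[field], id_) for id_, record in data.items()]
    # Sort on the precomputed value only.
    decorated.sort(key=lambda pair: pair[0], reverse=True)
    # Undecorate: keep just the ids.
    return [id_ for _, id_ in decorated]


if __name__ == "__main__":
    sample = make_data(40000)
    print(sort_ids_dsu(sample)[:3])
```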

+6

ysth Jul 31 '15 at 18:55


Do not create a new hash for each entry. Just add the key to the existing one.

    $decoded->{$_}{key} = $_ for keys(%$decoded);
    my @list = sort { $b->{key5} <=> $a->{key5} } values(%$decoded);

Using Sort::Key will make it even faster.

    use Sort::Key qw( rukeysort );
    $decoded->{$_}{key} = $_ for keys(%$decoded);
    my @list = rukeysort { $_->{key5} } values(%$decoded);
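The same "tag the existing record instead of building a new one" idea can be sketched in Python 3 for comparison. This is an illustration only: `sorted_records` is a hypothetical name, and the function mutates the decoded data, which may or may not be acceptable in your program.

```python
from operator import itemgetter


def sorted_records(data, field="key5"):
    """Tag each record with its id in place, then sort the records themselves."""
    for id_, record in data.items():
        record["key"] = id_  # mutates the decoded data; no new dict per entry
    return sorted(data.values(), key=itemgetter(field), reverse=True)
```

The returned list holds the very same dict objects as `data`, so nothing is copied during the sort.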

+4

ikegami Jul 31 '15 at 19:08


Schwern · Accepted Answer · 2015-07-31T18:52:48+0000

Your benchmark is flawed: you are benchmarking several variables, not just one. It is not only sorting data, but also decoding JSON, creating strings, and appending to an array. You cannot know how much time is spent sorting and how much is spent doing the rest.

Complicating matters, there are several different JSON implementations in Perl, each with its own performance characteristics. Change the underlying JSON library and the benchmark will change again.

If you want to benchmark this, you will have to change your benchmark code to exclude the cost of loading the test data, JSON or not.

Perl and Python each have their own benchmarking libraries that can benchmark individual functions, but their instrumentation can make the code perform far worse than it would in the real world. The performance drag of each benchmarking implementation will be different and may introduce a false bias. These benchmarking libraries are more useful for comparing two functions in the same program. For comparisons between languages, keep it simple.

The simplest way to get an accurate benchmark is to time the code from within the program using the wall clock.
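As an illustration of the "two functions in the same program" use case, here is a minimal Python 3 sketch using the standard `timeit` module. The function names `build_copy`, `build_in_place`, and `compare` are hypothetical, and the two variants are just examples of alternatives one might want to compare; the numbers are only meaningful relative to each other.

```python
import timeit


def build_copy(data):
    """Sort by key5 (largest first) and build a new dict per record."""
    ordered = sorted(data.items(), key=lambda kv: kv[1]["key5"], reverse=True)
    return [dict(record, id=id_) for id_, record in ordered]


def build_in_place(data):
    """Tag each existing record with its id, then sort the records."""
    for id_, record in data.items():
        record["id"] = id_
    return sorted(data.values(), key=lambda record: record["key5"], reverse=True)


def compare(data, repeat=5, number=10):
    """Return the best per-call time in seconds for each variant."""
    results = {}
    for fn in (build_copy, build_in_place):
        times = timeit.repeat(lambda: fn(data), repeat=repeat, number=number)
        results[fn.__name__] = min(times) / number
    return results
```

Taking the minimum over several repeats is a common way to reduce noise from other activity on the machine.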

    # The current time to the microsecond.
    use Time::HiRes qw(gettimeofday);

    my @list;
    my $decoded = decode_json($page);

    my $start_time = gettimeofday;

    foreach my $id (sort {$decoded->{$b}->{key5} <=> $decoded->{$a}->{key5}} keys %{$decoded}) {
        push(@list,{"key"=>$id,"key1"=>$decoded->{$id}{key1}...etc});
    }

    my $end_time = gettimeofday;

    print "sort and append took @{[ $end_time - $start_time ]} seconds\n";

(I leave the Python version as an exercise)
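For completeness, one possible Python 3 version of the same measurement is sketched below. It is an assumption about how one might write it, not the asker's code: `timed_sort_and_append` is a hypothetical name, `time.perf_counter` stands in for `gettimeofday`, and only `key1` is copied because the remaining keys are elided in the question.

```python
import json
import time


def timed_sort_and_append(content):
    """Decode first, then time only the sort and the list building."""
    data = json.loads(content)  # decoding happens before the clock starts

    start = time.perf_counter()
    records = []
    for key in sorted(data, key=lambda k: data[k]["key5"], reverse=True):
        # Remaining keys are elided in the question; only key1 is copied here.
        records.append({"id": key, "key1": data[key]["key1"]})
    elapsed = time.perf_counter() - start

    return records, elapsed
```

Calling `records, elapsed = timed_sort_and_append(content)` and printing `elapsed` gives a figure comparable to the Perl output above.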

From here you can improve your technique. You can use CPU seconds instead of wall-clock time. The array append and the cost of creating the string are still included in the benchmark; they can be eliminated so that you are benchmarking only the sort. And so on.

In addition, you can use a profiler to find out where your programs spend their time. Profilers carry the same raw performance caveats as benchmarking libraries, so the results are only useful for finding out what percentage of its time a program spends where, but a profile will quickly show whether your benchmark has unexpected drag.

The important thing is to benchmark what you think you are benchmarking.
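A sketch of that refinement in Python 3, under the assumption that the data is already decoded: `time.process_time` reports CPU seconds for the current process, and nothing but the sort sits between the two clock reads. `time_sort_only` is a hypothetical name.

```python
import time


def time_sort_only(data, field="key5"):
    """Measure CPU seconds spent in the sort alone."""
    keys = list(data)  # built before timing starts

    start = time.process_time()  # CPU time of this process, not wall-clock time
    ordered = sorted(keys, key=lambda k: data[k][field], reverse=True)
    cpu_seconds = time.process_time() - start

    return ordered, cpu_seconds
```

For a sort that takes well under a second, the resolution of the CPU clock may matter, so repeating the measurement several times is advisable.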
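On the Python side, the standard library's `cProfile` is one such profiler. The sketch below is only an illustration: `decode_sort_build` and `profile_report` are hypothetical names, and the pipeline being profiled is a simplified stand-in for the question's code. Read the report for relative proportions, not absolute timings.

```python
import cProfile
import io
import json
import pstats


def decode_sort_build(content):
    """The whole pipeline: decode JSON, sort ids by key5, build the records."""
    data = json.loads(content)
    ordered = sorted(data, key=lambda k: data[k]["key5"], reverse=True)
    return [{"id": k, "key1": data[k]["key1"]} for k in ordered]


def profile_report(content, top=10):
    """Run the pipeline under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    decode_sort_build(content)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()
```

Printing `profile_report(content)` shows how the cumulative time splits between `json.loads`, `sorted`, and the list building.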
