A simple hash performance test, it seems the C ++ version is slower than the perl version and the golang version.
- The perl version took about 200 milliseconds,
- The C ++ version took 280 milliseconds.
- The golang version took 56 milliseconds.
On my PC with a Core (TM) processor i7-2670QM @ 2.20GHz, Ubuntu 14.04.3LTS,
Any ideas?
perl version
use Time::HiRes qw( usleep ualarm gettimeofday tv_interval nanosleep clock_gettime clock_getres clock_nanosleep clock stat ); sub getTS { my ($seconds, $microseconds) = gettimeofday; return $seconds + (0.0+ $microseconds)/1000000.0; } my %mymap; $mymap{"US"} = "Washington"; $mymap{"UK"} = "London"; $mymap{"France"} = "Paris"; $mymap{"Russia"} = "Moscow"; $mymap{"China"} = "Beijing"; $mymap{"Germany"} = "Berlin"; $mymap{"Japan"} = "Tokyo"; $mymap{"China"} = "Beijing"; $mymap{"Italy"} = "Rome"; $mymap{"Spain"} = "Madrad"; $x = ""; $start = getTS(); for ($i=0; $i<1000000; $i++) { $x = $mymap{"China"}; } printf "took %f sec\n", getTS() - $start;
C ++ Version
#include <iostream> #include <string> #include <unordered_map> #include <sys/time.h> double getTS() { struct timeval tv; gettimeofday(&tv, NULL); return tv.tv_sec + tv.tv_usec/1000000.0; } using namespace std; int main () { std::unordered_map<std::string,std::string> mymap; // populating container: mymap["US"] = "Washington"; mymap["UK"] = "London"; mymap["France"] = "Paris"; mymap["Russia"] = "Moscow"; mymap["China"] = "Beijing"; mymap["Germany"] = "Berlin"; mymap["Japan"] = "Tokyo"; mymap["China"] = "Beijing"; mymap["Italy"] = "Rome"; mymap["Spain"] = "Madrad"; double start = getTS(); string x; for (int i=0; i<1000000; i++) { mymap["China"]; } printf("took %f sec\n", getTS() - start); return 0; }
Golang version
package main import "fmt" import "time" func main() { var x string mymap := make(map[string]string) mymap["US"] = "Washington"; mymap["UK"] = "London"; mymap["France"] = "Paris"; mymap["Russia"] = "Moscow"; mymap["China"] = "Beijing"; mymap["Germany"] = "Berlin"; mymap["Japan"] = "Tokyo"; mymap["China"] = "Beijing"; mymap["Italy"] = "Rome"; mymap["Spain"] = "Madrad"; t0 := time.Now() sum := 1 for sum < 1000000 { x = mymap["China"] sum += 1 } t1 := time.Now() fmt.Printf("The call took %v to run.\n", t1.Sub(t0)) fmt.Println(x) }
Update 1
To improve the C ++ version, change x = mymap["China"]; on mymap["China"]; but very little in performance.
Update 2
I got the original result when compiling without any optimization: g++ -std=c++11 unorderedMap.cc . When optimizing "-O2", it costs only about half the time (150 ms)
Update 3
To remove a possible call to the char* constructor in string , I created a string constant. The time is reduced to approximately 220 ms (there is no optimization during compilation). Thanks to the @ neil-kirk suggestion, with optimization (-O2 flag), the time is about 80 ms.
double start = getTS(); string x = "China"; for (int i=0; i<1000000; i++) { mymap[x]; }
Update 4
Thanks to @ steffen-ullrich, who pointed out a syntax error for the perl version. I changed that. The performance indicator is about 150 ms.
Update 5
It seems that the number of instructions executed matters. Using the command valgrind --tool=cachegrind <cmd>
For the Go version
$ valgrind --tool=cachegrind ./te1 ==2103== Cachegrind, a cache and branch-prediction profiler ==2103== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al. ==2103== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==2103== Command: ./te1 ==2103== --2103-- warning: L3 cache found, using its data for the LL simulation. The call took 1.647099s to run. Beijing ==2103== ==2103== I refs: 255,763,381 ==2103== I1 misses: 3,709 ==2103== LLi misses: 2,743 ==2103== I1 miss rate: 0.00% ==2103== LLi miss rate: 0.00% ==2103== ==2103== D refs: 109,437,132 (77,838,331 rd + 31,598,801 wr) ==2103== D1 misses: 352,474 ( 254,714 rd + 97,760 wr) ==2103== LLd misses: 149,260 ( 96,250 rd + 53,010 wr) ==2103== D1 miss rate: 0.3% ( 0.3% + 0.3% ) ==2103== LLd miss rate: 0.1% ( 0.1% + 0.1% ) ==2103== ==2103== LL refs: 356,183 ( 258,423 rd + 97,760 wr) ==2103== LL misses: 152,003 ( 98,993 rd + 53,010 wr) ==2103== LL miss rate: 0.0% ( 0.0% + 0.1% )
For an optimized version of C ++ (without an optimization flag)
$ valgrind --tool=cachegrind ./a.out ==2180== Cachegrind, a cache and branch-prediction profiler ==2180== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al. ==2180== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==2180== Command: ./a.out ==2180== --2180-- warning: L3 cache found, using its data for the LL simulation. took 64.657681 sec ==2180== ==2180== I refs: 5,281,474,482 ==2180== I1 misses: 1,710 ==2180== LLi misses: 1,651 ==2180== I1 miss rate: 0.00% ==2180== LLi miss rate: 0.00% ==2180== ==2180== D refs: 3,170,495,683 (1,840,363,429 rd + 1,330,132,254 wr) ==2180== D1 misses: 12,055 ( 10,374 rd + 1,681 wr) ==2180== LLd misses: 7,383 ( 6,132 rd + 1,251 wr) ==2180== D1 miss rate: 0.0% ( 0.0% + 0.0% ) ==2180== LLd miss rate: 0.0% ( 0.0% + 0.0% ) ==2180== ==2180== LL refs: 13,765 ( 12,084 rd + 1,681 wr) ==2180== LL misses: 9,034 ( 7,783 rd + 1,251 wr) ==2180== LL miss rate: 0.0% ( 0.0% + 0.0% )
For an optimized version of C ++
$ valgrind --tool=cachegrind ./a.out ==2157== Cachegrind, a cache and branch-prediction profiler ==2157== Copyright (C) 2002-2013, and GNU GPL'd, by Nicholas Nethercote et al. ==2157== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info ==2157== Command: ./a.out ==2157== --2157-- warning: L3 cache found, using its data for the LL simulation. took 9.419447 sec ==2157== ==2157== I refs: 1,451,459,660 ==2157== I1 misses: 1,599 ==2157== LLi misses: 1,549 ==2157== I1 miss rate: 0.00% ==2157== LLi miss rate: 0.00% ==2157== ==2157== D refs: 430,486,197 (340,358,108 rd + 90,128,089 wr) ==2157== D1 misses: 12,008 ( 10,337 rd + 1,671 wr) ==2157== LLd misses: 7,372 ( 6,120 rd + 1,252 wr) ==2157== D1 miss rate: 0.0% ( 0.0% + 0.0% ) ==2157== LLd miss rate: 0.0% ( 0.0% + 0.0% ) ==2157== ==2157== LL refs: 13,607 ( 11,936 rd + 1,671 wr) ==2157== LL misses: 8,921 ( 7,669 rd + 1,252 wr) ==2157== LL miss rate: 0.0% ( 0.0% + 0.0% )