That was true in the old days, when the memory bus was the same width as the processor registers. But that stopped being the case long ago: the Pentium was the first processor in commodity hardware whose memory bus was wider than its registers, 64 bits for a 32-bit processor. It was an easy way to increase bus throughput.
Memory is a very large bottleneck; it is much slower than the processor core. The underlying problem is distance: the farther an electrical signal has to travel, the harder it is to switch it at a high frequency without distorting it.
Accordingly, the size of the processor caches, and how efficiently a program can use them, largely determine how fast the program runs. A cache miss can easily cost hundreds of CPU stall cycles.
Your 64-bit processor did not get a cache twice as large: L1 is still 32 KB of instructions and 32 KB of data, regardless of whether your program runs in 32-bit or 64-bit mode. The available space on the chip and, above all, the distance between the cache and the execution engine are physical limits, set by the feature size of the process technology.
So making int 64 bits wide, while trivially easy for the compiler, is very harmful to program speed. Such a program uses the caches much less efficiently and suffers many more stalls waiting on the memory bus.
The dominant 64-bit data models are LLP64, the choice Microsoft made, and LP64, the choice made on *nix operating systems. Both use 32 bits for int; LLP64 also keeps long at 32 bits, while LP64 makes it 64 bits. long long is 64 bits on both.