Float vs. integer:
Historically, floating-point arithmetic could be much slower than integer arithmetic. On modern computers this is no longer really the case (it is somewhat slower on some platforms, but unless you write perfect code and optimize for every cycle, the difference will be swamped by the other inefficiencies in your code).
On somewhat limited processors, such as those in high-end mobile phones, floating point may be somewhat slower than integer, but it is generally within an order of magnitude (or better), as long as hardware floating point is available. It is worth noting that this gap is closing fairly quickly, as phones are called on to run more and more general computing workloads.
On very limited processors (cheap cell phones and your toaster), there is generally no floating-point hardware, so floating-point operations have to be emulated in software. This is slow: a couple of orders of magnitude slower than integer arithmetic.
As I said, though, people expect their phones and other devices to behave more and more like "real computers," and hardware designers are rapidly beefing up FPUs to meet that demand. Unless you are chasing every last cycle, or you are writing code for very limited processors with little or no floating-point support, the performance difference does not matter to you.
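If you want a rough feel for where your own target falls, a throwaway micro-benchmark is easy to sketch. The loop below is purely illustrative and not from any particular source; the loop count, the use of `clock()`, and the `volatile` accumulators (to keep the compiler from deleting the loops) are my own choices, and real measurements need much more care.

```c
#include <stdio.h>
#include <time.h>

#define N 100000000

/* Rough sketch: compare integer vs. floating-point addition throughput.
   Results are only indicative; compiler flags, vectorization, and the
   overhead of clock() all affect the numbers. */
int main(void) {
    volatile int   iacc = 0;    /* volatile keeps the loops from being optimized away */
    volatile float facc = 0.0f;

    clock_t t0 = clock();
    for (int i = 0; i < N; ++i) iacc += i;
    clock_t t1 = clock();
    for (int i = 0; i < N; ++i) facc += (float)i;
    clock_t t2 = clock();

    printf("int   adds: %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("float adds: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    printf("(sink: %d %f)\n", iacc, facc); /* print results so they are "used" */
    return 0;
}
```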
Different integer sizes:
Typically, processors are fastest at operating on integers of their native word size (with some caveats about 64-bit systems). 32-bit operations are often faster than 8- or 16-bit operations on modern processors, but this varies quite a bit between architectures. Also, remember that you cannot consider the speed of the processor in isolation; it is part of a complex system. Even if operating on 16-bit numbers is twice as slow as operating on 32-bit numbers, you can fit twice as much data into the cache hierarchy when you represent it with 16-bit numbers instead of 32-bit ones. If that makes the difference between all of your data coming from cache and taking frequent cache misses, the faster memory access will outweigh the slower operation of the processor.
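A toy illustration of the footprint side of that trade-off, assuming nothing beyond standard C (the array size of one million elements is an arbitrary example):

```c
#include <stdint.h>
#include <stdio.h>

/* The same million values occupy half the space as int16_t as they do as
   int32_t, so twice as many of them fit in each cache level. Whether that
   wins overall depends on how the CPU handles the narrower operations. */
#define COUNT (1u << 20)

int main(void) {
    static int16_t narrow[COUNT];
    static int32_t wide[COUNT];

    printf("16-bit array: %zu KiB\n", sizeof(narrow) / 1024);
    printf("32-bit array: %zu KiB\n", sizeof(wide)   / 1024);

    /* Summing either array touches memory linearly; with the 16-bit array
       you stream half as many bytes through the cache hierarchy. */
    long sum16 = 0, sum32 = 0;
    for (size_t i = 0; i < COUNT; ++i) { sum16 += narrow[i]; sum32 += wide[i]; }
    printf("sums: %ld %ld\n", sum16, sum32);
    return 0;
}
```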
Other notes:
Vectorization tips the balance further in favor of narrower types (float and 8- and 16-bit integers): you can do more operations in a vector of the same width. However, good vector code is hard to write, so it is not as though you get this benefit without a lot of careful work.
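As a sketch of the lane-count point: both functions below are simple enough that a vectorizing compiler (for example, gcc or clang at -O3) can usually turn them into SIMD code, and in a 128-bit vector register there is room for four 32-bit lanes but eight 16-bit lanes. The function names and sizes here are my own illustration, not anything from a particular compiler's documentation.

```c
#include <stdint.h>
#include <stddef.h>

/* Element-wise adds that a vectorizing compiler can map onto SIMD.
   With 128-bit vectors, the 16-bit version processes twice as many
   elements per instruction as the 32-bit version. */

void add_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];             /* 4 lanes per 128-bit operation */
}

void add_i16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(a[i] + b[i]);  /* 8 lanes per 128-bit operation */
}
```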
Why are there differences in performance?
There are really only two factors that determine whether an operation is fast on a processor: the circuit complexity of the operation, and user demand for the operation to be fast.
(Within reason) any operation can be made fast if chip designers are willing to throw enough transistors at the problem. But transistors cost money (more precisely, using lots of transistors makes your chip larger, which means fewer chips per wafer and lower yields, which costs money), so chip designers have to balance how much complexity to spend on which operations, and they do this based on (perceived) user demand. Roughly, you might think of breaking operations into four categories:
|                 | high demand                      | low demand    |
|-----------------|----------------------------------|---------------|
| high complexity | FP add, multiply                 | division      |
| low complexity  | integer add, boolean ops, shifts | popcount, hcf |
High-demand, low-complexity operations will be fast on nearly any processor: they are the low-hanging fruit and confer the maximum user benefit per transistor.
High-demand, high-complexity operations will be fast on expensive processors (like those used in computers), because users are willing to pay for them. You are probably not willing to pay an extra $3 for your toaster to have a fast FP multiply, however, so cheap processors will skimp on these instructions.
Low-demand, high-complexity operations will generally be slow on nearly all processors; there is simply not enough benefit to justify the cost.
Low-demand, low-complexity operations will be fast if someone bothers to think about them, and non-existent otherwise.
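Popcount from the "low demand" column is a concrete example of the last category: on a processor with no dedicated instruction, you fall back to a software routine like the classic parallel bit-count below. This is a standard bit-twiddling technique shown as a sketch, not tied to any particular CPU; where hardware support exists, compilers expose it through builtins such as `__builtin_popcount`.

```c
#include <stdint.h>
#include <stdio.h>

/* Software popcount (count of set bits): the kind of fallback you need
   when a low-demand operation has no hardware instruction. */
static unsigned popcount32(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                 /* pairs of bits     */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* nibble sums       */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* byte sums         */
    return (x * 0x01010101u) >> 24;                   /* total in top byte */
}

int main(void) {
    printf("%u\n", popcount32(0xF0F00001u)); /* prints 9 */
    return 0;
}
```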
Further reading:
- Agner Fog maintains a nice website with lots of discussion of low-level performance details (and a very scientific data-collection methodology to back it up).
- The Intel® 64 and IA-32 Architectures Optimization Reference Manual (the PDF download link is partway down the page) also covers many of these issues, though it focuses on one specific family of architectures.