As Michael Barr said, the overhead depends heavily on your platform, but it is definitely less than the time it takes to send the data over the wire.
A rough estimate:
800 Mbit/s of payload on an excellent gigabit link is 25 million floats per second.
On one 2 GHz core, that gives you a whopping 80 clock cycles per converted value just to break even. The conversion costs far less, so you will save time; this budget should be more than enough on any architecture :)
A simple load-convert-store cycle (barring all cache stalls) should take fewer than 5 cycles per value. With instruction interleaving, SIMD extensions and/or parallelization across multiple cores, you are likely to get multiple conversions done per cycle.
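The load-convert-store loop in question can be sketched as below. This is an assumed illustration, not code from the answer; a plain loop like this is exactly what compilers auto-vectorize on x86 (scalar `cvtsd2ss`, or packed `cvtpd2ps` converting several values per iteration when SIMD kicks in):

```c
#include <stddef.h>

/* Narrow a buffer of doubles to floats before sending them over the
   network: one load, one conversion, one store per value. */
static void narrow_to_float(const double *src, float *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (float)src[i];
}
```

Splitting the buffer across cores parallelizes trivially, since each output element depends on exactly one input element.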
In addition, the receiver will be happy to process only half as much data; remember that memory access time is non-linear, so halving the working set can pay off more than proportionally.
The only argument against converting is that the transfer itself should put a minimal load on the CPU: modern architectures can move data from disk/memory onto the bus without CPU intervention (DMA). With the figures above, however, I'd say this doesn't matter in practice.
[edit]
I checked some numbers: the 387 coprocessor would indeed take about 70 cycles for a load-store cycle. On the original Pentium, you're at 3 cycles without any parallelization.
So, unless you are running a gigabit network on a 386...
peterchen