Double vs. float

I’m writing Julia code lately, and the default floating-point type is usually Float64, the equivalent of a double in C. I am wondering what people think about this. 32-bit floating point as a default was far more sensible when Radiance was written, but with many (all?) modern processors being 64-bit and memories being much larger, does 32-bit even make sense any more, except when dealing with huge amounts of data or embedded processors?

What do people think? Greg, if you were writing Radiance these days, would you still use float?

It always pays, in my view, to use the appropriate size of floating point in whatever structures you design. Even if the savings in computation time with 32-bit floats are small, you will save memory that might be needed for other things, provided the precision is sufficient.

That said, it never made sense to go from double to float in a routine just to perform a few calculations. Function arguments are often promoted to double on the way in, and it makes sense to return double types for such functions.

Matrix operations and summations should be done in double precision to avoid round-off error accumulation, which is why I use doubles in rmtxop, even though it does take quite a bit of memory for large matrices.

A good example of the difference is the COLOR type, which is used throughout Radiance. There really is no need for greater precision than 32-bit float in lighting simulations, unless the color is used as an accumulator, as it is in rcontrib. There, I use a special DCOLOR type to avoid error accumulation.

Does this help?


Thanks, yes, that helps. I think the Julia developers agree with you, by the way.

> modern processors being 64 bit and memories being much larger, does 32-bit even make sense any more

Note that a processor being 64-bit vs. 32-bit refers to the width of its memory addresses. Floating-point numbers are data, so their width is a completely separate question.

In my view, float should be the “default” floating-point format. It is obviously not as precise as double, but it’s more precise than most people think, and (far) more than adequate for almost any purpose.

The more significant reason is that most computation is memory-bound: double arithmetic is usually slightly slower than float arithmetic, but only by a few cycles (i.e., fractions of a nanosecond) at most. Doubling the memory traffic to RAM hurts far more, and the cache impact is worse still: you can fit only half as many numbers.

A reasonable approach is to load/store data as float but do compute-heavy arithmetic in double when numerical analysis suggests this is important (though it usually won’t be).

Hi Agatha,

I agree in general that float is preferred to double, especially when it’s taking up a lot of space. The decision to use doubles was based on numerous tests I performed early on, where I found the extra precision was necessary to avoid cracks appearing in geometry and to limit accumulated floating-point error during ray tracing.

If you want to see what happens or do performance tests using 32-bit floats, you can compile the system with -DSMLFLT, which changes the definition of RREAL in common/fvect.h, affecting much of the code.