Double vs. float

I’ve been writing Julia code lately, where the default floating-point type is Float64, the equivalent of a double in C. I am wondering what people think about this. 32-bit floating point as a default was far more sensible when Radiance was written, but with many (all?) modern processors being 64-bit and memories being much larger, does 32-bit even make sense any more, except when dealing with huge amounts of data or embedded processors?

What do people think? Greg, if you were writing Radiance these days, would you still use float?

It always pays, in my view, to use the appropriately sized floating-point type in whatever structures you design. Even if there is little saving in computation time with 32-bit floats, you will save memory that might be needed for other things, provided the precision is sufficient.

That said, it never made sense to go from double to float in a routine just to perform a few calculations. Function arguments are often promoted to double on the way in, and it makes sense to return double types for such functions.
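As a made-up illustration of the trade-off (none of these names are Radiance types): the structure members stay float, while the little bit of arithmetic is done, and returned, in double.

```c
#include <stdio.h>

/* Hypothetical record types, only to show the storage difference. */
struct sample_f { float  pos[3]; float  wt; };	/* 16 bytes */
struct sample_d { double pos[3]; double wt; };	/* 32 bytes */

/* The few calculations (and the return value) can still be double;
 * the float members are promoted on the way in. */
double weighted_norm(const struct sample_f *s)
{
	double	d2 = 0.0;
	int	i;
	for (i = 0; i < 3; i++)
		d2 += (double)s->pos[i] * s->pos[i];
	return s->wt * d2;
}

int main(void)
{
	printf("float record:  %zu bytes\n", sizeof(struct sample_f));
	printf("double record: %zu bytes\n", sizeof(struct sample_d));
	return 0;
}
```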

Matrix operations and summations should be done in double precision to avoid round-off error accumulation, which is why I use doubles in rmtxop, even though it does take quite a bit of memory for large matrices.
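As a sketch of that pattern (illustrative only, not rmtxop's actual code): the matrices stay in float, but each dot product is accumulated in a double.

```c
#include <stddef.h>

/* Multiply an n x m matrix by an m x p matrix, both stored as float.
 * Each element's sum runs in a double accumulator, so round-off
 * error does not build up along long rows. */
void matmul_f(float *c, const float *a, const float *b,
		size_t n, size_t m, size_t p)
{
	size_t	i, j, k;
	for (i = 0; i < n; i++)
		for (j = 0; j < p; j++) {
			double	sum = 0.0;
			for (k = 0; k < m; k++)
				sum += (double)a[i*m + k] * b[k*p + j];
			c[i*p + j] = (float)sum;	/* store result as float */
		}
}
```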

A good example of the difference is the COLOR type, which is used throughout Radiance. There really is no need for greater precision than 32-bit float in lighting simulations, unless the color is used as an accumulator, as it is in rcontrib. There, I use a special DCOLOR type to avoid error accumulation.
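Roughly what that looks like (check src/common/color.h and the rcontrib sources for the actual definitions; this is just a sketch):

```c
/* Sketch of the two kinds of types -- not the verbatim Radiance headers. */
typedef float	COLORV;
typedef COLORV	COLOR[3];	/* ordinary RGB sample: float is plenty */
typedef double	DCOLOR[3];	/* accumulator: double avoids drift */

/* Summing many float samples into a double accumulator. */
static void
add_sample(DCOLOR acc, const COLOR c)
{
	acc[0] += c[0];
	acc[1] += c[1];
	acc[2] += c[2];
}
```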

Does this help?
-Greg


Thanks, yes, that helps. I think the Julia developers agree with you, by the way.

modern processors being 64 bit and memories being much larger, does 32-bit even make sense any more

Note that a processor being 64-bit vs. 32-bit refers to the width of memory addresses (and general-purpose registers). Floating-point numbers are data, so their size is a completely separate question; FPUs have handled 64-bit doubles since long before 64-bit processors were common.
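A quick way to see this on any machine:

```c
#include <stdio.h>

int main(void)
{
	/* "64-bit" describes the pointer width; float and double are the
	 * same sizes on 32-bit and 64-bit builds alike. */
	printf("pointer: %zu bytes\n", sizeof(void *));
	printf("float:   %zu bytes\n", sizeof(float));
	printf("double:  %zu bytes\n", sizeof(double));
	return 0;
}
```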


In my view, float should be the “default” floating-point format. It is obviously not as precise as double, but its roughly seven significant decimal digits are more precision than most people expect, and (far) more than adequate for almost any purpose.

The more significant reason is that most computation is memory-bound: double arithmetic is usually slightly slower than float arithmetic, but the difference is a few cycles (i.e. fractions of a nanosecond) at most. Doubling the memory traffic to RAM, however, hurts. Even worse is the effect on cache: you can fit only half as many numbers in it.

A reasonable approach is to load/store data as float but do compute-heavy arithmetic in double when numerical analysis suggests it matters (though it usually won’t).
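A small (contrived) example of that pattern: the data lives in a float array, and only the accumulator is double. The float accumulator drifts badly once the running sum gets large; the double one stays close to the exact answer.

```c
#include <stdio.h>

#define N	10000000

static float	data[N];	/* values stored as float */

int main(void)
{
	float	facc = 0.0f;
	double	dacc = 0.0;
	long	i;

	for (i = 0; i < N; i++)
		data[i] = 0.1f;

	for (i = 0; i < N; i++) {
		facc += data[i];	/* float accumulator loses low bits */
		dacc += data[i];	/* double accumulator holds them */
	}
	printf("float  accumulator: %.1f\n", facc);
	printf("double accumulator: %.1f\n", dacc);
	return 0;
}
```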

Hi Agatha,

I agree in general that float is preferred to double, especially when it’s taking up a lot of space. The decision to use doubles was based on numerous tests I performed early on, where I found the extra precision was necessary to avoid cracks appearing in geometry and accumulated floating-point errors during ray tracing.

If you want to see what happens or do performance tests using 32-bit floats, you can compile the system with -DSMLFLT, which changes the definition of RREAL in common/fvect.h, affecting much of the code.
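For reference, the switch is roughly of this form (see src/common/fvect.h for the actual macros and the constants that change along with them):

```c
/* Sketch of the idea only -- the real definition lives in fvect.h. */
#ifdef SMLFLT
#define RREAL	float
#else
#define RREAL	double
#endif

typedef RREAL	FVECT[3];	/* 3-D vectors used throughout the code */
```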

Cheers,
-Greg

A follow-up for anyone monitoring the thread: I recently changed the RMATRIX struct to use 32-bit float rather than double by default. To get back the original representation, you now need to add this to the rmake compile flags:

-DDTrmx_native=DTdouble

My reasoning was two-fold. First, most matrix operations used in Radiance are not particularly sensitive to accumulation errors, since we are not doing SVD or anything complicated, just matrix multiplies and adds for the most part. Second, it is more reasonable to store 32-bit float matrix files, and these can be memory-mapped by the routines, which further reduces load times in many cases.
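For anyone curious what memory-mapping a float matrix file buys, here is an illustrative sketch (not Radiance’s actual loader, which also has to parse the header): the data is paged in on demand instead of being read and copied.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <stddef.h>

/* Map a raw file of 32-bit floats read-only; returns NULL on failure. */
float *
map_float_matrix(const char *path, size_t *count)
{
	struct stat	st;
	float		*m;
	int		fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return NULL;
	}
	m = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping survives the close */
	if (m == MAP_FAILED)
		return NULL;
	*count = (size_t)st.st_size / sizeof(float);
	return m;
}
```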

This mainly affects the rmtxop command, which now uses half as much memory for most operations. I also wrote an in-situ transpose algorithm, which saves additional memory when the -t flag is applied.
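For the curious, the in-place idea for the square case looks like the sketch below; a general non-square in-place transpose needs a cycle-following scheme, which I won’t reproduce here.

```c
#include <stddef.h>

/* In-place transpose of a square n x n float matrix: swap elements
 * across the diagonal, so no second buffer is needed. */
void
transpose_sq(float *m, size_t n)
{
	size_t	i, j;
	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++) {
			float	t = m[i*n + j];
			m[i*n + j] = m[j*n + i];
			m[j*n + i] = t;
		}
}
```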

Best,
-Greg