Dctimestep performance

Nathaniel_Jones · June 28, 2018, 3:34am

I’m trying to eke out the maximum performance I can get from dctimestep. The case is an annual simulation with ~6000 points and 5185 sky patches. I am using an SSD and -of binary output to speed up disk access.

I notice in the task manager that dctimestep is constantly allocating more memory during its run. I thought that if all the input files specified nrows and ncols, it would be able to allocate the memory just once. Is there something I could do here to speed it up?

Does anyone have other tips or tricks for speeding up dctimestep?

Thanks,

Nathaniel

German_Molina · June 28, 2018, 4:29am

Hello Nathaniel,

A few months ago I combined dctimestep with gendaymtx. It was very custom, so it will not be directly useful to you; however, for each timestep I did the following:

Allocate space for the results (just doubles rather than RGB channels, i.e. illuminance)
Check if it is day. If not, fill with zeros and continue.
If it is day, calculate the daymtx
Multiply, writing the result DIRECTLY into the result matrix and converting RGB to illuminance internally

This managed to get us quick(ish) results… at least much quicker than we expected.
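
A minimal sketch of the per-timestep loop described above, in plain Python for illustration only. This is not the original custom code: the names `annual_illuminance` and `sky_for_step` are hypothetical, the sky callback stands in for the gendaymtx step, and the RGB-to-illuminance conversion assumes the usual Radiance convention of 179 lm/W with weights 0.265, 0.670, 0.065.

```python
# Assumed Radiance convention: photopic RGB weights and luminous efficacy (lm/W).
R_W, G_W, B_W = 0.265, 0.670, 0.065
EFFICACY = 179.0


def annual_illuminance(dc, sky_for_step, n_steps):
    """Return one row of illuminance values (one per sensor point) per timestep.

    dc           -- daylight coefficients: dc[p][s] is an (r, g, b) tuple for
                    sensor point p and sky patch s
    sky_for_step -- hypothetical callback: returns None at night, otherwise a
                    list of (r, g, b) sky patch values for that timestep
    """
    n_points = len(dc)
    results = []
    for t in range(n_steps):
        # Allocate plain floats for this timestep, not RGB triplets.
        row = [0.0] * n_points
        results.append(row)

        sky = sky_for_step(t)
        if sky is None:
            continue  # night: the row stays all zeros

        for p, coeffs in enumerate(dc):
            r = g = b = 0.0
            for (cr, cg, cb), (sr, sg, sb) in zip(coeffs, sky):
                r += cr * sr
                g += cg * sg
                b += cb * sb
            # Collapse RGB to illuminance while storing the result.
            row[p] = EFFICACY * (R_W * r + G_W * g + B_W * b)
    return results
```

The point of the design is that night-time steps cost almost nothing and the output holds one value per point instead of three.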

Later, I did something similar within another project of mine. Check THIS function to have a look.

Cheers

Greg_Ward · June 28, 2018, 4:40am

If your inputs all have dimensions, there shouldn’t be any repeat allocations in dctimestep unless you are putting out images, in which case each image generates one call to malloc() and a corresponding call to free(). These should not take significant time. Did you look at the call tree to try to figure out where these calls were happening?

There are several tricks built into dctimestep at this point, which look for all-zero rows and columns, as are often found in daylight matrices, and skip multiplying those. You can of course load everything onto a graphics card and do it all in parallel. That’s what an intern at LBNL did at some point, but we didn’t get much benefit because I/O ended up being the bottleneck.
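
To illustrate the zero-skipping idea in general terms, here is a rough Python sketch. It is not dctimestep's actual code; the function name `multiply_skipping_zeros` is made up for this example, and it only handles the simple case of a single-channel matrix times one sky vector.

```python
def multiply_skipping_zeros(matrix, vector):
    """Matrix-vector product that ignores columns where the vector is zero.

    matrix -- list of rows, each a list of floats (one per sky patch)
    vector -- list of floats (one per sky patch), often mostly zeros
    """
    # Find the non-zero sky patches once, rather than testing in the inner loop.
    active = [j for j, v in enumerate(vector) if v != 0.0]

    out = [0.0] * len(matrix)
    if not active:
        return out  # all-zero vector (e.g. night): nothing to multiply

    for i, row in enumerate(matrix):
        out[i] = sum(row[j] * vector[j] for j in active)
    return out
```

With a sky vector that is mostly zeros, the inner loop touches only the active patches, so the work scales with the number of non-zero entries rather than the full patch count.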