BSDF in child processes' memory

Dear Radiance mailing-list and website subscribers,

I am currently running a rather extreme case of a scene employing BSDFs. Three rank-4, 2^7x2^7 tensors are loaded, and I run 40 rcontrib processes in parallel. As I understand it, since the BSDF data do not change during rendering, the processes' memory footprint should stay light. However, after several hours, the processes have grown large enough that the system starts swapping to disk. This is on a Linux system with 64 GB of RAM. In some cases, rcontrib dies with "rcontrib: system - read error from render process".

What could make the processes accumulate such massive amounts of data? Is it the caching, or the accumulation needed to generate the PDF, that modifies memory and thereby prevents the use of shared pages?
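To illustrate what I am getting at with the shared pages, here is a minimal sketch (not Radiance code; the sizes and layout are made up): data touched only by the parent before fork() stays shared copy-on-write, while anything a child writes to becomes a private copy in that child.

/* Minimal sketch (not Radiance code): data preloaded before fork() is
 * shared copy-on-write, but any page a child writes to becomes a
 * private copy that counts against that child's resident memory. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define NBYTES  (256UL*1024UL*1024UL)   /* 256 MB standing in for BSDF data */

int main(void)
{
        unsigned char *data = malloc(NBYTES);

        if (data == NULL)
                return 1;
        memset(data, 1, NBYTES);        /* parent touches every page once */

        for (int i = 0; i < 4; i++)     /* stand-ins for the render children */
                if (fork() == 0) {
                        /* Reading leaves the pages shared with the parent;
                         * writing makes the kernel copy each touched page,
                         * so RSS grows in every child independently. */
                        memset(data, 2, NBYTES);
                        sleep(30);      /* long enough to watch RSS in top */
                        _exit(0);
                }
        while (wait(NULL) > 0)
                ;
        free(data);
        return 0;
}

If the PDF tables are built up inside each child as needed, the effect would be the same: every child ends up holding its own private copy of that data.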

Also, does Radiance detect when multiple material definitions reference the same XML file, or is the file loaded separately for each BSDF-type definition?

Cheers, Lars.

Like other data, BSDFs (XML files) are preloaded if you use -n >1 with rcontrib. It sounds like something may be going wrong with the queuing functions. Do you really have 40 cores? Perhaps you should try running with fewer processes. It may be that one or more processes is being neglected by the OS and is accumulating unfinished work. This would back up the ray queue and cause the FIFO memory to bloat.

I may need to put in some checks on queue size, similar to what I added recently to the rayfifo.c routines employed by rtrace -n.
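Roughly, such a check just means capping how much the FIFO is allowed to buffer, and draining results before accepting more work. Here is a made-up toy sketch of the idea (not the actual rayfifo.c code):

/* Hypothetical sketch -- not the actual rayfifo.c code.  A FIFO with a
 * hard cap: when the buffer is full, the caller must drain results
 * before queuing more, so buffered work cannot grow without bound. */
#include <stdio.h>

#define FIFO_MAX 8                      /* made-up cap on buffered entries */

static double   fifo[FIFO_MAX];
static int      fifo_head = 0, fifo_count = 0;

static int fifo_put(double v)           /* 0 if full: caller must drain first */
{
        if (fifo_count >= FIFO_MAX)
                return 0;
        fifo[(fifo_head + fifo_count++) % FIFO_MAX] = v;
        return 1;
}

static int fifo_get(double *vp)         /* 0 if empty */
{
        if (fifo_count <= 0)
                return 0;
        *vp = fifo[fifo_head];
        fifo_head = (fifo_head + 1) % FIFO_MAX;
        fifo_count--;
        return 1;
}

int main(void)
{
        double  v;

        for (int i = 0; i < 20; i++)
                if (!fifo_put(i)) {     /* full: flush everything, then retry */
                        while (fifo_get(&v))
                                printf("flushed %g\n", v);
                        fifo_put(i);
                }
        while (fifo_get(&v))
                printf("flushed %g\n", v);
        return 0;
}

In the multi-process case the "flush" step would correspond to reading results back from the children before handing them more rays.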

If there was no additional information after the "rcontrib: system - read error from render process" report, then it probably means that one of the rendering processes died or was killed by the OS.

Cheers,
-Greg

Now that I think about it, I don’t have much experience working with anisotropic tensor trees this large. You could be right about the probability distribution data taking up all your memory. If all of your child processes are growing in size, then this is the most likely cause. Unfortunately, there’s no easy way to share this memory between processes, as it’s built up on an as-needed basis during rendering.

If this is the case, your only solution may be to run rcontrib with fewer processes, so everything fits in memory.

-G

Hi Greg,

thank you for looking into this. Well, I have 20 cores, and with hyper-threading, running 40 processes generally works rather well. If I understand this correctly, the problem would be to share the PDF tables, with all the file-locking trouble of the ambient file sharing?

I will see whether the current attempt produces a result by tomorrow; otherwise I will have to reduce the number of processes. That would certainly also improve the thermal comfort in my office.

Cheers, Lars.

There’s no point in sharing the PDF cache the way we share ambient values, because it’s faster just to recompute them. Also, ambient sharing still duplicates private memory, so it wouldn’t help in your situation, anyway. I’m afraid your only solution may be to reduce the number of processes.

If you restrict the number of processes to the number of cores, you won’t lose much efficiency, and you’ll only need half as much private memory.

Ultimately, it might be worthwhile to implement an LRU (least recently used) PDF cache clearing algorithm, but it's actually difficult to detect thrashing, or even to track memory usage, in a portable fashion…
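The bookkeeping itself wouldn't be the hard part. A generic sketch with made-up names (not a proposal for the actual Radiance data structures) might keep each cached PDF table in a list ordered by last use and free from the cold end once a byte budget is exceeded:

/* Generic LRU-eviction sketch with made-up names -- not Radiance code.
 * Each cached PDF table sits in a doubly-linked list ordered by last
 * use; once the total size exceeds a budget, tables are freed from the
 * least-recently-used end and would be recomputed if needed again. */
#include <stdlib.h>

typedef struct pdf_entry {
        struct pdf_entry *prev, *next;  /* MRU <-> LRU ordering */
        size_t  nbytes;                 /* size of the cached table */
        void    *table;                 /* the cached distribution data */
} PDFENTRY;

static PDFENTRY *mru = NULL, *lru = NULL;
static size_t   cache_bytes = 0;
static size_t   cache_budget = 512*1024;        /* tiny made-up cap for the toy */

static void touch(PDFENTRY *p)          /* move an entry to the MRU end on use */
{
        if (p == mru)
                return;
        p->prev->next = p->next;        /* unlink (p != mru, so prev exists) */
        if (p->next != NULL)
                p->next->prev = p->prev;
        else
                lru = p->prev;
        p->prev = NULL;
        p->next = mru;
        mru->prev = p;
        mru = p;
}

static void evict_lru(void)             /* free cold entries until under budget */
{
        while (cache_bytes > cache_budget && lru != NULL && lru != mru) {
                PDFENTRY *victim = lru; /* never evict the just-added MRU entry */
                lru = victim->prev;
                lru->next = NULL;
                cache_bytes -= victim->nbytes;
                free(victim->table);
                free(victim);
        }
}

static PDFENTRY *cache_add(void *table, size_t nbytes)
{
        PDFENTRY *p = calloc(1, sizeof(PDFENTRY));

        if (p == NULL)
                return NULL;
        p->table = table;
        p->nbytes = nbytes;
        p->next = mru;                  /* insert at the MRU end */
        if (mru != NULL)
                mru->prev = p;
        mru = p;
        if (lru == NULL)
                lru = p;
        cache_bytes += nbytes;
        evict_lru();
        return p;
}

int main(void)
{
        /* toy usage: add fake tables until the budget forces eviction,
         * touching the first one so it survives as recently used */
        PDFENTRY *first = cache_add(malloc(128*1024), 128*1024);

        for (int i = 0; i < 8; i++) {
                if (first != NULL)
                        touch(first);
                cache_add(malloc(128*1024), 128*1024);
        }
        return 0;
}

Evicted tables would simply be recomputed the next time they are needed, so the scheme only pays off when memory is actually tight; knowing when that is the case is the part that's hard to do portably.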

-Greg