Reuse shadow test calculations for virtual sources

Mostapha · July 28, 2019, 9:47pm

Hello everyone!

I have a scene with more than 26,000 sources representing suns and 80 mirror-like faces which work as virtual sources. The study includes more than 100,000 sensor points. With this many sensors I would normally split them into smaller groups and run the groups in parallel, but that doesn’t work well in this case.

Based on what I can see in the source code, and from timing the runs, my understanding is that when I run an rtrace / rcontrib command with -dr 1, Radiance calculates the shadow testing for the first sensor and then reuses the calculation for the rest of the sensors. As a result, a run with 1 sensor takes 10 minutes, 100 sensors takes 12 minutes and 200 sensors takes 15 minutes: 10 minutes for the initial calculation and then ~2 minutes for each 100 sensors.
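A rough sketch of that arithmetic, assuming the timings fit a fixed set-up cost plus a linear per-sensor cost. The function names and the linear model are illustrative assumptions taken from the approximate numbers above, not measurements:

```python
def run_minutes(n_sensors, setup_min=10.0, per_100_min=2.0):
    """Estimated time for one run: fixed set-up plus a linear per-sensor cost."""
    return setup_min + per_100_min * n_sensors / 100.0


def split_estimate(n_sensors, n_chunks):
    """Return (wall_minutes, total_cpu_minutes) when every chunk repeats the set-up.

    Assumes equal chunks that all run at the same time on separate machines.
    """
    chunk = -(-n_sensors // n_chunks)  # ceiling division
    wall = run_minutes(chunk)
    return wall, wall * n_chunks


if __name__ == "__main__":
    for chunks in (1, 100, 1000):
        wall, cpu = split_estimate(100_000, chunks)
        print(f"{chunks:5d} chunks: wall {wall:7.1f} min, total CPU {cpu:8.1f} min")
```

Under this model the set-up is paid once per chunk, so the total machine time keeps growing as the chunks get smaller even while the wall time shrinks.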

In other words, the additional overhead of the pre-calculation outweighs the benefits of parallel processing. I was hoping that I could share the shadow-testing calculations between runs, similar to how -af filename shares ambient calculations between runs for the same scene. Is this possible? If not, is there a workaround to achieve it?

I found this page on the Radiance website which is in line with what I’m trying to do, but I’m not sure how to add it to or use it with Radiance: Direct Cache Manual.

Greg_Ward · July 29, 2019, 9:47pm

Using either rtrace or rcontrib with the -n option, virtual sources are shared between processes, at least under Unix. All set-up happens before fork() is called. Multiprocessing isn’t well-supported under Windows due to the lack of a fork() function or any suitable replacement for sharing memory. There is no external mechanism for sharing source calculation set-up, unfortunately.
I know Carsten Bauer created the direct cache (among other optimizations), but it was never reincorporated into the main distribution. I don’t know if he still supports Radzilla, or what became of it. He gave a presentation at the 2004 Radiance workshop on his developments.
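The set-up-then-fork() pattern described above can be sketched in Python as follows. This is only an illustration of the general idea, not Radiance code: `expensive_setup`, `trace` and `run_forked` are hypothetical names, and the sketch needs a Unix-like system because it calls `os.fork()`.

```python
import json
import os


def expensive_setup():
    # Hypothetical stand-in for the one-time source set-up.
    return [i * i for i in range(1000)]


def trace(table, sensor):
    # Hypothetical per-sensor work that only reads the shared table.
    return table[sensor % len(table)]


def run_forked(sensors, n_procs):
    table = expensive_setup()      # done once, before any fork()
    children = []
    for rank in range(n_procs):
        read_fd, write_fd = os.pipe()
        pid = os.fork()            # Unix only
        if pid == 0:               # child: inherits `table` without rebuilding it
            status = 1
            try:
                os.close(read_fd)
                part = [trace(table, s) for s in sensors[rank::n_procs]]
                with os.fdopen(write_fd, "w") as out:
                    json.dump(part, out)
                status = 0
            finally:
                os._exit(status)   # never fall back into the parent's code path
        os.close(write_fd)
        children.append((pid, read_fd))

    # Parent: collect each child's share and put it back in sensor order.
    results = [None] * len(sensors)
    for rank, (pid, read_fd) in enumerate(children):
        with os.fdopen(read_fd, "r") as inp:
            results[rank::n_procs] = json.load(inp)
        os.waitpid(pid, 0)
    return results
```

Each child sees the parent's table through copy-on-write pages, so the set-up cost is paid once per machine rather than once per process.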

Mostapha · August 3, 2019, 1:06am

Hi Greg! According to the website, Radzilla is no longer maintained and is not available. Thank you for confirming that there is no solution for now other than the -n option. That would work, but it is not as scalable as we need it to be.

Related to fork(): I know this is not an easy one, but the multiprocessing library in Python uses the freeze_support method to emulate forking on Windows. It basically passes data from the parent process to the newly created process through a pipe. Here is a short write-up about it:

and here is the source code:

https://github.com/python/cpython/blob/17a5588740b3d126d546ad1a13bdac4e028e6d50/Lib/multiprocessing/spawn.py#L62


def is_forking(argv):
    '''
    Return whether commandline indicates we are forking
    '''
    if len(argv) >= 2 and argv[1] == '--multiprocessing-fork':
        return True
    else:
        return False




def freeze_support():
    '''
    Run code for process object if this in not the main process
    '''
    if is_forking(sys.argv):
        kwds = {}
        for arg in sys.argv[2:]:
            name, value = arg.split('=')
            if value == 'None':
                kwds[name] = None
            else:

Greg_Ward · August 3, 2019, 4:13pm

It would be cool if there was a mechanism to copy a process under Windows. Has anyone implemented a stable Windows replacement for fork() using this method? I don’t know much about Python, or what this code does, exactly. Looks like it’s just getting and interpreting argument variables, unless I’m not seeing all of it. The web page says something about a module called “pickle,” which I assume is a Python add-on, and not much good to those of us stuck in the C world.

Why is the -n option not scalable enough for you? Is it because you are under Windows, or because you are trying to spread processes across machines?

Mostapha · August 4, 2019, 10:08pm

It’s the second one. The -n option is great for a single powerful machine (i.e. scaling vertically), but I was looking for a solution that can scale horizontally, where we can use hundreds of machines with 2 CPUs each.

Greg_Ward · August 5, 2019, 12:27am

Well, given the fact that not that many folks use the virtual source calculation, and this is sort of a special case within that usage, I don’t see making an external format for what amounts to a fairly complex data structure to share this information across machines.

There is something called “checkpointing” that takes a copy of a process that you can move between machines and restart, but I don’t know if there are any working versions of this available.

Georg_Mischler · August 5, 2019, 6:29am

“pickle” is a data serialization module and format in Python. It can be used to pass structured (Python) data between processes, e.g. via a pipe. Your guess is correct: not helpful in C land.
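For readers who don't know Python, a minimal sketch of what that looks like. The helper name `roundtrip_through_pipe` is made up for illustration, and both ends of the pipe live in one process only to keep the example short:

```python
import os
import pickle


def roundtrip_through_pipe(obj):
    """Serialize `obj` with pickle, push it through an OS pipe and rebuild it.

    Only suitable for small objects here: the write completes before the
    read starts, so the payload has to fit in the pipe buffer.
    """
    read_fd, write_fd = os.pipe()
    with os.fdopen(write_fd, "wb") as out:
        pickle.dump(obj, out)
    with os.fdopen(read_fd, "rb") as inp:
        return pickle.load(inp)


if __name__ == "__main__":
    data = {"sources": [(0.0, 1.0, 0.5)], "label": None}
    print(roundtrip_through_pipe(data) == data)
```

The receiver gets a copy of the data rather than a view of the sender's memory, and the byte format is specific to Python.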

To emulate fork-type data sharing, you could use one of the existing shared memory mechanisms on Windows. That shouldn’t be too hard, as long as you don’t need copy-on-write semantics.
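As an illustration of the shared-memory idea, here is a sketch using Python's `multiprocessing.shared_memory` module (Python 3.8 or later), which wraps the native shared-memory facilities on both Windows and Unix. The helper names `publish` and `read_shared` are hypothetical, and a flat array of doubles is assumed. A real source data structure with internal pointers would need more care.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Copy a non-empty list of floats into a new named shared-memory block."""
    fmt = f"{len(values)}d"
    shm = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
    struct.pack_into(fmt, shm.buf, 0, *values)
    return shm  # the creator must call close() and unlink() when finished


def read_shared(name, count):
    """Attach to an existing block by name and read `count` doubles from it."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return list(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()


if __name__ == "__main__":
    block = publish([1.0, 2.5, 3.0])
    try:
        # Another process would be given block.name and the element count.
        print(read_shared(block.name, 3))
    finally:
        block.close()
        block.unlink()
```

Unlike fork(), there is no copy-on-write here: a write by any process is visible to all of them, so this fits data that is read-only once the set-up is finished.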

Cheers
-schorsch

Mostapha · August 6, 2019, 12:30am

I missed this part. This discussion seems to be asking about a similar problem and there are two links that discuss how it can be done in C. It refers to these two links:

and