Rendering with rtpict with a high number of processors

I am attempting to run the rtpict command below on a Google Cloud VM instance with 192 CPUs. The command runs successfully. The problem I’m running into is that the maximum number of parallel processes I can observe is 64.

rtpict -n 192 -t 1 -vf inputs/image10.vp -x 256 -y 256 @inputs/image10.rdp -af outputs/image/image10_high.amb inputs/image10.oct > outputs/image/image10_high.hdr

I believe it may relate to the MAX_NPROCS setting at raypcalls.c:157.

I’m hoping to avoid modifying the source code and recompiling it myself, to prevent adding any further complexity to the rendering workflows I’ve created and to distributing them to other team members.

What would you suggest is the best way around this limit? Or is there another limit on the number of processes that could be coming into play when using rtpict?

Thanks,
Vin

Hi Vin,

There are usually diminishing returns as you increase the number of ray-tracing processes. Are you seeing near-linear speed-ups out to 64 processes, or are you just trying to push things to see how far you can go? At some point, the overhead of talking to all those processes becomes the major bottleneck, if you don’t hit a problem earlier with ambient cache sharing or the like.

In any case, you are welcome to add -DMAX_NPROCS=100 or whatever you want to try. (Be sure to get the latest HEAD source though, because I just noticed a typo in the macros surrounding that setting.) There is a system-imposed limit as you can see based on the number of file descriptors handled by the select() call, which we have no control over.
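To be concrete about the -D approach: the define just needs to reach the compiler when raypcalls.c is rebuilt, so there is no need to edit the source file itself. Exactly where you add it depends on how you build Radiance; as an illustration only (your build will also pass its usual include paths and other flags):

cc -O2 -DMAX_NPROCS=256 -c raypcalls.c

A value given with -D on the command line overrides the default set in the code.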

Cheers,
-Greg

P.S. I should add that for a single output image, rxpiece might be the better choice as it does not rely on raypcalls nor does it have the same controlling-process bottleneck.
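Off the top of my head, and writing the options from memory, the call for your case would look something along these lines — check the rxpiece man page for the actual syntax before relying on it:

rxpiece -n 64 [your usual rpict/rtrace options] -af outputs/image/image10_high.amb -o outputs/image/image10_high.hdr inputs/image10.oct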

Hi Greg,

I haven’t tried modifying this number to 100 yet, as I’m running into some other bottlenecks first.

On the initial run below, all 64 processors appear to run at 100% until complete. On a re-run at a higher resolution, the processes must be fighting for access to the ambient file, as the average load on the system sits at around 10% until the simulation is complete.

Any advice on machine configuration, IOPS, or disk type that would reduce this bottleneck on the ambient file? I’m currently using Google Cloud Compute Engine, so I have a lot of flexibility in what I can select.

Or would it be more effective altogether, on the second run, to copy the ambient file so that each rtrace process has its own ambient file to work with?
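Just to sketch what I mean (the copy names here are made up):

for i in 1 2 3 4; do cp outputs/image/${IMAGE_NAME}.amb outputs/image/${IMAGE_NAME}_copy$i.amb; done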

IMAGE_NAME="image1_shg_12ab"
RES=1024
rtpict -n 19 -vf inputs/view.vp -x $RES -y $RES @inputs/${IMAGE_NAME}.rdp -af outputs/image/${IMAGE_NAME}.amb inputs/model.oct > outputs/image/${IMAGE_NAME}.hdr

The rendering parameters input to rtpict are:

rtrace -u- -dt .05 -dc .5 -ds .25 -dr 1 -aa .2 -ar 64 -ad 512 -as 128 -lr 7 -lw 1e-04 -dj 0.70 -ds 0.15 -dt 0.00 -dc 1.00 -dr 3 -dp 512 -st 0.15 -ab 8 -aa 0.10 -ar 1024 -ad 4096 -as 1024 -lr 16 -lw 0.00001 -i 

These are the specs of the virtual machine disk I’m using:

A 300 GB Hyperdisk Balanced boot disk, currently with the default of 4,800 IOPS and 140 MiB/s throughput.

Thanks,
Vin

Hi Vin,

Again, you might try using rxpiece if that will work for you, as it overcomes some of the ambient file contention and process monitoring overheads in rtpict. Also, rtpict makes it harder to figure out where the bottlenecks are, as it may employ sort to rearrange the pixels to avoid ambient file contention, and this is a serial post-process that can be significant for large images.

Regarding the virtual machine, I would strongly recommend putting all the local files, especially the shared ambient, on a local SSD. You don’t want this data going back and forth across a network if you can avoid it.
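As a rough sketch of what I mean on Compute Engine — the machine type and device path below are placeholders, so check Google’s documentation for the details of attaching and mounting a local SSD:

gcloud compute instances create render-vm --machine-type=MACHINE_TYPE --local-ssd=interface=NVME
sudo mkfs.ext4 -F /dev/disk/by-id/google-local-nvme-ssd-0
sudo mkdir -p /mnt/scratch
sudo mount /dev/disk/by-id/google-local-nvme-ssd-0 /mnt/scratch

Then keep the octree, the .amb file, and the output picture under /mnt/scratch while rendering, and copy the results off afterward, since local SSD contents do not survive the instance being deleted.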

Cheers,
-Greg