genBSDF multiple processes

I'm getting some curious (to me) behavior when running genBSDF to calculate
a rank 4 tensor tree (it also happens with '-t3'). When the sampling
resolution is 4 or 5 the program utilizes most of my processing power (I
notice it starts 9 rcontrib processes and 1 rfluxmtx process with '-n 8',
so the threads are each around 80%). When I change the resolution to 6
(without changing anything else) the processors top out at 25%, and if I
increase -n it still uses no more than ~25% of my total CPU power. There is
plenty of unused memory.

Here is my command:
genBSDF -n 8 -c 10240 -f +b -r '-ab 10 -ad 1 -ss 0 -st .02' -t4 5 -dim 0
2.673 0 2.706 -1.207 0 -geom millimeter test.rad > test.xml

Does anyone know what is going on?
Is there something in the genBSDF program limiting this?
Is it a hardware limitation?
Is it some environment variable (I'm running Mac OS X 10.9.5)?

Thanks,

Stephen Wasilewski
LOISOS + UBBELOHDE
- - - - - - - - - - - - - - - - - - - - - - - - - - -
1917 Clement Avenue Building 10A
Alameda, CA 94501 USA
- - - - - - - - - - - - - - - - - - - - - - - - - - -
510 521 3800 VOICE
510 521 3820 FAX
- - - - - - - - - - - - - - - - - - - - - - - - - - -
www.coolshadow.com

Hi Stephen,

The rcontrib command underpinning rfluxmtx and therefore genBSDF can become I/O bound if the model is not particularly complicated and/or there are few non-specular rays being generated. If your system is purely specular, this is the sort of behavior I might expect. As the number of receiving bins increases from -t4 5 to -t4 6, you go from 1024 output directions per input direction to 4096. If you leave your sampling (-c) parameter at 2000, then you're actually sending fewer samples per incident direction than you have output bins. This might be OK, depending on your model, but it means that the number of bin results being sent from one process to another exceeds the number of rays being calculated, so I/O rather than CPU becomes the bottleneck once you have enough processes. In a sense, you could increase your -c setting by a factor of 2 or 3 and get better accuracy in the same calculation time. Try it and see if it doesn't improve your CPU utilization.
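The bin arithmetic above is easy to sketch. This is just a back-of-the-envelope helper (not part of genBSDF itself), assuming the -t4 case where a tensor tree at resolution r has 4**r output bins:

```python
# Average samples per incident direction per output bin for a -t4 tensor tree.
# bins = 4**resolution: -t4 5 -> 1024 bins, -t4 6 -> 4096 bins.
def samples_per_bin(c, resolution):
    """c is the genBSDF -c setting; returns c divided by the bin count."""
    bins = 4 ** resolution
    return c / bins

print(samples_per_bin(2000, 5))  # -c 2000 at -t4 5: just under 2 samples/bin
print(samples_per_bin(2000, 6))  # -c 2000 at -t4 6: below 1 -- more bins than samples
```

Once the second number drops below 1, each ray produces more bin traffic between processes than calculation work, which is the I/O-bound regime described above.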

Remember that if your number of processes exceeds the number of physical cores (not virtual ones), then your time-linearity will go down dramatically, even if your CPU utilization shows 100%.

Cheers,
-Greg

From: Stephen Wasilewski <[email protected]>
Date: October 4, 2016 6:14:24 PM PDT

Thanks Greg,

That makes sense, although it seems there might be something else going on
as well.

My model is purely specular and has no transmission, only reflection. I ran
through a number of -c options last night with '-t4 6':

-c 10240 (2.5 samples per bin) took ~20 minutes
-c 40960 (10 samples per bin) took ~1.5 hrs
-c 102400 (25 samples per bin) took ~3.75 hrs

So it seems calculation time is scaling linearly with my sample count.
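Checking the arithmetic behind those sample counts: at -t4 6 there are 4**6 = 4096 output bins, so each -c value divides out as stated:

```python
# Samples per output bin for each tested -c value at -t4 6 (4096 bins).
bins = 4 ** 6
for c in (10240, 40960, 102400):
    print(c, c / bins)  # 2.5, 10.0, 25.0 samples per bin
```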

I have a large number of polygons (the rad file is 31 MB) and my material
is a mirror with a brightfunc modifier (the glass_angular_effect you shared
a number of years ago). Does that introduce some other bottleneck?

Stephen Wasilewski


Hi Stephen,

Generally speaking, I would expect an increase in the -c option to improve CPU utilization. As you note, you are seeing more of a linear relationship, which implies that the utilization is staying the same. Did you check this in your process monitor? You can run the "Sample Process" option to see where the program is spending most of its time, though I'm not sure what we might learn from this in terms of I/O bottlenecks.

If your materials are purely specular (zero roughness), then your ability to parallelize may be fundamentally limited by the low ray branching. All the same, the run-times should be acceptable, so I'm not too concerned about optimizing for such cases.

-Greg

From: Stephen Wasilewski <[email protected]>
Date: October 5, 2016 10:25:16 AM PDT

Greg,

You are correct, utilization does not change significantly. You are also
right that the run time is totally acceptable.

It is nice to understand where the limitation is, and that it only applies
to this 100% specular case.

Thanks,

Stephen Wasilewski

_______________________________________________
Radiance-general mailing list
[email protected]
http://www.radiance-online.org/mailman/listinfo/radiance-general