rpict/rpiece dies in inithemi

[discussion moved over from Radiance-general]

Summary:
rpict and rpiece bomb with the "bad ray direction in inithemi"
error, optimization not (?) to blame

Longer story:
I keep encountering trouble rendering my scene with rpiece.
I will start 6 rpiece jobs and after a few hours each one
will have stopped because it encountered the "bad ray direction
in inithemi" error (rt/ambcomp.c). I am running 3.6a on Linux
on a cluster of Opterons, and compiling with gcc. I have tried
versions compiled with -O3, -O2, -O, and no optimization.

An image of the scene can be viewed here:
http://mark.technolope.org/image/p42_temp/img15e1.png
(from 8x8 rpiece, larger image in works is 24x24 pieces)

The scene contains many cones with zero radius at one end,
2000 of them, to be precise. I looked at the code in
rt/o_cone.c, but couldn't quite figure out how the RAY->ron
value is set (intersection surface normal).

The error stems from a test in rt/ambcomp.c that checks the
3 components of the ray's intersection surface normal for any
components outside of the bounds [-0.6:0.6]. I'd like, if I
may, to ask what this check accomplishes. I removed the
check and re-ran it, but I get other errors.
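
One plausible reading of such a test, offered as an assumption rather than a description of the Radiance source: a unit vector always has at least one component with magnitude at most 1/sqrt(3) (about 0.577), so a search for a component inside (-0.6, 0.6) can only come up empty when the normal is no longer unit length. The sketch below illustrates this with a hypothetical helper, pick_axis; the name, and the idea that the chosen axis would seed a local coordinate frame around the normal, are assumptions.

```c
/* Hypothetical helper, not taken from Radiance: return the index of the
 * first component of n whose magnitude is below 0.6, or -1 if all three
 * components lie outside (-0.6, 0.6). */
int pick_axis(const double n[3])
{
    for (int i = 0; i < 3; i++)
        if (n[i] < 0.6 && n[i] > -0.6)
            return i;
    /* Only reachable when |n|^2 > 3 * 0.36 = 1.08, i.e. n is not a
     * unit vector (or contains NaNs, which fail both comparisons). */
    return -1;
}
```

Under this reading the test is a consistency check on the normal's length: the worst-case unit vector (all components 1/sqrt(3)) still passes, while the same direction scaled up by about 4% or more fails.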

Is there something inherently bad with cones with zero-radius
ends? Or is the 64-bit gcc simply unreliable (and I have
to render this piece on the slower 2-proc Athlon)?

Mark

On Thu, 27 May 2004, Greg Ward wrote:

Hi Mark,

Yeah, I had similar problems with the gcc optimizer when I tried it.
If you used makeall to set the options rather than doing it by hand on
the command line, you should be able to find your settings in the
"rmake" script that gets put in your Radiance executables directory.
From my experiments under OS X at least, I couldn't approach your
benchmark speeds without simultaneously introducing some nefarious bug
in the renderer. I spent some time on it, but gave up trying to figure
out where the calculations had gone south. From what I've seen with
gcc problems in the past, they can be incredibly subtle. (At one
point, the optimizer was giving incorrect arithmetic results inside a
simple loop -- 2+2 = 5 and that sort of nonsense.)

I can't explain why rpict would be able to recover using the -ro option
where rpiece cannot. When code doesn't compile right, it's anybody's
guess as to what's going on.

I also don't know how to search the archives for phrases -- maybe Peter
A-B or someone familiar with our system could help with that one?

-Greg

> From: Mark Stock <[email protected]>
> Date: May 27, 2004 9:57:16 AM PDT
>
> Thanks for your quick reply, Greg. I have more information
> that may help our understanding.
>
> The other options that the rendering used are
>
> -ps 1 -ab 2 -aa 0 -ad 8 -as 0 -dj 0.7 -st 0.05
>
> so I'm guessing it's not the ambient file. It is possible that
> with all of the compiler optimizations that I used, I cut one
> too many corners.
>
> But how does this explain the fact that I could always
> "rpict -ro" and it would continue where it left off, creating
> a fine image in the end?
>
> Is there some way to find out what compile-time options
> I used? Is there a file in ray/src that contains that
> information?
>
> Also, I seem to be unable to search the radiance-online
> archives for a whole phrase. How is this done?
>
> Mark

_______________________________________________
Radiance-general mailing list
[email protected]
http://www.radiance-online.org/mailman/listinfo/radiance-general

Hi Mark,

Are you using a shared ambient file? If so,
I suspect a bad NFS lock manager is the culprit.
I can't think what else might be causing this error.

-Greg

Hi Mark,

Just out of curiosity, and for diagnostic purposes: have you tried running just two rpiece jobs on a single Opteron host (assuming these are dual-CPU hosts)? That might be one way to tell compile problems apart from NFS problems. I know that setting up NFS locking on Linux requires some pretty careful tailoring of the server and client configuration and mount options. I have never really been sure that I set it up properly myself.

Also, what are you using to tie your "cluster" together, if anything? Obviously the most basic solution is NFS with file locking and rpiece. There are other solutions, such as Beowulf with bproc and some kind of shared filesystem space such as GFS, or openMosix with MFS/DFS (I tried this at one point and encountered some problems).

Regards,

-Jack de Valpine

_______________________________________________
Radiance-dev mailing list
[email protected]
http://www.radiance-online.org/mailman/listinfo/radiance-dev

--
# John E. de Valpine
# president
#
# visarc incorporated
# http://www.visarc.com
#
# channeling technology for superior design and construction

I'm using the "-aa 0" option, so I don't store an ambient file.

Carsten gave me the idea of looking in the cone intersection
code, as the error stems from the normal, somehow. I realized
that the cones in my scene are very narrow (0.01 long, but
radii are 0.0001 and 0.0). I know very little about raytracing
calculations, but I'd guess that a ray-cone intersection test
possibly involves a division of a radius (or radius difference)
and a length. This may push the calculation towards machine
precision...enough so that a normal may not be calculated
accurately.

Here are two cones (one at a tip, and one just before
transitioning to cylinders):

def cone 0c 0 0 8
-0.525577 -0.177557 -0.381599 -0.526188 -0.185438 -0.38283 0
0.0001391

def cone 18c 0 0 8
-0.505553 -0.317633 -0.401908 -0.502736 -0.324996 -0.403267
0.0039803 0.004

Mark

On Fri, 25 Jun 2004 [email protected] wrote:

> Hi Mark,
>
> Are you using a shared ambient file? If so,
> I suspect a bad NFS lock manager is the culprit.
> I can't think what else might be causing this error.
>
> -Greg

Oh, and what I forgot to say is that I replaced the "error"
line with a "normalize(hp->uz);" line and the code compiles
(gcc -O) and runs (rpiece) on the 64-bit Opterons. There
seem to be no image errors.

pfilt'd result (in the middle of rpiece -R) is as follows:
http://mark.technolope.org/image/p42_temp/img17t3.png

Mark

Thanks for the additional information -- I'll check into this when I get back to
the States on Thursday.

-Greg

Hi Mark,

Carsten was right. There was an accumulation of numerical error in the cone normal computation for needle-like cones like yours. The errors I saw were as high as 10^-3, which is significant and can accumulate especially on multiple ray intersections. I have added a correction that should take care of this, which I have attached as a context diff, below.

-Greg

Index: o_cone.c
===================================================================
RCS file: /cvs/radiance//ray/src/rt/o_cone.c,v
retrieving revision 2.5
retrieving revision 2.6
diff -r2.5 -r2.6
2c2
< static const char RCSid[] = "$Id: o_cone.c,v 2.5 2004/03/30 16:13:01 schorsch Exp $";
---
> static const char RCSid[] = "$Id: o_cone.c,v 2.6 2004/06/28 10:07:17 greg Exp $";
129a130,134
> a = DOT(r->ron, r->ron);
> if (a > 1.+FTINY || a < 1.-FTINY) {
> c = 1./(.5 + .5*a); /* avoid numerical error */
> r->ron[0] *= c; r->ron[1] *= c; r->ron[2] *= c;
> }
Index: o_cone.c

RCS file: /cvs/radiance//ray/src/rt/o_cone.c,v
retrieving revision 2.5
retrieving revision 2.6
diff -c -r2.5 -r2.6
*** a/o_cone.c 30 Mar 2004 16:13:01 -0000 2.5
--- b/o_cone.c 28 Jun 2004 10:07:17 -0000 2.6
***************
*** 1,5 ****
   #ifndef lint
! static const char RCSid[] = "$Id: o_cone.c,v 2.5 2004/03/30 16:13:01 schorsch Exp $";
   #endif
   /*
    * o_cone.c - routine to determine ray intersection with cones.
--- 1,5 ----
   #ifndef lint
! static const char RCSid[] = "$Id: o_cone.c,v 2.6 2004/06/28 10:07:17 greg Exp $";
   #endif
   /*
    * o_cone.c - routine to determine ray intersection with cones.
***************
*** 127,132 ****
--- 127,137 ----
                         for (i = 0; i < 3; i++)
                                 r->ron[i] = (co->al*r->ron[i] - c*co->ad[i])
                                                 /co->sl;
+                a = DOT(r->ron, r->ron);
+                if (a > 1.+FTINY || a < 1.-FTINY) {
+                        c = 1./(.5 + .5*a); /* avoid numerical error */
+                        r->ron[0] *= c; r->ron[1] *= c; r->ron[2] *= c;
+                }
                 r->rod = -DOT(r->rdir, r->ron);
                 r->pert[0] = r->pert[1] = r->pert[2] = 0.0;
                 r->uv[0] = r->uv[1] = 0.0;
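
The scale factor in the patch, 1/(.5 + .5*a), can be read as a first-order stand-in for 1/sqrt(a): with a = |v|^2 = 1 + e and small e, both are approximately 1 - e/2, so the normal is pulled back to unit length without a square root. Below is a minimal standalone sketch of the same arithmetic. TOL is an assumed stand-in for FTINY (taken here to be 1e-6), and dot3 and renorm_first_order are hypothetical names, not Radiance functions.

```c
#define TOL 1e-6    /* assumed stand-in for FTINY */

/* Dot product of two 3-vectors. */
double dot3(const double a[3], const double b[3])
{
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

/* Rescale v toward unit length without a square root.
 * With a = |v|^2 = 1 + e and small e,
 *   1/sqrt(a) ~ 1/(1 + e/2) = 1/(0.5 + 0.5*a).
 * Returns 1 if v was rescaled, 0 if it was already within tolerance. */
int renorm_first_order(double v[3])
{
    double a = dot3(v, v);
    if (a > 1.0 + TOL || a < 1.0 - TOL) {
        double c = 1.0 / (0.5 + 0.5 * a);
        v[0] *= c; v[1] *= c; v[2] *= c;
        return 1;
    }
    return 0;
}
```

After one pass the remaining error in |v|^2 is about e^2/4, so an error of 2x10^-3 in |v|^2 (roughly 10^-3 in length, the size mentioned above) falls to about 10^-6. For larger errors the approximation degrades, and an exact 1/sqrt(a) would be the safer choice.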