multi-processing on SGI Onyx

Here is an error message that appears when I am running Radiance on
processors shared with other unix processes on our 16 processor SGI Onyx:

"rpict inconsistency
address not found in avlmemi
rpiece error reading from rpict"

When this message appears one of the squares in the images is not rendered. If
there are more such messages, more than one square is not rendered.
Presumably, the processor responsible was not able to contribute its share of
the rendering because of a conflict with another process sharing the same
processor. To avoid this problem I usually run the "top" command to see how
many processors are currently running unix processes and then I choose a
number of processors for my rpiece script that will be less than the number of
unused processors. I wonder if someone has a more detailed explanation of the
error and if there are ways to avoid it when the processors are shared with
processes other than Radiance.

Thomas

Thomas Seebohm wrote:

> Here is an error message that appears when I am running Radiance on
> processors shared with other unix processes on our 16 processor SGI Onyx:

WOW!!! A predator rack!!! Somebody around here still uses *REAL*
hardware! To hell with dual pentium boards! :^)

> "rpict inconsistency
> address not found in avlmemi
> rpiece error reading from rpict"

Oh boy, it's the avlmemi curse once again... this is getting old. :^)

In ray/src/rt/ambient.c you'll find an '#if 1' preceding the avlmemi()
definition. Change that to an '#if 0' and she should be right, mate.

I'll leave it to Greg to eggs-plain what avlmemi does wrong. Possibly a
portability bug, but not peculiar to IRIX. Also pops up under Linux.
Nuthin' wrong with your Onyx, that's for sure.

Onyces rule!


--
END OF LINE. (MCP)

Roland Schregle wrote:

> WOW!!! A predator rack!!! Somebody around here still uses *REAL*
> hardware! To hell with dual pentium boards! :^)

Yes to hell with them. Real (financially challenged) men run their simulations on Athlon systems. =8-)

> I'll leave it to Greg to eggs-plain what avlmemi does wrong. Possibly a
> portability bug, but not peculiar to IRIX. Also pops up under Linux.
> Nuthin' wrong with your Onyx, that's for sure.

Sometimes it pays to be an email pack-rat. Here's a snippet from an old email exchange between Greg and me:

" ... I got very annoyed about the avlmemi bug after the workshop, thinking about these things that have been dogging me for so many years, so I did a little investigation on the OS X version, where to my horror the same problem had come up, again. Fortunately, I had an easily reproduced error, so I was able to do a little experimentation. All these years, I thought it was a bug in the qsort(3) library routine, but on further testing, I couldn't get it to fail, so I started to look elsewhere. As it turned out, it wasn't the qsort() function itself, rather it was the comparison function I was giving it in ambient.c. I was comparing two pointers using pointer arithmetic, using an expression that works with most compilers, but apparently not GNU-C! In the end, it was a single word that needed to be changed to rid me of this bug that's been hanging on for at least 6 years, since the first Linux systems came into widespread use."

There you go. Maybe I saved Greg some typing.


----

      Rob Guglielmetti

w. www.rumblestrip.org

Rob Guglielmetti wrote:

Roland Schregle wrote:

>> WOW!!! A predator rack!!! Somebody around here still uses *REAL*
>> hardware! To hell with dual pentium boards! :^)

> Yes to hell with them. Real (financially challenged) men run their
> simulations on Athlon systems. =8-)

... and code in binary (www.kaniamania.com/html/1190.html)? :^)

> I was comparing two pointers using pointer arithmetic, using an expression
> that works with most compilers, but apparently not GNU-C!

Way to go, Greg. Never did trust gcc. Also have some appalling memories
of its early C++ handling. I prefer the native compilers, if they're
available, because it also tests your code for portability. I assume
Tom's Onyx runs IRIX 6.5 which requires a license for the compiler, in
which case gcc becomes an attractive option after all.

> There you go. Maybe I saved Greg some typing.

Yep. Also spared Greg's keyboard some unnecessary depreciation. :^)


--
END OF LINE. (MCP)

The magic word to change is in src/rt/ambient.c, in the function aposcmp(avp1, avp2), which should be corrected to read:

/* GW NOTE 2002/10/3:
  * I used to compare AMBVAL pointers, but found that this was the
  * cause of a serious consistency error with gcc, since the optimizer
  * uses some dangerous trick in pointer subtraction that
  * assumes pointers differ by exact struct size increments.
  */
static int
aposcmp(avp1, avp2) /* compare ambient value positions */
char **avp1, **avp2;
{
         return(*avp1 - *avp2);
}
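
The note above does not spell out the trick. One well-known way an optimizer can turn a pointer subtraction into a multiplication is "exact division" by a modular inverse, which is only valid when the byte difference is a whole multiple of the struct size. The sketch below imitates that in plain integer arithmetic. Whether gcc generated exactly this for ambient.c is an assumption; the 52-byte record size and the function name are hypothetical, not taken from Radiance.

```c
#include <stdint.h>

/* Hypothetical record size in bytes (NOT the real sizeof(AMBVAL)): 52 = 4 * 13. */
#define REC_SIZE 52u

/* Multiplicative inverse of 13 modulo 2^32: 13 * 0xC4EC4EC5 == 1 (mod 2^32). */
#define INV13 0xC4EC4EC5u

/*
 * "Exact" division by 52 without a divide instruction: shift out the
 * factor 4, then multiply by the inverse of 13 modulo 2^32.
 * The result is the true quotient only if nbytes is a multiple of 52.
 */
static uint32_t
exact_div_by_rec_size(uint32_t nbytes)
{
    return (uint32_t)((nbytes >> 2) * (uint32_t)INV13);
}
```

With a byte difference of 104 this returns 2, as expected. With 100, which can happen when the two records were allocated separately rather than side by side in one array, the result is a large unrelated number instead of 1, so a comparison function built on it can report an inconsistent ordering. Subtracting char pointers, as in aposcmp() above, involves no division at all.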


----------
On another note, I have been following with interest the discussion on parallel rendering solutions and alternatives to a working NFS lock manager. I haven't responded because I haven't had anything intelligent to add... It's sounding like the consensus is headed in the direction of a socket-based client/server solution. A dreadful pain to implement from all I've seen, but perhaps it's best in the long run.

-Greg

Greg Ward wrote:

> char **avp1, **avp2;

I hope you're using pointers to void in the ANSI version... ;)
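
For illustration, a minimal sketch of what an ANSI-style comparator taking pointers to void might look like. This is not the actual Radiance source: Record is a hypothetical stand-in for AMBVAL, the function names are made up, and instead of subtracting char pointers it compares the addresses as integers through uintptr_t (assumed to be available in <stdint.h>).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Radiance's AMBVAL; only its address matters here. */
typedef struct { char payload[52]; } Record;

/* ANSI-style qsort() comparator: each argument points at a Record * element. */
static int
cmp_by_address(const void *p1, const void *p2)
{
    uintptr_t a = (uintptr_t)*(Record *const *)p1;
    uintptr_t b = (uintptr_t)*(Record *const *)p2;

    return (a > b) - (a < b);   /* -1, 0 or 1; no pointer subtraction */
}

/* Sort n Record pointers by address; returns 1 if the result is ascending. */
static int
sort_and_check(Record **v, size_t n)
{
    size_t i;

    qsort(v, n, sizeof(v[0]), cmp_by_address);
    for (i = 1; i < n; i++)
        if ((uintptr_t)v[i - 1] > (uintptr_t)v[i])
            return 0;
    return 1;
}
```

Returning (a > b) - (a < b) rather than a difference also sidesteps the case where a pointer difference does not fit in an int, which can matter on 64-bit systems.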

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/