Error: rtcontrib: fatal - incomplete ray value from rtrace

Hello Greg,
I suspect the machine a little, because it stops after a different amount of
time each run: sometimes it gets through the calculation for 4 points,
sometimes for 10, etc.

In the beginning I had only one script for the whole grid and ran it on 8
cores on one node, which was the maximum. But there was probably not enough
memory: it gave me an error message saying that there was not enough
allocated memory, and the job went into a waiting state but never restarted.
It stopped even though I could still see that the job I had submitted to run
the script was active. Unfortunately I don't remember the exact error
message.
Then I split the calculation into more scripts, each using fewer cores, to
save some time and to avoid the previous problem. I ran several scripts
without a problem. Then I got this message, so I deleted the job and
resubmitted it without making any changes. Some of the jobs ran
successfully, but some just gave this error message and stopped. Again I
could see that the job was still active, but nothing was coming out of the
calculations.
As I write this, I am thinking that the server may be getting overloaded and
pausing the calculation, so that rtrace cannot finish the process.
I don't know if this makes any sense, but I can try to submit it again and
hope that it finishes.
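The splitting described above can be sketched in shell. This is a hypothetical, dry-run sketch, assuming the points file has one sensor per line (origin and direction, as rtcontrib reads from standard input): `split -l` cuts it into chunks, and the function only prints one rtcontrib command per chunk without running or submitting anything. The function name, chunk size, `chunk_` prefix and per-chunk output directories are illustrative assumptions; the rtcontrib options are copied from the command in this thread. Giving each chunk its own output directory keeps parallel runs from writing to the same `%s.vmx` files.

```shell
#!/bin/bash
# Dry-run sketch (hypothetical helper): split a sensor-points file into
# chunks and print one rtcontrib command per chunk. The rtcontrib options
# are the ones used in this thread; the naming scheme is an assumption.
print_chunk_commands() {
    local pts_file=$1          # e.g. sensors/grid/vmx_grid_test_illum.pts
    local lines_per_chunk=$2   # sensor points per chunk
    local workdir=$3           # where the chunk files are written

    mkdir -p "$workdir"
    # split names its output files <prefix>aa, <prefix>ab, ...
    split -l "$lines_per_chunk" "$pts_file" "$workdir/chunk_"

    local chunk name
    for chunk in "$workdir"/chunk_*; do
        name=$(basename "$chunk")
        # One output directory per chunk, so parallel runs do not
        # overwrite each other's %s.vmx files.
        echo "rtcontrib @parameters/vmx_illum.opt -f klems_int.cal -bn Nkbins -fo" \
             "-o results/vmx_test_grid_illum_${name}/%s.vmx" \
             "-b kbinW @parameters/west.modifier -b kbinS @parameters/south.modifier" \
             "-I+ oct/model_vmx.oct < $chunk"
    done
}
```

For example, `print_chunk_commands sensors/grid/vmx_grid_test_illum.pts 50 chunks` prints one command per 50-point chunk, each of which could go into its own GridEngine job script. The per-chunk output directories would have to be created before the commands are run.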

Could you please say more about signal 9, and whether I can do anything
about it?

Thank you
David


On 19 May 2011 16:17, Greg Ward wrote:

Hi David,

I've never encountered this error before. It sounds like your rtrace
process might be dying, but it would normally report an error message if
that were the case. The only case when rtrace doesn't report an error is
when the system kills it with signal 9. Does the calculation proceed for
some time before the error occurs, or does it bail right away?
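Whether a process was killed by a signal can be checked from the shell for any command the shell runs directly. A minimal sketch, assuming bash: a command terminated by a signal is reported with an exit status above 128, and bash sets it to 128 plus the signal number, so SIGKILL (signal 9) shows up as 137. The function name `describe_exit` is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical helper): interpret a shell exit status.
# bash reports a command terminated by signal N as status 128+N,
# so a process killed with signal 9 (SIGKILL) appears as 137.
describe_exit() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        local sig=$((status - 128))
        # 'kill -l N' prints the name of signal N (e.g. KILL for 9)
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with error status $status"
    fi
}
```

For example, running rtrace by hand on a few of the sensor points and then calling `describe_exit $?` would show whether that rtrace was killed. This only sees the status of the command the shell itself started: when rtrace runs as a child of rtcontrib, its status goes to rtcontrib, which in this thread reported only "incomplete ray value from rtrace".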

You say you ran the command successfully before. What, exactly, did you do
differently when you got this error? Are you running it on a different
machine, which might have data or file size limits in place?

-Greg

> From: David Appelfeld
> Date: May 19, 2011 3:27:04 PM PDT
>
> Dear Radiance users,
>
> I am running a calculation for a grid of illuminance sensors, and recently
> I have got this error message several times:
>
> rtcontrib: fatal - incomplete ray value from rtrace
>
> Has anyone had this before, or do you have any suggestion where I should
> look for mistakes? I guess there could be several reasons for this error,
> but I don't know where to start. The rtcontrib command is as follows. I
> have run it several times, because I split the grid and ran more
> simulations, and it worked without problems, even when running several
> simulations in parallel at the same time using the same octree. I am
> submitting the calculations to our cluster, which runs on Sun machines and
> uses GridEngine.
>
> rtcontrib @parameters/vmx_illum.opt -f klems_int.cal -bn Nkbins -fo \
>     -o results/vmx_test_grid_illum/%s.vmx \
>     -b kbinW @parameters/west.modifier -b kbinS @parameters/south.modifier \
>     -I+ oct/model_vmx.oct < sensors/grid/vmx_grid_test_illum.pts
>
> Thank you for any suggestions
> David

_______________________________________________
Radiance-general mailing list
[email protected]
http://www.radiance-online.org/mailman/listinfo/radiance-general

Hi Greg again,

I reinstalled Radiance completely just last week, because when I generated a
rendering with the three-phase method, the light was somehow coming from the
wrong side, and Andy suggested reinstalling it.

I guess you may be right about the system limits, because I have exceeded
the limit on the number of open files before, and there may be some other
limit that I am exceeding. Unfortunately I cannot change the limits; I
contacted the server support, but they cannot change them.

The parameters are quite high. I could reduce some of them, but then I would
have to run the whole simulation again, including the sensors I already have
results for.

-ab 5
-ad 2048
-as 0
-aa 0
-lw 1.00E-12
-ds 0.1
-dj 0.9
-dt 0
-dc 0.75
-n 4
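The `@parameters/vmx_illum.opt` argument in the original command reads options from a file, so settings like the ones above presumably live there (an assumption; the file's contents are not shown in the thread). A minimal sketch that writes the same values to a new options file and refuses to overwrite an existing one; the function name and any path passed to it are hypothetical. Keeping the values on a single line avoids relying on how line breaks inside an options file are handled.

```shell
#!/bin/bash
# Sketch (hypothetical helper): store the settings listed in this message
# in an options file for use as @file on the rtcontrib command line.
# Assumption: these are the options the @parameters/*.opt file holds.
write_opt_file() {
    local path=$1
    if [ -e "$path" ]; then
        echo "refusing to overwrite $path" >&2
        return 1
    fi
    mkdir -p "$(dirname "$path")"
    printf '%s\n' "-ab 5 -ad 2048 -as 0 -aa 0 -lw 1.00E-12 -ds 0.1 -dj 0.9 -dt 0 -dc 0.75 -n 4" > "$path"
}
```

Writing to a new name (for example `write_opt_file parameters/vmx_illum_reduced.opt`) and then editing the values would leave the original settings file untouched while lower parameters are tried.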

I have started the simulation again and will see how it goes. My next step
will probably be to reduce the parameters.

Thank you for your advice.
David


On 19 May 2011 17:15, Greg Ward wrote:

Hi David,

Andrew McNeil recently found a problem with rtcontrib running in 64-bit
mode on a 32-bit operating system, where very complex ray trees would cause
the process to hang. It sounds a bit like what you're experiencing, so if
you like you can download the latest HEAD release from
www.radiance-online.org, and either recompile everything or (simpler) grab
src/util/rtcontrib.c from the unpacked HEAD and substitute that for the copy
in your existing source tree, recompiling just rtcontrib. The code for
rtrace hasn't changed in any important way, and this should fix it if your
problem is the same.
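The file substitution described above can be sketched as follows. This is a hypothetical helper: its two directory arguments (the unpacked HEAD release and the existing source tree) are placeholders, and the recompile step is not shown because it depends on how Radiance was built on that machine. It keeps a `.orig` backup so the change can be reverted.

```shell
#!/bin/bash
# Sketch (hypothetical helper): replace src/util/rtcontrib.c in an existing
# Radiance source tree with the copy from an unpacked HEAD release.
# Both directory arguments are caller-supplied placeholders.
swap_rtcontrib_source() {
    local head_dir=$1 src_dir=$2
    local new="$head_dir/src/util/rtcontrib.c"
    local old="$src_dir/src/util/rtcontrib.c"

    [ -f "$new" ] || { echo "missing $new" >&2; return 1; }
    [ -f "$old" ] || { echo "missing $old" >&2; return 1; }

    # Keep the original so the change can be undone.
    cp -p "$old" "$old.orig"
    cp -p "$new" "$old"
    echo "replaced $old (backup in $old.orig); now recompile rtcontrib"
}
```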

Regarding signal 9, some Unix implementations use this uncatchable signal
to terminate processes that have exceeded their resource limits. Other
systems send a specialized signal saying what went wrong, and rtrace would
catch this and report the problem (e.g., "file size limit exceeded"). If
your system is just killing the process with signal 9, there's no way to
really know what's going wrong. All you can do is check that your resource
limits are ample for your task.
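Checking the limits can start from the shell. A minimal sketch, assuming bash, that prints the limits mentioned in this thread (file size, data size, open files) plus virtual memory and CPU time; the function name is hypothetical. Batch systems such as GridEngine can impose their own limits on a job, so the output is most meaningful when the function is called from inside the job script rather than from a login shell.

```shell
#!/bin/bash
# Sketch (hypothetical helper): print per-process resource limits that
# could stop a long rtcontrib/rtrace run. Units follow bash's ulimit
# outside POSIX mode: -f, -d and -v in 1024-byte blocks, -t in seconds;
# "unlimited" means no limit is set.
report_limits() {
    echo "file size (KB):      $(ulimit -f)"
    echo "data segment (KB):   $(ulimit -d)"
    echo "virtual memory (KB): $(ulimit -v)"
    echo "open files:          $(ulimit -n)"
    echo "cpu time (s):        $(ulimit -t)"
}
```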

By the way, Andy only ran into this error when he was using many bounces
and some rather high parameters in rtcontrib, which caused the ray trees to
occasionally exceed 2 GBytes in size -- from a single ray! It's difficult
to say if this is happening in your case without seeing your parameter
settings, but it's worth trying to patch rtcontrib in any case as a first
step.
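To see why many bounces with a high -ad can produce such large ray trees, here is a back-of-the-envelope worst case for the settings David listed (-ab 5, -ad 2048, -as 0, -aa 0). It assumes every ambient ray spawns the full -ad count of new ambient rays at each of the -ab levels, and it counts ambient rays only (no shadow or specular rays). That makes it an upper bound, not a measurement; note, though, that with -lw 1.00E-12 very few rays are dropped for falling below the limit weight, so that setting provides little pruning. The function name is hypothetical.

```shell
#!/bin/bash
# Worst-case sketch (hypothetical helper): total ambient rays per sensor
# value if each of the -ab levels multiplies the ray count by -ad.
# Assumes no pruning and ignores shadow/specular rays.
worst_case_ambient_rays() {
    local ab=$1 ad=$2
    local level=1 per_level=1 total=0
    while [ "$level" -le "$ab" ]; do
        per_level=$((per_level * ad))   # rays at this bounce level: ad^level
        total=$((total + per_level))    # running sum over levels 1..ab
        level=$((level + 1))
    done
    echo "$total"
}
```

Under these assumptions, `worst_case_ambient_rays 5 2048` gives 36046397799139328 (about 3.6e16) ambient rays per sensor value, while `worst_case_ambient_rays 2 2048` gives 4196352.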

Best,
-Greg

> From: David Appelfeld
> Date: May 19, 2011 4:41:33 PM PDT
>
> Hello Greg,
> I suspect the machine a little, because it stops after a different amount
> of time each run: sometimes it gets through the calculation for 4 points,
> sometimes for 10, etc.
>
> In the beginning I had only one script for the whole grid and ran it on 8
> cores on one node, which was the maximum. But there was probably not enough
> memory: it gave me an error message saying that there was not enough
> allocated memory, and the job went into a waiting state but never
> restarted. It stopped even though I could still see that the job I had
> submitted to run the script was active. Unfortunately I don't remember the
> exact error message.
> Then I split the calculation into more scripts, each using fewer cores, to
> save some time and to avoid the previous problem. I ran several scripts
> without a problem. Then I got this message, so I deleted the job and
> resubmitted it without making any changes. Some of the jobs ran
> successfully, but some just gave this error message and stopped. Again I
> could see that the job was still active, but nothing was coming out of the
> calculations.
> As I write this, I am thinking that the server may be getting overloaded
> and pausing the calculation, so that rtrace cannot finish the process.
> I don't know if this makes any sense, but I can try to submit it again and
> hope that it finishes.
>
> Could you please say more about signal 9, and whether I can do anything
> about it?
>
> Thank you
> David
