Error: rtcontrib: fatal - incomplete ray value from rtrace

Hi David,

I've never encountered this error before. It sounds like your rtrace process might be dying, but it would normally report an error message if that were the case. The only case when rtrace doesn't report an error is when the system kills it with signal 9. Does the calculation proceed for some time before the error occurs, or does it bail right away?
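To make the signal 9 case concrete, here is a minimal Python sketch of how a parent process on a POSIX system can tell that a child was killed by a signal rather than exiting on its own. The child is a stand-in that just sleeps, not rtrace, and `describe_exit` is a hypothetical helper name. Since rtcontrib starts rtrace itself, a wrapper script around rtcontrib would not see rtrace's status directly; this only illustrates the mechanism.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        sig = signal.Signals(-returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {returncode}"

# Stand-in child process (NOT rtrace): a Python one-liner that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.send_signal(signal.SIGKILL)  # simulate the system killing the process
child.wait()
print(describe_exit(child.returncode))
```

A process killed this way gets no chance to print anything, which is consistent with rtrace leaving no error message of its own.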

You say you ran the command successfully before. What, exactly, did you do differently when you got this error? Are you running it on a different machine, which might have data or file size limits in place?

-Greg

···

From: David Appelfeld <[email protected]>
Date: May 19, 2011 3:27:04 PM PDT

Dear Radiance users,

I am running a calculation for a grid of illuminance sensors, and recently I have gotten this error message several times:

rtcontrib: fatal - incomplete ray value from rtrace

Has anyone seen this before, or do you have any suggestions about where I should look for mistakes? I guess there could be several reasons for this error, but I don't know where to start. The rtcontrib command is as follows. I have run it several times, because I split the grid into several simulations, and it ran without problems, even with several simulations running in parallel on the same octree. I am submitting the calculations to our cluster, which runs on Sun machines and uses GridEngine.

rtcontrib @parameters/vmx_illum.opt -f klems_int.cal -bn Nkbins -fo -o results/vmx_test_grid_illum/%s.vmx \
-b kbinW @parameters/west.modifier -b kbinS @parameters/south.modifier -I+ oct/model_vmx.oct < sensors/grid/vmx_grid_test_illum.pts
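The message does not show how the grid was split, so the following is only a hypothetical sketch of one way to divide a sensor file into near-equal chunks for separate jobs. It assumes one sensor per line in the form "x y z dx dy dz"; the sample points are made up, and `split_points` is an illustrative name.

```python
def split_points(lines, n_chunks):
    """Split sensor lines into n_chunks contiguous, near-equal chunks, preserving order."""
    lines = [ln for ln in lines if ln.strip()]  # ignore blank lines
    base, extra = divmod(len(lines), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional point each.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks

# Made-up sensor points, assumed format: "x y z dx dy dz"
points = [f"{x}.0 0.0 0.8 0 0 1" for x in range(10)]
for i, chunk in enumerate(split_points(points, 3)):
    print(f"chunk {i}: {len(chunk)} points")
```

Each chunk would presumably need its own output directory, since the command above writes to fixed file names under results/vmx_test_grid_illum/.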

Thank you for any suggestions
David

Hi David,

Andrew McNeil recently found a problem with rtcontrib running in 64-bit mode on a 32-bit operating system, where very complex ray trees would cause the process to hang. It sounds a bit like what you're experiencing, so if you like you can download the latest HEAD release from www.radiance-online.org, and either recompile everything or (simpler) grab src/util/rtcontrib.c from the unpacked HEAD and substitute that for the copy in your existing source tree, recompiling just rtcontrib. The code for rtrace hasn't changed in any important way, and this should fix it if your problem is the same.

Regarding signal 9, some Unix implementations use this uncatchable signal to terminate processes that have exceeded their resource limits. Other systems send a specialized signal saying what went wrong, and rtrace would catch this and report the problem (e.g., "file size limit exceeded"). If your system is just killing the process with signal 9, there's no way to really know what's going wrong. All you can do is check that your resource limits are ample to your task.
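As a rough way to check the resource limits mentioned above, the sketch below prints the soft and hard limits of the current process using Python's standard `resource` module (Unix only). The choice of which limits to list is an assumption about what matters for this kind of job; names that a platform does not define are skipped.

```python
import resource

# Limits that commonly matter for long-running, memory-hungry jobs.
# Not every platform defines all of them, so missing names are skipped.
LIMIT_NAMES = ["RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_FSIZE", "RLIMIT_NOFILE", "RLIMIT_CPU"]

def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def current_limits():
    """Return {limit name: (soft, hard)} for the limits this platform defines."""
    limits = {}
    for name in LIMIT_NAMES:
        if hasattr(resource, name):
            soft, hard = resource.getrlimit(getattr(resource, name))
            limits[name] = (fmt(soft), fmt(hard))
    return limits

for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Running this inside the submitted job, rather than on the login node, shows the limits the job actually inherits; the shell built-in `ulimit -a` reports similar information.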

By the way, Andy only ran into this error when he was using many bounces and some rather high parameters in rtcontrib, which caused the ray trees to occasionally exceed 2 GBytes in size -- from a single ray! It's difficult to say if this is happening in your case without seeing your parameter settings, but it's worth trying to patch rtcontrib in any case as a first step.

Best,
-Greg

···

From: David Appelfeld <[email protected]>
Date: May 19, 2011 4:41:33 PM PDT

Hello Greg,
I suspect the machine a little, because it stops after a different amount of time on each run: sometimes it finishes the calculation for 4 points, sometimes for 10, etc.

In the beginning I had only one script for the whole grid and ran it on 8 cores on one node, which was the maximum. But there was probably not enough memory: I got an error message saying that not enough memory could be allocated, and the job went into a waiting state but never restarted. It stopped even though I could still see that the job I had submitted to run the script was active. Unfortunately I don't remember the whole error message exactly.
Then I split the calculation into more scripts using fewer cores each, to save some time and to avoid the previous problem. I ran several scripts without a problem. Then I got this message, so I deleted the job and resubmitted it without making any changes. Some of the jobs ran successfully, but some just gave this error message and stopped. Again I could see that the job was still active, but nothing was coming out of the calculation.
As I am writing this, I am thinking that the server is getting overloaded and pauses the calculation, and then rtrace cannot finish the process.
I don't know if this makes any sense, but I can try to submit it again and hope that it will finish.

Could you please say more about signal 9, and whether I can do anything about it?

Thank you
David

Yeah, your parameters are set really high. Setting -lw 1e-12 means that in the limit, you will trace a trillion rays for every input ray, and these trillion rays will have to be summed up by rtcontrib. You're probably running out of RAM in your rtcontrib process. I wouldn't bother with the -n 4 -- it won't speed things up and it makes your memory problem worse. Try using -n 1 and -lw 1e-9 or greater.
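The arithmetic behind the "trillion rays" figure can be checked with a short sketch. It takes the limiting ray count per input ray as 1 / lw, as described above; this is an upper bound in the limit, not a prediction of how many rays a particular run traces. `max_rays_per_input_ray` is a hypothetical helper name.

```python
from decimal import Decimal

def max_rays_per_input_ray(lw):
    """Upper bound on rays per input ray in the limit, taken as 1 / lw."""
    # Decimal avoids binary floating-point rounding for values like 1e-12.
    return int(Decimal(1) / Decimal(lw))

for lw in ("1e-12", "1e-9"):
    print(f"-lw {lw}: up to {max_rays_per_input_ray(lw):,} rays per input ray")
```

Going from -lw 1e-12 to -lw 1e-9 lowers this bound by a factor of 1000.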

-Greg

···

From: David Appelfeld <[email protected]>
Date: May 19, 2011 5:44:25 PM PDT

Hi Greg again,

I reinstalled Radiance completely just last week, because when I generated a rendering with the three-phase method the light was somehow coming from the wrong side, and Andy suggested reinstalling it.

I guess you may be right about the system limits. I have exceeded the number of open files before, and there may be some other limit that I am exceeding. Unfortunately I cannot change the limits; I have contacted the server support, but they cannot change them.

The parameters are quite high. I may reduce some of them, but then I have to run the whole simulation again, including the sensors I already have results for.

-ab 5
-ad 2048
-as 0
-aa 0
-lw 1.00E-12
-ds 0.1
-dj 0.9
-dt 0
-dc 0.75
-n 4
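For reference, here is a small sketch that applies Greg's suggested changes (-n 1 and -lw 1e-9) to the option list above. It assumes every option is a flag followed by exactly one value, which holds for this listing but not for Radiance options in general; `parse_opts` and `format_opts` are hypothetical helpers, and nothing is written to disk.

```python
import shlex

# The option list as posted in this message.
ORIGINAL_OPTS = """
-ab 5
-ad 2048
-as 0
-aa 0
-lw 1.00E-12
-ds 0.1
-dj 0.9
-dt 0
-dc 0.75
-n 4
"""

def parse_opts(text):
    """Parse 'flag value' pairs into a dict (assumes one value per flag)."""
    tokens = shlex.split(text)
    return dict(zip(tokens[0::2], tokens[1::2]))

def format_opts(opts):
    """Write the options back out, one 'flag value' pair per line."""
    return "\n".join(f"{flag} {value}" for flag, value in opts.items())

opts = parse_opts(ORIGINAL_OPTS)
opts.update({"-n": "1", "-lw": "1e-9"})  # the two changes suggested in the reply
print(format_opts(opts))
```

All other settings are left as posted; only the process count and the weight limit change.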

I have started running the simulation again and will see how it goes. My next step will probably be to reduce the parameters.

Thank you for your advice.
David

On 19 May 2011 17:15, Greg Ward <[email protected]> wrote: