Linux compilation errors - radiance as a library

Stephen_Wasilewski · March 30, 2022, 7:53am

I’ve encountered two errors compiling radiance as a library that occur on Linux but not MacOS. I’ve tried two different linux machines:
Debian GNU/Linux 11 (bullseye) // gcc version 10.2.1 20210110 (Debian 10.2.1-6)
Ubuntu 20.04.3 LTS // gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.1)

my mac is:
10.14.6 // Apple LLVM version 10.0.1 (clang-1001.0.37.9)

Here are the issues:

when using rtrace as a library, I typically first load the scene without a light source and then add it later (this ensures it is at the end of the object record so I can swap sources). Now, this shouldn’t cause any problems, and typically doesn’t, it is perfectly valid to load a scene in radiance without a source, but in the specific case I compile as a module, under linux, I get a segfault because in marksources() if no sources are found then srccnt and cntord are never allocated. My fix is to simply not return early (commenting near lines 118 in source.c):

	/* still need  to initialize contribution arrays, because we will load a source in the future - SW*/
//	if (!foundsource) {
//		error(WARNING, "no light sources found");
//		return;
//	}

Roland also suggested that if these variables were initialized with a NULL pointer that might also solve the issue, but I have not tried that. Somehow clang is catching this and doing this work for me.

My other issue is with compiling rcontrib as a library. Similarly, I can make a running executable from the exact same code even under Linux that does not segfault, but if I compile into a module I get a segfault when calling setcontext(). This one is weird, because in this case (compiled as a module) for some reason it is not even calling setcontext() defined in caldefn.c, but instead is using something declared in ucontext.h (which is deprecated on mac). As best I can tell, this file is included in signal.h which is included in multiple places, but probably most relevant is rcmain.c (and likewise in my modified rcontrib initializing source, rcinit.c). I tried linking the libraries in all different orders, but couldn’t figure this one out, other than by patching caldefn.c and func.c to change the name of setcontext() to avoid this issue.

caldefn.h:

extern char	*setcontextcal(char *ctx);

I’m using cmake to build, but here are the actual calls to gcc for the linking step (compilation step is identical):

# rcontrib: this is the classic radiance executable

/usr/bin/cc CMakeFiles/rcontrib.dir/Radiance/src/rt/rcmain.c.o CMakeFiles/rcontrib.dir/Radiance/src/rt/rcontrib.c.o CMakeFiles/rcontrib.dir/Radiance/src/rt/rc2.c.o CMakeFiles/rcontrib.dir/Radiance/src/rt/rc3.c.o -o rcontrib  libradiance_o.a librtrad_o.a -lm 

# raytcontrib: this is similiar to rcontrib, but uses the same code to initialize the scene and call rays as the library (this works without the patch!)

/usr/bin/c++ CMakeFiles/raytcontrib.dir/mainrc.cpp.o CMakeFiles/raytcontrib.dir/render.cpp.o CMakeFiles/raytcontrib.dir/rcontrib.cpp.o -o raytcontrib  librcontribcfiles.a librcraycalls.a libradiance.a librtrad.a /usr/lib/x86_64-linux-gnu/libpython3.8.so -lm 


# rcontrib_c: here is the linking for the library, it is likely the 
# -shared flag (which is needed) that lets the ucontext.h 
# symbol leak in somehow, but I do not get duplicate #
# symbol linker errors, it compiles fine, but then segfaults. 
# If I get rid of -fPIC it throws linker errors.

/usr/bin/c++ -fPIC -flto -shared  -o /home/wasilewski/raytraverse/raytraverse/crenderer/rcontrib_c.cpython-38-x86_64-linux-gnu.so CMakeFiles/rcontrib_c.dir/render.cpp.o CMakeFiles/rcontrib_c.dir/rcontrib.cpp.o  librcontribcfiles.a librcraycalls.a libradiance.a librtrad.a -lm

If anyone has any compiler advice I’ll take it, but this seems like a problem most easiily addressed by changing the code.

Thanks!
stephen

Greg_Ward · March 30, 2022, 4:21pm

Hi Stephen,

I renamed setcontext() to calcontext(), hoping this will avoid your obscure name conflict. (The original function name was a bit generic, anyway.)

However, I don’t quite understand your issue with marksources(). Most realloc() calls will accept a NULL pointer for the old memory, and revert to malloc() semantics in that case. Since static variables are normally initialized to NULL, this is what should happen in the code. Adding explicit initializers might be one thing to try, as Roland suggested, i.e.:

/* source.c, lines 39-40 */
static CONTRIB  *srccnt=NULL;		/* source contributions in direct() */
static CNTPTR  *cntord=NULL;			/* source ordering in direct() */

If that fixes your problem, then let’s just figure modules don’t initialize static variables to zeroes as they ought to in C (part of the language definition in fact). In that case, we should add zero initializers all over the code to avoid this in future. Not a fun task, but doable.

My bigger concern is that you should call marksources() after you have loaded all the light sources you care about. Any sources loaded after this call will not participate in the direct lighting calculation. I think you might only encounter problems when there are some glow sources that get added to the SRCREC source[] array, so nsources is positive, otherwise all calls to direct() would return before any of the contribution arrays, etc. were accessed.

Thinking on this, we might change this code in source.c lines 413-414 from:

	if (nsources <= 0)
		return;		/* no sources?! */

to:

	if (maxcntr <= 0)
		return;		/* no contributions?! */

This would also avoid calling realloc() with a NULL pointer. Also worth a try.

I would, however, like you to reconsider when you make the call to marksources(). In the callable library version of Radiance, the ray_init() function in raycalls.c makes this call. Any sources added after ray_init() won’t be included. It might be better for you to replace ray_init() with your own method of loading and initializing that calls marksources() after you’ve added yours.

Cheers,
-Greg

Stephen_Wasilewski · March 31, 2022, 11:01am

Hi Greg,

Thanks for renaming setcontext, that was the easy fix…

I tried with the NULL pointers, no luck. It turns out I misdiagnosed the problem, the segfault is thrown by:

      error(WARNING, "no light sources found");

and just like setcontext, the module code does not call the error() defined in common/error.c but some other definition. error() is defined in the GNU c-library and is declared in error.h. the -shared flag (which is required to build the module) must be including some of the system libraries. I don’t know why this doesn’t throw a duplicate symbol error and I can’t figure out how to get around it. I totally get if you do not want to rename error (rterror perhaps?), and for now it is not a big deal for me to compile with my own version of source.c that does not give a warning.

Thanks for your help!

Read on for an explanation of why I call loadsrc more than once (and sometimes with nothing there)

Regarding your bigger concern. I do have my own initializing routines. First I parse all the options and intitotypes / setoutput / etc… then I load the scene:

void
rtrace_loadscene(char* pyoctnm) {
  /* get octree */
  char octnm[strlen(pyoctnm) + 1];
  strcpy(octnm, pyoctnm);
  extern char  *octname;
  readoct(octname = octnm, loadflags & ~IO_INFO, &thescene, NULL);
  octname = NULL;
  nsceneobjs = nobjects;
}

after this I load the sources and finish setting up the environment:

int
rtrace_loadsrc(char* srcname, int freesrc) {
  int oldcnt = nobjects;
  ambnotify(OVOID);
  freesources();
  if (freesrc > 0) {
    freeobjects(nobjects - freesrc, freesrc);
  }
  if (srcname != NULL) {
    readobj(srcname);
    nsceneobjs = nobjects;
  }
  /* don't warn about sources as this is expected behavior */
  int warnings = erract[WARNING].pf != NULL;
  erract[WARNING].pf = NULL;
  if (!castonly) {	/* any actual ray traversal to do? */
    ray_init_pmap();	/* PMAP: set up & load photon maps */
    marksources();		/* find and mark sources */
  } else
    distantsources();	/* else mark only distant sources */
  if (warnings) {
    erract[WARNING].pf = wputsrt;
  }
  return nobjects - oldcnt;
}

rtrace_loadsrc designed to allow repeated calls. Each time its called it frees the sources and then re-initializes them. Typicallly, i load these separately from the the rest of the scene so that I can remove them (the call to freeobjects()) by ensuring they are at the end of the object table, but I also want the code to work in the case that a source is part of the base scene. Since the call is not aware of whether the scene has a source inside it, and whether a call to rtrace_loadsrc will be made prior to running, when I load the scene I do an empty call to rtrace_loadsrc:

void Rtrace::loadscene(char *octname) {
  Renderer::loadscene(octname);
  ray::ray_done(0);
  nproc = ray::rtinit(argc, argv);
  rvc = ray::return_value_count;
  ray::rtrace_loadscene(octree);
  ray::rtrace_loadsrc(nullptr, 0);
}

Greg_Ward · March 31, 2022, 2:56pm

Wow – namespace invasions seem like they’re pernicious in the Linux shared modules. You are correct that I’d rather not change the name of the error() function everywhere in my code. It is generic, but that’s what I want with a generic call. System library developers working on new user code should steer clear of such generic names at this point in the game, IMHO.

The easiest fix, since all routines that call error() have #include “rterror.h” is to add a define to that header something like this after the system #include’s:

#define error rterror   /* avoid Linux library name collision */

The only issue then is whether “error()” is declared in any headers that are included after this one in the various Radiance modules. If you go to the ray/src directory and run:

find * -name "*.[ch]" -exec grep -s '"rterror\.h"' {} \; -print -exec grep -n '^#include' {} \;

That will print out the #include lines for all C modules that include “rterror.h”, so you’ll know which ones are susceptible to having later-included system headers that could have the “error” symbol. The safest thing of course would be to move the “rterror.h” include to the end of the list in all cases, but that may be overkill.

Regarding your freeing of objects, I guess you figured out that I only actually free memory and make it available again when you free up to the last allocated object. Also, free’ed object records in the middle don’t get recycled, although this could be done with a little more bookkeeping between freeobjects() and newobject() in the common/readobj.c module. If that’s a priority for you, I could look into it, but it seems you’re OK with your current method. The disused OBJREC’s don’t take up that much space, anyway – most of the data is usually in the malloc’ed arguments, which are freed in freeobjects().

Stephen_Wasilewski · April 1, 2022, 6:00am

Hi Greg,

It seems like your easy fix worked, my code now calls the correct error() function:

#ifndef _RAD_RTERROR_H_
#define _RAD_RTERROR_H_

#include <errno.h>
#define error rterror   /* avoid Linux library name collision */

although the renaming is a bit confusing, at first I didn’t understand what this was intended to do.
maybe something like this would be clearer:

#define error linuxlib_error   /* avoid Linux library name collision */

thanks for your help!

-Stephen

Greg_Ward · April 1, 2022, 2:35pm

Glad it worked! By putting the #define in rterror.h, you effectively change the name of error() throughout Radiance to whatever you define it to. That’s why I chose “rterror”. The Linux system call is still called “error”, but none of the Radiance code that includes rterror.h will use that call.

Make sense?
-G

Stephen_Wasilewski · April 1, 2022, 3:00pm

Huh, I think I am confused.

so the radiance code says error, but the compiler/linker use rterror?

because after this change, when I call error, it calls the one defined by radiance, I thought I was effectively kicking the linux definition out of the namespace. Does this mean throughout one could use error/rterror interchangeably? Theoretically, how would one call the linux error?

Are you going to change rterror.h, or can I move the define into my code and have it still catch all the potential calls?

Greg_Ward · April 1, 2022, 5:17pm

Well, the #define is just a macro replacement of all instances of “error” with “rterror” in whatever code follows. Since the Radiance code includes rterror.h, that means the macro will replace all calls to error() in the code with rterror() before compilation. The external symbol in the linker will be “rterror” rather than “error”. If there are places where rterror() is called by name, the actual called function would still be the one defined in error.c, which also had its symbol replaced by the same macro.

I wasn’t planning to introduce this change in rterror.h, since it seems to only be a problem for you at the moment, but I might at some point in the future. For now, best add it to your own code.

If you really needed to call the Linux error() from a module that included rterror.h, you would have to use:

#undef error  /* clear this macro so we can access original symbol */
    error(...);  /* Calls Linux error() function */
#define error rterror    /* back to rterror override */

Not very pretty, but I don’t know why you would mix code like that in the first place.

-Greg

Stephen_Wasilewski · April 2, 2022, 7:56am

thanks Greg, that makes sense now. And I don’t actually need to call linux error (or radiance error for that matter, I haven’t actually tried to pass the codes and messages back to python anyways), but that helps me understand how the macro works.

-stephen