multiprocessor systems, Radiance, and you

Hi Schorsch,

just a quick crude reply right now, have to leave soon, maybe more next
week..

OK, I didn't really promote my PVM stuff vigorously. In 2001, I had a
short email contact with Charles, and I offered the stuff for publication.
(There were some small bugs in it at that time, and the parallel rview was
much too slow; it's the rview style itself that makes a parallel rview
rather impossible/inefficient, so I finally abandoned it.)

But the rpict mode works fine and reliably, esp. after some later
improvements.

On all occasions of my mentioning it later, I had the impression that
the interest in it was rather limited; besides, I concentrated on other
work (direct cache), so I kept my mouth shut on it.. :-) This isn't
meant as a complaint, really not. It's just a self-evident fact that
there are several 'philosophies' concerning the future Radiance
development, so there will always be more ideas than final (official)
realisations.

Additionally, I know nothing about Windows or OS X, so a platform
independent PVM version would be something for a team to build, if at
all. I'm currently just a single bozo (who by the way learned
C programming by exploring the Radiance code), so I did experiment a lot
but mainly pursued personal preferences (while receiving no
funding, which I have to consider; such is the hard reality of
life). If the stuff is of public interest, however, I'm of course all
ears and willing to join in some cooperative effort.

So long for now

-Carsten

Hi Georg, Carsten, Peter, Greg and others,

As in my other post, sorry to be weighing in a week late. In any event, what is the current summary of ways to develop distributed rendering for Radiance? Based on my understanding of the topic, it seems that there are several approaches that have been put on the table:

    * file based locking:
          * NFS - as currently implemented and with all the NFS related
            issues
          * Samba - I recall that Georg had mentioned this as a possibly
            more reliable mechanism
          * Custom - homegrown solution for locking files
    * client/server - presumably a socket based client server mechanism
      but not relying on pvm/mpi
    * pvm/mpi:
          * LBNL - is this pvm or mpi and is it available to the open
            source development team?
          * Carsten - has done something with pvm, is there more
            information or a description?
          * Roland Koholka, Heinz Mayer, Alois Goller ("MPI-parallelized
            Radiance on SGI CoW and SMP" - Parallel Computation, 4th
            International ACPC Conference, Salzburg, Austria, LNCS 1557,
            pages 549- 558, February 1999.) - I do have the modified
            code (ambient.c, rmain.c, and rpict.c) for this that I
            downloaded at some point long ago if anyone is interested as
            well as a pdf of the paper.

I guess the very general question that I would have is what is the best solution architecturally for Radiance:

    * Is it easiest to develop a file based locking solution, possibly
      around samba, which is a robust mechanism based on what I have read?
    * What are the implementation issues relating to doing a
      client/server model vs. something like pvm/mpi, how much of the
      guts of radiance need to be worked on for either option, what is
      most robust, extensible and os dependent/independent?

If it is worth it to the development team, I would be happy to dig back into the archives and try to put together a fuller summary of the various options that have been put on the table. Let me know.

Regards,

-Jack de Valpine

Georg Mischler wrote:


Carsten Bauer wrote:

there's one master distributing the blocks, the workers
which do the tracing, a collector receiving finished scanlines (and in
the end puzzles everything together for the big picture) and of course
the ambient slave, who receives amb. values and broadcasts them to all
the other workers. This ambient slave alone has access to the file for
storing them. Only at the beginning of a new run the workers can access
an already existing ambfile for reading in values.

In a thread on the dev list from Wed, 12 Jun 2002
http://www.radiance-online.org/pipermail/radiance-dev/2002-June/000001.html
I said the following, which was received with quite a bit
of scepticism:

  Since Windows doesn't support NFS file locking
(and neither did cygwin, last time I looked), we'll need to find
a better solution for concurrent access to ambient files. I can
think of two portable ways to do this: Either we invent a file
based locking mechanism, or we establish a separate server
process that accepts network store and retrieval requests by the
actual simulation processes. The latter would be more technically
involved, but probably a lot more robust. Any thoughts?

And now, half a year later, you tell us that you already have such
a server implemented? Only that you call it "slave"... ;-)
Does your "slave" require PVM? If yes, then that would probably make
it platform independent, otherwise you'd have to tell us some more
details. Personally, I think that this alone would be worth adding
to the ANSified Radiance core, with the management stuff up for
discussion.

But maybe we can now really move the details to the dev list. I'm
cross-posting this, so there's no need to reply on the general list.

-schorsch

--
# John E. de Valpine
# president
#
# visarc incorporated
# http://www.visarc.com
#
# channeling technology for superior design and construction

Hi,
There is already an implementation of the ambient file sharing/locking and rpiece (I think) using Samba. It was developed for cygwin. The modifications were very minor. Shall I try to dig up these mods and post them here?
The LBNL multiprocessor version is built around MPI. There is a version for the Cray T3E and a version for the SGI Onyx 2000. Apparently MPI is not terribly cross-platform. I do not speak for LBNL, but I would expect this version to be Open Source along with the rest of the kettle.
Has anyone played around with Greg's multiprocessor client/server model? He developed "rholo" (as in holodeck) while working at SGI. It has some fancy ray-caching mechanisms for near real-time walkthroughs, and also has a back-end that effectively manages the resources of 64 processors or more. I don't know what procedure mechanism he uses. Knowing him, he probably started from scratch.
-Chas

MPI *is* cross-platform, as far as I know. There's a free version of MPI 1.2 (MPICH) which runs on many Unices, Mac OS X and Windows NT, 2000, and XP. MPI-2 is only available in commercial implementations, as far as I know, though MPICH is planned to support it.

I have real doubts about the performance of SMB (on which Samba is based) and, if Microsoft modifies the underlying protocols, it may no longer be widely available--it isn't readily available on most unix systems, though there are implementations that will run on most of them.

I think this may be an area where the best performance and ease of use will be achieved either by supporting multiple standards or, alternatively, by doing some protocol design. For what Radiance does with its ambient data, MPI may be overkill. Or maybe not; I'm not a parallel systems expert.

Randolph

MPI links:
   MPICH http://www-unix.mcs.anl.gov/mpi/mpich/
   MPICH for XP http://www.lfbs.rwth-aachen.de/mp-mpich/
   MPI forum http://www.mpi-forum.org/
      Standards documents annoyingly distributed in PostScript.
   There are various commercial implementations as well.

Sorry Randolph. My experience, both direct and indirect, is opposite just about every point you make. I have no experience with MPICH, but when I oversaw the programming of LBNL's "LDRD" version of Radiance, the programmer had to re-write his code for two different SGI platforms. So much for portability. Perhaps this says more about the skill of the programmer (I don't think so), or of the SGI implementation(s) of MPI....?
In the experiments that I ran with the Cygwin version of rpiece, SMB/Samba has proven to be very responsive and reliable. And to address one of Greg's concerns, the Samba extensions for Unix are quite easy to install. For the Windows version of Radiance, no additional libraries would need to be installed (of course).
The LDRD version of Radiance required the use of a robust interface such as MPI in order for the code to scale up to hundreds of processors without serious degradation, according to the programmer. It was running on 250 processors on the Cray T3E with about 80% utilization of the processors.
-Chas

Looks like Greg was busy already with the ansification...
To make activities like this easier in the future, are there any
plans to make the sources available per CVS somewhere? I'm sure
that Sourceforge would be very happy to host it. Or are we still
waiting for the announced group of "advisors" to be selected?

Speaking of CVS, I remember Greg mentioning difficulties in
converting the old Radiance SCCS archives into something more useful.
This might help:
  http://mail.gnu.org/archive/html/info-cvs/2002-03/msg00613.html
  http://www.gigascale.org/softdevel/faq/17.html

Jack de Valpine wrote:

Based on my understanding of the topic, it seems that
there are several approaches that have been put on the table:

Nice and complete list, thanks!

* file based locking:
  * NFS - as currently implemented and with all the NFS related
    issues
  * Samba - I recall that Georg had mentioned this as a possibly
    more reliable mechanism

The "Samba" keyword seemed to involve two separate suggestions:

      * Transplant some custom locking code used in earlier samba
        versions into Radiance. Actually, this is just a specific
        example of a custom/homegrown solution.

      * Use the standard locking mechanisms on Windows, and have
        a Samba server translate that to unix (NFS-)locks where
        necessary. While this would eliminate the unix/Windows
        compatibility question, it doesn't solve the problems we
        still have between different unixes.
        I was in favour of this solution at one time, but only
        later realized that NFS locking is still not reliable on
        all platforms today.

  * Custom - homegrown solution for locking files

I know that the mailman developers are using something like this,
which actually helps in running this mailing list here. If you
want a good scare, check out all the problems they ran into:
  http://www.google.com/search?q=+site:mail.python.org+mailman+locking

* client/server - presumably a socket based client server mechanism
  but not relying on pvm/mpi

This is by far my preferred solution, despite Greg apparently
being afraid of sockets... ;-) Client/server solutions over
sockets are one of the most universally supported concepts
nowadays, even if some vendors try to hide this fact behind
proprietary terminology.

It also looks like Carsten already has this implemented, although
his description so far hasn't made it completely clear whether
that depends on PVM or not. He wanted to be around again this
week to explain it better. A self contained version is obviously
preferable, and I believe that it can be done with reasonable
effort.

* pvm/mpi:

I think that we should treat multi-CPU parallelization and
ambient data sharing as two entirely separate issues, as the
latter will have to work transparently across all versions and
platforms.

* What are the implementation issues relating to doing a
  client/server model vs. something like pvm/mpi, how much of the
  guts of radiance need to be worked on for either option, what is
  most robust, extensible and os dependent/independent?

As far as I understand, there's a function that stores ambient
values into the file, and another one that retrieves them again.
And that's already all of it that involves the "guts" of radiance.

Selecting a file based implementation for standalone operation,
or a networked one otherwise can be done at process startup, and
will be completely transparent to the actual simulation code.
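
As a rough sketch of that startup-time choice (in Python for brevity; the real code would be C inside Radiance), the simulation could talk to a small interface and never learn which backend sits behind it. All names here (`AmbientStore`, `FileStore`, `NetworkStore`, `open_store`) are hypothetical, and ambient records are treated as opaque bytes.

```python
import socket
from typing import Optional, Protocol, Tuple


class AmbientStore(Protocol):
    """What the simulation code sees, regardless of transport."""
    def store(self, record: bytes) -> None: ...
    def close(self) -> None: ...


class FileStore:
    """Standalone operation: append records to a local ambient file."""
    def __init__(self, path: str) -> None:
        self._fh = open(path, "ab")

    def store(self, record: bytes) -> None:
        self._fh.write(record)
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()


class NetworkStore:
    """Networked operation: send records to an ambient server over TCP."""
    def __init__(self, host: str, port: int) -> None:
        self._sock = socket.create_connection((host, port))

    def store(self, record: bytes) -> None:
        self._sock.sendall(record)

    def close(self) -> None:
        self._sock.close()


def open_store(path: Optional[str] = None,
               server: Optional[Tuple[str, int]] = None) -> AmbientStore:
    """Pick the backend once, at process startup."""
    if server is not None:
        return NetworkStore(*server)
    if path is not None:
        return FileStore(path)
    raise ValueError("need either an ambient file path or a server address")
```

The rest of the program would only ever call `store()` and `close()` on whatever `open_store` returned.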

I don't know whether it would be a good or bad idea to keep the
NFS lock based sharing mechanism in place for those people who
want to trust their lock manager. It's certainly not a technical
problem to do so. Maybe we can postpone this decision until we
actually have a working alternative.

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Peter Apian-Bennewitz wrote:

Yeap. CVS is coming on radiance-online.org.

Not sure why you'd want to replicate all the work that the
folks at sourceforge already offer for free (including a
multiplatform compile farm), but I'm not going to talk you
out of it... ;-)

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Georg Mischler wrote:

Peter Apian-Bennewitz wrote:

> Yeap. CVS is coming on radiance-online.org.

Not sure why you'd want to replicate all the work that the
folks at sourceforge already offer for free (including a
multiplatform compile farm), but I'm not going to talk you
out of it... ;-)

There won't be any ads and email addresses / profiles are not sold to
"non-affiliated third parties". Sourceforge wouldn't be the first "free"
service sliding senselessly ugly over time.

Maybe it's time for a little background info for all: Radiance-online.org
is a machine donated by me and located at the "Fraunhofer Institute for
Solar Energy Systems" (http://www.ise.fraunhofer.de). Since both they
and I are engaged in Radiance, there's a solid interest in keeping the
thing running and extending it.
Furthermore, setting up a mirrored system between this site and LBNL's
radsite server would provide redundant access, although that is not
sorted out yet.

Technically I don't see a compile farm as a big advantage: Who's going to
check the binaries (e.g. an ximage AIX version) in a production
environment? Checking for compile errors and include files is step 1,
as I'm sure you experienced too.
IMHO, porting to a new platform is most easily and effectively done by
someone with access to and experience of that platform. And if a larger
company is just dying to get a binary for a certain architecture that
none of us has access to, they will very likely donate hardware to Greg.

If anyone "hears voices" urging him to participate in
radiance-online.org, e.g. by contributing examples, demo or organizing
features, he/she is warmly welcomed.

cheers
Peter


--
pab-opto, Freiburg, Germany, www.pab-opto.de

Hi, Schorsch and others

It also looks like Carsten already has this implemented, although
his description so far hasn't made it completely clear whether
that depends on PVM or not. He wanted to be around again this
week to explain it better. A self contained version is obviously
preferable, and I believe that it can be done with reasonable
effort.

sorry, I thought everything was clear already. Of course my stuff
depends completely on PVM. Unfortunately, right now I cannot spend much
time on Radiance-related issues, due to various reasons, so I better not
engage too much in this discussion.

I don't understand the whole business anyway. As already mentioned,
complex renderings of big pictures needing the power of several machines
are no everyday task, so this shouldn't determine the requirements for a
basic version. Moreover, using a complete PP library just for sharing
ambient values may indeed be judged as overkill, and as the 'gurus' (as
Rob likes to say) already have other things to offer, e.g. sockets or
Samba, why not try this out?

Additionally, for a moment I was fond of combining Windows and Linux and
everything else, but I've made up my mind on this, who wants to
implement all the upcoming changes of every OS again and again and
again? From Windows 2000 to Windows X to Windows Me to Windows You to
Windows YouNoWho etc, etc. And furthermore, who is paying for that?

So I remembered Greg's words, posted several months ago - cit:

Ewww. Can't we just say that if you want to do parallel rendering, you
need to install Unix?

-Carsten

Carsten Bauer wrote:

Hi, Schorsch and others

> It also looks like Carsten already has this implemented, although
> his description so far hasn't made it completely clear whether
> that depends on PVM or not. He wanted to be around again this
> week to explain it better. A self contained version is obviously
> preferable, and I believe that it can be done with reasonable
> effort.

sorry, I thought everything was clear already. Of course my stuff
depends completely on PVM.

I was uncertain whether that also included the client/server
communication, thanks for clarifying. Is your code available
anywhere? Even if we're going to try to avoid PVM, your approach
of integration might still be interesting to look at for
inspiration.

I'm splicing the following in here, so that the topic remains in
one thread:

Greg wrote:

On another note, I have been following with interest the discussion on
parallel rendering solutions and alternatives to a working NFS lock
manager. I haven't responded because I haven't had anything
intelligent to add... It's sounding like the consensus is headed the
direction of a socket-based client/server solution. A dreadful pain to
implement from all I've seen, but perhaps it's best in the long run.

I had a look at the ambient code from 3.4. The current file
access mechanics seem to be organized simply enough that a
switch to networked access will only require replacing or modifying
three functions (and their children, of course). Making lookamb
network-aware would add another two. I'll have a closer look as
soon as the ANSIfied code is available per CVS. You can expect a
few questions after that to verify that I'm understanding
everything correctly.

On the client side, the networking code will be very simple:
- A new option -an host[,port[,timeout]]
- Establish a socket connection on startup.
- Write new ambient values to that socket.
- Update the local ambient data from the socket.
This is almost identical to the current file based code, minus
locking (which is done by the server). We'll have to see if it is
worth the effort to "bundle" network transfers (only sync after a
certain number of new values have been created), which would
involve some more overhead.
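
To make the proposed `-an host[,port[,timeout]]` syntax concrete, here is a minimal parsing sketch in Python. The option itself is only a proposal in this thread; the default port and timeout values, and the names `parse_an_option` and `AmbientServerSpec`, are assumptions made up for the example.

```python
from typing import NamedTuple

DEFAULT_PORT = 7890      # hypothetical default, not a registered port
DEFAULT_TIMEOUT = 10.0   # seconds, hypothetical default


class AmbientServerSpec(NamedTuple):
    host: str
    port: int
    timeout: float


def parse_an_option(value: str) -> AmbientServerSpec:
    """Parse 'host[,port[,timeout]]' as proposed for the -an option."""
    parts = value.split(",")
    if not 1 <= len(parts) <= 3 or not parts[0]:
        raise ValueError(f"bad -an argument: {value!r}")
    host = parts[0]
    # Missing trailing fields fall back to the (assumed) defaults.
    port = int(parts[1]) if len(parts) > 1 else DEFAULT_PORT
    timeout = float(parts[2]) if len(parts) > 2 else DEFAULT_TIMEOUT
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return AmbientServerSpec(host, port, timeout)
```

The resulting tuple is what a client would hand to its connection code at startup, for example `socket.create_connection((spec.host, spec.port), timeout=spec.timeout)`.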

On the server side, things may get a bit more involved. Since we
need to make sure that all accesses are sequential, asynchronous
operation through select() looks like the obvious choice.
Note that the server doesn't necessarily have to write the data
to a file, unless the user expects to stop and restart it later.
If it does use a file, then the existing code for that purpose
can be used with only minimal changes if any. NFS lock based
shared access on this file will *still* be possible, on those
platforms where that works.
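
A minimal sketch of the select()-based idea, assuming records arrive as raw bytes and using a plain list as a stand-in for the ambient file. The function name `serve_step` is hypothetical. Because a single thread handles every socket in turn, all writes are sequential by construction and the server itself needs no locking.

```python
import select
import socket


def serve_step(listener, clients, store, timeout=0.1):
    """Run one select() pass: accept new clients, collect data from ready ones.

    `clients` is a list of connected sockets and `store` a list that stands
    in for the ambient file; both are modified in place.
    """
    readable, _, _ = select.select([listener] + clients, [], [], timeout)
    for sock in readable:
        if sock is listener:
            conn, _addr = listener.accept()
            clients.append(conn)
        else:
            data = sock.recv(4096)
            if data:
                store.append(data)   # stand-in for appending to the ambient file
            else:
                # An empty read means the client closed the connection.
                clients.remove(sock)
                sock.close()
```

A real server would call `serve_step` in a loop and would also have to reassemble records that are split across reads, which this sketch ignores.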

I'm thinking of implementing the server in Python at first. This
will make it easy to experiment with different strategies wrt.
connection pool management, communication protocol, data caching,
and maybe even file locking mechanisms. It will also benefit from
the experience I have collected by writing the remote simulation
server that is included with Rayfront. Once we know what works
best, it will be possible to either declare the Python version
production ready, or to rewrite it in C. The latter is unlikely
to make it any more efficient, but may be desirable to reduce the
third party tool dependency.

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Would a broadcast solution be better, perhaps? Some small server that
just repeatedly sends out the ambient file? Or something a bit
fancier that doesn't involve TCP connections?

Randolph

Randolph Fritz wrote:

Would a broadcast solution be better, perhaps? Some small server that
just repeatedly sends out the ambient file? Or something a bit
fancier that doesn't involve TCP connections?

In theory, that sounds like a nice idea, but would require the
clients to accept data asynchronously (eg. while they're busy
with other calculations). I'd prefer to keep the tricky parts
concentrated on the server side, so that the clients are only
confronted with new data when they explicitly ask for it.

The other thing is that broadcasts are only possible with UDP,
which doesn't guarantee that the data is actually received by
anyone. If we want to make sure that all processes really have
the full set at any time, then there's no way around TCP.

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Randolph Fritz wrote:

Would a broadcast solution be better, perhaps? Some small server that
just repeatedly sends out the ambient file? Or something a bit
fancier that doesn't involve TCP connections?

In theory, that sounds like a nice idea, but would require the
clients to accept data asynchronously (eg. while they're busy
with other calculations). I'd prefer to keep the tricky parts
concentrated on the server side, so that the clients are only
confronted with new data when they explicitly ask for it.

Have a server app which makes broadcasts on a regular schedule (every 100 ms or so) and a very small client-side app which receives and stores the broadcasts.

The other thing is that broadcasts are only possible with UDP,
which doesn't guarantee that the data is actually received by
anyone. If we want to make sure that all processes really have
the full set at any time, then there's no way around TCP.

You're just not sufficiently devious...when a client broadcasts a new value it marks it "new." It waits to hear the value back from the watching server. If it doesn't hear it in, say, 500 ms, it sends it again.
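
The client-side bookkeeping for that resend scheme might look like the following Python sketch. It only tracks which values are still unconfirmed and when they were last sent; the actual UDP send and receive calls are left out. The class name `PendingValues` and its methods are hypothetical, and the 0.5 second default is the "say, 500 ms" from the paragraph above.

```python
import time
from typing import Dict, List, Optional

RESEND_AFTER = 0.5   # seconds, the "say, 500 ms" from the text


class PendingValues:
    """Track values sent over UDP until the server is heard repeating them."""

    def __init__(self, resend_after: float = RESEND_AFTER) -> None:
        self.resend_after = resend_after
        self._sent_at: Dict[bytes, float] = {}

    def mark_sent(self, value: bytes, now: Optional[float] = None) -> None:
        """Record (or refresh) the time a value was last sent."""
        self._sent_at[value] = time.monotonic() if now is None else now

    def acknowledge(self, value: bytes) -> None:
        """Call when the value shows up in a server broadcast."""
        self._sent_at.pop(value, None)

    def due_for_resend(self, now: Optional[float] = None) -> List[bytes]:
        """Return the values that have gone unconfirmed for too long."""
        now = time.monotonic() if now is None else now
        return [v for v, t in self._sent_at.items()
                if now - t >= self.resend_after]
```

A renderer would call `mark_sent` after each send, `acknowledge` for every value it hears back, and periodically resend whatever `due_for_resend` returns.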

I'd like to avoid TCP; it is a very heavy protocol, and its use will set a limit on the number of clients--that's why NFS is not primarily TCP-based.

How many clients are we talking about, and how much data?

Randolph, knew that networking background had to be good for something


On Thursday, January 30, 2003, at 12:34 PM, Georg Mischler wrote:

Randolph Fritz wrote:

I'd like to avoid TCP; it is a very heavy protocol, and its use will
set a limit on the number of clients-

Are you thinking about the limited number of file descriptors
that a process may have open on some systems? On my box, this is
currently at 1024, and I think it could be reconfigured if
necessary.

How many clients are we talking about, and how much data?

There's no theoretical limit on the number of clients, even if we
may rarely see more than a few dozen (eg. for rendering a walk
through animation). Typical numbers will be 2 to 5. A full set of
ambient data for one simulation can exceed 100 MB, more common
sizes are in the range between 5-30 MB. Not really something you
want to broadcast repeatedly when there are cheaper alternatives.
You'd also unnecessarily flood all the channels of your switch
for hours at a time.

With the TCP approach, the full data will be transferred to each
client exactly once, and all the clients will send the full set
to the server once collectively. This is the same amount of
network traffic that is generated now, when you have the ambient
file sitting on a NFS mounted volume.

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

[...first-cut complicated solution clipped...]

If we can assume a shared file system, there's a simpler way.

Have one ambient server app that writes the shared file. The server
receives data via UDP from the various renderers, and adds it to the
file. (Is conflicting data an issue? How best for the server to
resolve it?) Every time the renderers read the file they check to
make sure the items they sent are in it; if they're not, they resend
them.

Error handling & network management:

  - I recommend that renderers keep a timer, and if the server does
    not see ambient file updates in some operator-determined period of
    time, raise an alarm. (The operator-determined time because it
    depends on network and filesystem latency.)

  - I believe the ambient updater itself could be best organized as a
    monitor and an updater process; if the updater fails, the monitor
    can restart it and if several attempts at restarting fail quickly,
    the monitor can raise an alarm. In a large MP environment it
    might be best to have the monitor and updater on separate systems.
    (The monitor process might also start and stop renderers.)

  - The updater can publish its UDP port and a 64-bit random magic
    number via a file, which is protected (or not) via the file
    permissions system. The updater only accepts ambient value
    packets which have a valid magic number. This provides a measure
    of security. It would perhaps be best for the random magic number
    to be updated every few minutes.
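
A sketch of the magic-number idea in Python, under stated assumptions: the published file holds the port and the number as plain text, and every packet starts with the number as an 8-byte big-endian header. None of this is an agreed format, and the function names are invented for the example. Periodic renewal of the number is not shown.

```python
import os
import secrets
import struct

HEADER = struct.Struct("!Q")   # 64-bit magic number, network byte order


def publish(path: str, port: int) -> int:
    """Write 'port magic' to an owner-only file and return the magic number."""
    magic = secrets.randbits(64)
    # Mode 0o600 applies when the file is created; it limits who can read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{port} {magic}\n")
    return magic


def read_published(path: str):
    """What a renderer does at startup: learn the port and the magic number."""
    with open(path) as fh:
        port, magic = fh.read().split()
    return int(port), int(magic)


def make_packet(magic: int, payload: bytes) -> bytes:
    return HEADER.pack(magic) + payload


def accept_packet(magic: int, packet: bytes):
    """Return the payload if the packet carries the right magic, else None."""
    if len(packet) < HEADER.size:
        return None
    (got,) = HEADER.unpack_from(packet)
    return packet[HEADER.size:] if got == magic else None
```

The protection is only as good as the file permissions, which matches the "protected (or not)" caveat in the proposal.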

I like this proposal, but I am not a Radiance expert. What do you-all
think?

Randolph


On Thu, Jan 30, 2003 at 01:34:15PM -0800, Randolph Fritz wrote:

On Thursday, January 30, 2003, at 12:34 PM, Georg Mischler wrote:
>
>In theory, that sounds like a nice idea, but would require the
>clients to accept data asynchronously (eg. while they're busy
>with other calculations). I'd prefer to keep the tricky parts
>concentrated on the server side, so that the clients are only
>confronted with new data when they explicitly ask for it.

Randolph Fritz wrote:

> I'd like to avoid TCP; it is a very heavy protocol, and its use will
> set a limit on the number of clients-

Are you thinking about the limited number of file descriptors
that a process may have open on some systems? On my box, this is
currently at 1024, and I think it could be reconfigured if
necessary.

It's more the network and processor overhead of TCP, though the
limited number of file descriptors was once a problem. I don't think
a single app could ever really manage 1024 simultaneous TCP
connections on most systems, except for the big IBM servers.

There's no theoretical limit on the number of clients, even if we
may rarely see more than a few dozen (eg. for rendering a walk
through animation). Typical numbers will be 2 to 5. A full set of
ambient data for one simulation can exceed 100 MB, more common
sizes are in the range between 5-30 MB. Not really something you
want to broadcast repeatedly when there are cheaper alternatives.
You'd also unnecessarily flood all the channels of your switch
for hours at a time.

No, broadcasting the whole thing would definitely not be the way to go.

With the TCP approach, the full data will be transferred to each
client exactly once, and all the clients will send the full set
to the server once collectively. This is the same amount of
network traffic that is generated now, when you have the ambient
file sitting on a NFS mounted volume.

I think you may be underestimating the problem of recovery from
dropped connections, downed processors, and so on.

Randolph


On Thu, Jan 30, 2003 at 05:42:02PM -0500, Georg Mischler wrote:

Randolph Fritz wrote:

It's more the network and processor overhead of TCP, though the
limited number of file descriptors was once a problem.

I understand that TCP has a much higher overhead than UDP. But is
this really significant compared to the processing that we
already do with Radiance? Five times very little may still not
be much, after all...

I don't think
a single app could ever really manage 1024 simultaneous TCP
connections on most systems, except for the big IBM servers.

The sockets as such are not a problem, but the polling (even when
done by the OS through select()) will become a bit less efficient
eventually. This article talks about an example implemented in
Python on PC hardware:
  http://www.nightmare.com/medusa/async_tweaks.html

I think you may be underestimating the problem of recovery from
dropped connections, downed processors, and so on.

We're not handling life support systems here, are we? ;-)

The server doesn't really care about individual client
connections. Even if the server dies and restarts, it would be
possible for the clients to try to reconnect a few times within
reasonable intervals.
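
The reconnect behaviour could be as simple as this Python sketch. The function name, the number of attempts, and the delay are all assumptions chosen for the example; the `connect` and `sleep` parameters exist only so the logic can be exercised without a network.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, delay=2.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to (re)connect a few times, pausing between attempts."""
    last_error = None
    for attempt in range(attempts):
        try:
            return connect((host, port))
        except OSError as err:
            last_error = err
            # No point in sleeping after the final failed attempt.
            if attempt < attempts - 1:
                sleep(delay)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

While the server is unreachable, a client could keep computing and simply hold on to its new ambient values until the connection comes back.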

If we can assume a shared file system, there's a simpler way.

Have one ambient server app that writes the shared file. The server
receives data via UDP from the various renderers, and adds it to the
file. (Is conflicting data an issue? How best for the server to
resolve it?) Every time the renderers read the file they check to
make sure the items they sent are in it; if they're not, they resend
them.

Interesting idea. Now you see why I want to implement the server
in Python at first, so we can play with various concepts and see
which behaves best under load on all platforms.

Conflicting data are not a problem, at least not one that is
specific to this setup. Those are also written to the same file
by the traditional sharing method.

My initial reflex was to use the same (TCP) channel for both
directions, but you're right that this isn't really necessary.
The simulation jobs need file system access to lots of common
data anyway, so there's no reason why the ambient server should
be excluded from that. That way, the reading channel is secured
against packet loss by NFS.

This leaves us with the choice of UDP or TCP for collecting the
data. The final solution will have to balance the overhead of TCP
against the inherent unreliability of UDP.

If a UDP packet with ambient data for storage is dropped, the
consequence is simply that those specific values will be missing in
the file. Another process that later checks for them will not find
anything, and therefore has to do the same calculation again, or
at least a very similar one. This means that any dropped packets
will cause the overall computation time to go up, whether they
are caused by network congestion or by the server temporarily
going offline. We'll have to test what happens when increasing
numbers of packets get dropped (1%, 5%, etc.), but I don't expect
any problems besides the wasted time (Greg?).

If we use TCP, then we know that no packets are lost, or rather
we'll know when it happens. The clients will first block if the
server doesn't answer, and after a while the connection may be
declared dead. This would give us some primitive built-in
monitoring right at the place where this information is needed,
without introducing yet another process.
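
A rough sketch of the TCP variant, assuming a hypothetical protocol in which
the server confirms each record with a single "+" byte. The acknowledgement
byte and the timeout value are assumptions made for this example only:

```python
import socket

def send_with_ack(sock, payload, timeout=5.0):
    """Send payload and wait for a one-byte acknowledgement.

    Returns True if the server acknowledged, False if it stayed silent
    for `timeout` seconds, closed the connection, or the send failed.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(payload)
        return sock.recv(1) == b"+"
    except OSError:
        # Covers timeouts as well: socket.timeout is a subclass of OSError.
        return False
```

A False result is the "primitive built-in monitoring" mentioned above: the
client learns at the point of sending that the server is not responding.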

I don't want to make any predictions at the moment about which
method will be more efficient in practice. It won't take much to
implement both of them, and then we'll have to ask people to run
lots of tests under heavy load.

  - I recommend that renderers keep a timer, and if the server does
    not see ambient file updates in some operator-determined period of
    time, raise an alarm. (The operator-determined time because it
    depends on network and filesystem latency.)

I'm not sure what problem you're trying to solve here. Ambient
data updates can get rather sparse towards the end of a
simulation anyway, so that their frequency isn't really a good
indicator of anything.

  - I believe the ambient updater itself could be best organized as a
    monitor and an updater process;

I don't think that job management concepts are within the
functional scope of core Radiance. Stuff like that can be easily
added from other sources.

  - The updater can publish its UDP port and a 64-bit random magic

Let's worry about security once the functionality as such works.
It will be trivial to have the server only accept packets from
the local IP range, and you should have a firewall in front of
your network in any case.
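
Restricting the server to a local address range could look roughly like
this, using Python's standard ipaddress module. The network
192.168.1.0/24 in the usage note is only an example value:

```python
import ipaddress

def make_filter(allowed_cidr):
    """Return a function that accepts only senders inside allowed_cidr."""
    network = ipaddress.ip_network(allowed_cidr)

    def accept(sender_ip):
        return ipaddress.ip_address(sender_ip) in network

    return accept
```

In a UDP server loop, the sender address returned by `sock.recvfrom()`
would be passed to `accept()` (created, say, with
`make_filter("192.168.1.0/24")`), and packets from other addresses would
be dropped.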

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Hi Guys,

This whole conversation is getting way too complicated for my tastes. Can't we find a simpler solution? The whole client/server model sounds really nasty -- who starts the server? What happens if the server dies or gets overwhelmed? How portable will it be between architectures? All these things make me nervous.

What if we try to stay closer to the current model, just modifying it so it doesn't depend on an NFS lock manager. Here's what I suggest:

1) Instead of calling fcntl with F_SETLKW, each ambient process periodically checks for the existence of a lock file on the NFS filesystem (named after the ambient file perhaps with an added suffix ".lok").

2a) If the lock file exists, we continue rendering until a later checkpoint.

2b) If the lock file doesn't exist, we create it with open(...O_EXCL). We then have full control over the ambient file to read in the new values from other processes and write our values to it. Afterwards, we remove the lock file.

The only difference between this and what's currently in ambient.c is that we use the lock file with open() and unlink() to control access, and buffer as many values as necessary until the lock is available. (Right now, the process will block after a certain accumulation and not unblock until it obtains the lock.)
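
The steps above can be sketched as follows, here in Python rather than C,
with os.open() standing in for open(2). The class name, the one-record-
per-line file layout and the offset bookkeeping are assumptions made for
the example; they are not how ambient.c stores its data. Note also that
whether O_EXCL is reliable over NFS depends on the NFS version and kernel
in use.

```python
import os

class AmbientSync:
    def __init__(self, ambient_path):
        self.ambient_path = ambient_path
        self.lock_path = ambient_path + ".lok"
        self.pending = []   # values computed but not yet written
        self.offset = 0     # how much of the file we have already read

    def add(self, record):
        """Buffer a new value until the lock becomes available."""
        self.pending.append(record)

    def checkpoint(self):
        """Return new records from other processes, or None if locked."""
        try:
            fd = os.open(self.lock_path,
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return None     # step 2a: keep rendering, try again later
        os.close(fd)
        try:
            # Step 2b: we own the ambient file; read, then append.
            new = []
            if os.path.exists(self.ambient_path):
                with open(self.ambient_path, "rb") as f:
                    f.seek(self.offset)
                    new = f.read().splitlines()
            with open(self.ambient_path, "ab") as f:
                for record in self.pending:
                    f.write(record + b"\n")
            self.offset = os.path.getsize(self.ambient_path)
            self.pending = []
            return new
        finally:
            os.unlink(self.lock_path)
```

Each renderer would call add() for every new value and checkpoint() at
convenient intervals. A None result means another process holds the lock
and the buffered values stay in memory.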

-Greg

P.S. I took some flack for my apparent lack of knowledge regarding C pointer arithmetic after the last post. So I screwed up! At least I eventually figured out that I screwed up -- don't I get some credit for that? I didn't think the error was so obvious, myself.

P.P.S.

char **avp1, **avp2;

I hope you're using pointers to void in the ANSI version... :wink:

-schorsch

You may be disappointed when you first see the new ANSI version. My goal was to get the libraries to a state where other people could call them with the benefits of function prototypes. I didn't go into all the various modules and programs and make sure that the local functions all had prototypes as well. Only the code in the src/common and src/rt directories got treated -- I pretty well left the rest alone, except to make sure it was happy with the new library prototypes, and didn't go declaring standard library functions, itself. Internal static functions pretty much never got prototyped, as gcc (and other C compilers) don't pick on you if you use the old K&R style of function definitions:

type call(a1, a2) type a1, a2; {...}

Thus, the comparison function passed to qsort doesn't get its parameters examined. If someone wants to go in and add all the correct parameter lists and casts everywhere, they're welcome to do so once we get CVS up and running. Frankly, I think it's a waste of time that could be better spent, and it only increases the chances of errors. I found exactly 0 bugs in Radiance by adding the prototypes I added, and caused two or three in the process, which took me a while to ferret out....

Greg Ward wrote:

This whole conversation is getting way too complicated for my tastes.
Can't we find a simpler solution? The whole client/server model sounds
really nasty --

At this time I'm still convinced that it's the most reliable and
portable way to solve the problem at hand.

who starts the server?

Who starts simulations on remote systems as it is now?
As long as your processes all run on the same machine, no ambient
server is necessary, and eg. rpiece will continue to work just
fine. After all, we're not going to take the current file sharing
functionality away. We'll just experiment with additional options
that are less vulnerable to OS bugs and other platform issues.

Once you run jobs on more than one machine, you need to start
them manually (or through scripts/other tools) anyway, even if
you use rpiece. So on that front, nothing will really change.

Thinking of it, the server might also be useful to coordinate
the rpiece processes, removing yet another NFS lock dependency.

What happens if the server dies or gets overwhelmed?

Probably the same that happens now when the NFS server reboots
or the network clogs. The individual simulation processes will
stall, until the server is available again.

How portable will it be between architectures?
All these things make me nervous.

Sockets are fully portable across all platforms that Radiance
currently supports, and then some.

What if we try to stay closer to the current model, just modifying it
so it doesn't depend on an NFS lock manager. Here's what I suggest:

1) Instead of calling fcntl with F_SETLKW, each ambient process
periodically checks for the existence of a lock file on the NFS
filesystem (named after the ambient file perhaps with an added suffix
".lok").

And you think sockets are nasty?

One of the few things that I know about lockfiles is that they're
not exactly simple to get right.

- What happens if the lockfile owner gets killed?
  Will the others be able to figure out that the lock file is
  stale and override it? Or will that require human intervention?
  I think that NFS locks get purged when the owner dies, so we
  might lose quite some convenience here.

- What happens if process b checks for the file after process a
  does, but before process a actually creates the new file?

There's no way to make checking and creating an atomic operation,
so that this situation must be handled explicitly and gracefully.
In a big simulation, it may well be that a dozen processes are
competing for the lock file several times a second for hours or
even days. Race conditions *will* happen.
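
For reference, one workaround documented in the Linux open(2) manual page
for environments where O_EXCL is not reliable over NFS is to create a
uniquely named file and hard-link it to the lock name. A sketch in Python
follows; the function names and the naming scheme for the unique file are
assumptions, and stale locks left behind by a killed owner are not handled
here:

```python
import os
import socket

def try_lock(lock_path):
    """Return True if we now own lock_path, False if someone else does."""
    # A name no other process should use: lock name + host name + pid.
    unique = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        try:
            os.link(unique, lock_path)
            return True
        except OSError:
            # link() can report failure even though it succeeded on the
            # server; a link count of 2 on our own file settles it.
            return os.stat(unique).st_nlink == 2
    finally:
        # The lock name (if we got it) survives removal of the unique name.
        os.unlink(unique)

def unlock(lock_path):
    """Release a lock obtained with try_lock()."""
    os.unlink(lock_path)
```

This only addresses the check-and-create race. Deciding when a lock file
is stale and may be overridden remains a separate problem.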

I'm not saying that it can't be done. But in the best case, I'd
expect a foolproof solution to be similarly involved and complex,
and less portable than using a server process.

Since I'm familiar with client/server concepts, I'm willing to
implement one. If anyone volunteers to design a reliable file
based solution, I'm certainly not objecting to having that
available as an alternative option. In fact, I'd be happy to see
as many synchronization methods as possible implemented, so we
can test them all. And after we know what works, we can leave the
best two or three in the core distribution, for the user to
select the one that is most appropriate in their environment.

P.S. I took some flack for my apparent lack of knowledge regarding C
pointer arithmetic after the last post. So I screwed up! At least I
eventually figured out that I screwed up -- don't I get some credit for
that? I didn't think the error was so obvious, myself.

It was certainly one of a rare breed. Not many people dare to
juggle with pointers as you do... :wink:

  I didn't go into all
the various modules and programs and make sure that the local functions
all had prototypes as well.

That's probably the way I would have started as well. I didn't
really expect you to declare victory so soon. Good to know that
you're not superhuman either... :wink:

If someone wants to go in and add all the correct
parameter lists and casts everywhere, they're welcome to do so once we
get CVS up and running.

Compiling on Windows (at least with VC) is a pain without
prototypes, as the real errors get drowned in all the warnings.
So I may end up converting them along the way when trying to
establish cross-platform compatibility.

I remember now, I did find one bug during ANSI-fication

One down, n to go... :wink:
If I understand you correctly, then you only converted a few
percent of the code, so that finding more than one or two would
have been quite a surprise.

-schorsch


--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/