sharing indirect values for parallel processing?

Hi Lars,

Over a year ago, there was quite a bit of talk on the dev list about supporting ambient value sharing through Samba or coming up with some workaround for busted NFS lock managers, but nothing was ever done about it. People on this list can tell you better than I what versions of Linux or NFS you need to look out for. In general, I think OS X and FreeBSD are solid with respect to NFS file locking. Solaris might be good as well.

-Greg

From: "Lars O. Grobe"
Date: February 2, 2005 7:40:05 AM PST

I am going to start some large renderings over the next few weeks, and I would like to have a small farm of rendering machines available by then. Parallel processing has been a topic here for some time, and I am wondering what has changed in recent developments.

I know there is the new -N switch of rad, which distributes renderings over multiple processes, right? So this seems to be the frontend to use, probably better than starting all the rpict & Co. processes manually. What I would really like to know is which file-sharing mechanism I should use for indirect values. It used to be NFS, and the documentation in the Radiance distribution still describes how to use NFS and locking to share data between processes. Would the same be possible with, e.g., Samba? I am asking because I have a working Samba 3 server available, offering enough space and lots of bandwidth. If Samba is not supported, I could still install an NFS server; however, I remember some trouble with different NFS locking mechanisms and the OS dependence of NFS implementations (we use OS X, Linux and Solaris here...;-).

So any news on this topic, such as reports from people who use Radiance in parallel environments, would be of great help.

TIA+CU, Lars.

Christopher Kings-Lynne wrote:

> Have you seen memcached?
>
> http://www.danga.com/memcached/

That's interesting as a concept, but doesn't really solve our
problem. Memcached accelerates in-memory caching of data.
Radiance needs to coordinate access to a file on disk, where
performance is important, but not as critical as resolving access
conflicts.

The most straightforward solution to our problem would probably
be to use lock files, as Greg suggested in earlier discussions.
Unfortunately nobody has found the time yet to actually implement
that. If anyone wants to volunteer, please move the discussion of
your proposal to the dev-list.
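For illustration, here is a minimal sketch in C of the lock-file idea, assuming POSIX open(2) semantics. The helpers try_lockfile(), acquire_lockfile() and release_lockfile() are hypothetical and not part of Radiance. Whoever manages to create the lock file exclusively owns the lock until it deletes the file; everyone else polls. Note that the Linux open(2) documentation says O_EXCL is only reliable over NFS with NFSv3 or later on kernel 2.6 or later; on older setups the traditional workaround is to create a unique file and link(2) it to the lock name instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: create the lock file atomically with O_EXCL.
 * Returns 0 if we now own the lock, -1 otherwise (errno == EEXIST
 * means another process currently holds it). */
static int try_lockfile(const char *lockpath)
{
    assert(lockpath != NULL);
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    /* Record our PID so a stale lock could be identified later. */
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    if (n < 0 || write(fd, buf, (size_t)n) != n) {
        close(fd);
        unlink(lockpath);        /* give the lock back on failure */
        return -1;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: poll once per second until the lock is ours,
 * giving up after max_tries attempts. */
static int acquire_lockfile(const char *lockpath, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (try_lockfile(lockpath) == 0)
            return 0;
        if (errno != EEXIST)     /* a real error, not contention */
            return -1;
        sleep(1);
    }
    return -1;
}

/* Hypothetical helper: releasing the lock is just removing the file. */
static int release_lockfile(const char *lockpath)
{
    return unlink(lockpath);
}
```

Stale locks left behind by a crashed process are not handled here; the PID written into the file is only there so that such locks could be detected and cleaned up.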

-schorsch

--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/

Hi,

as I won't be able to help with the implementation, I won't bring this to the dev-list for now :wink: However, I guess the only feature the shared fs needs is working byte-range locking, right? So I will find out whether the fs provided by openmosix (mfs) has this feature, which would make a set of mosix nodes a great Radiance installation. I don't really want to spend hours on a working NFS setup just to share ONE file :wink:, at least not if it isn't really necessary.

CU Lars.

Lars O. Grobe wrote:

> Hi,
>
> as I won't be able to help with the implementation, I won't bring this
> to the dev-list for now :wink: However, I guess the only feature the
> shared fs needs is working byte-range locking, right? So I will find
> out whether the fs provided by openmosix (mfs) has this feature, which
> would make a set of mosix nodes a great Radiance installation.

Ambient values are only ever appended at the end of the file, so
whole-file locking and byte-range locking have the same effect.
We also need a solution that works on all platforms and on
all file systems. Requiring third party software just to get
reliable file sharing is clearly out of the question.
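To make the point about appends concrete, here is a minimal C sketch assuming POSIX fcntl(2) record locks; append_locked() is a hypothetical helper, not Radiance's own ambient-file code. It takes a write lock on the whole file (l_start = 0 and l_len = 0 mean "to end of file, however far it grows"), seeks to the current end, appends one record and unlocks. Since every write lands at the end of the file, a byte-range lock starting at the old end of file would end up protecting the same bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one record to a shared file under an
 * exclusive whole-file fcntl() lock. Returns 0 on success, -1 on error. */
static int append_locked(int fd, const void *rec, size_t len)
{
    assert(rec != NULL);
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                       /* 0 = through EOF, including growth */
    if (fcntl(fd, F_SETLKW, &fl) < 0)   /* block until no one else holds it */
        return -1;

    /* Under the lock, the current end of file is where our record goes. */
    int ok = lseek(fd, 0, SEEK_END) >= 0 &&
             write(fd, rec, len) == (ssize_t)len;

    fl.l_type = F_UNLCK;                /* release for the other processes */
    fcntl(fd, F_SETLK, &fl);
    return ok ? 0 : -1;
}
```

A real ambient-file sync would presumably also read back, while still holding the lock, any values other processes appended since its last sync; that step is left out of this sketch.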

-schorsch

Hi all interested,

Cross-posting to dev, as this is probably a more appropriate place for this conversation.

OK, let's see who I can irritate the most...

As a refresher, there have been numerous threads on this topic on Radiance Dev (in no particular order, other than how they turned up when I searched my mail):

    * Before we give up on lock files...
    * multiprocessor systems, Radiance and you
    * as well as others, if you want to delve into the depths of the
      pre-radiance-online mailing list archives

In general, I recall that there are a couple of directions to go:

    * network filesystem locking - such as NFS or Samba, where we are
      dependent either on the locking mechanism actually working (e.g.
      NFS) or on the filesystem (Samba) being installed
    * client/server - probably hairier from an implementation
      standpoint as well as from a porting point of view, although
      perhaps guaranteeing the best performance on selected OSes?

Not to rehash old stuff, but could one of the more knowledgeable developers (Greg, Georg, Peter, Carsten...?) give us a refresher on what the options are, and perhaps some idea of the time needed to implement a working solution? Locking is a recurring problem. It would be nice to agree on a consensus solution (i.e. which direction to pursue) and then a strategy for implementation (i.e. resources, person(s), money...), so that we as a community could figure out how to move this forward (if, as always, there is enough interest).

I must admit to having run into this wall on a variety of occasions. NFS (v3) on Linux is "supposed" to lock correctly (sync mode on the mount/fstab), and there is a test suite from Sun (www.connectathon.org) for testing the NFS server. I remember running this test suite in the past and getting positive results on Linux. Nevertheless, I have found it extremely difficult to get working results with a networked image render (e.g. rpiece distributed over multiple CPU nodes). There end up being problems with ambient values between image cells and/or with locking of the syncfile that distributes image cells to different machines. I even implemented a client/server in Perl at one point to fight this problem with the syncfile (with partial success as I recall, and perhaps more if my time would allow). Not to cause offense... but is it possible that the locking code in Radiance itself needs to be checked?
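On checking the locking itself, one quick sanity test is whether a second process is actually refused an fcntl(2) write lock while the first one holds it. The C sketch below is hypothetical (lock_excludes_other_process() is not part of Radiance) and assumes POSIX fork(2) and fcntl(2). It forks the competing process on the same host, so it only exercises that host's kernel and NFS client; a proper check of an NFS lock manager would need the competing process on another machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill in a write lock covering the whole file. */
static void whole_file_wrlock(struct flock *fl)
{
    memset(fl, 0, sizeof *fl);
    fl->l_type = F_WRLCK;
    fl->l_whence = SEEK_SET;   /* l_start = l_len = 0: entire file */
}

/* Hypothetical probe: returns 1 if a second process is refused the lock
 * we hold (correct behaviour), 0 if it is wrongly granted, -1 on error. */
static int lock_excludes_other_process(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl;
    whole_file_wrlock(&fl);
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fcntl locks are not inherited across fork(), so this is a
         * separate lock owner and its non-blocking request must fail. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);
        struct flock cl;
        whole_file_wrlock(&cl);
        _exit(fcntl(cfd, F_SETLK, &cl) == 0 ? 1 : 0);
    }

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);

    if (w != pid || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;   /* child was refused: locking works */
    case 1:  return 0;   /* child got the lock: locking is broken */
    default: return -1;  /* child could not even open the file */
    }
}
```

The interesting case is pointing path at a file on the shared mount rather than a local directory; a result of 0 there would mean the lock is not being honoured at all.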

A brief follow-up to Lars's comments about openMosix/Mosix: as I understand it, the MFS filesystem is supposed to implement locking correctly. There are also other, more sophisticated network filesystems such as GFS (Sistina, I think, and commercial), OpenGFS and many others. However, these all require a separate special install and perhaps modification of the kernel or installation of a modified kernel, and there is serious question as to whether they are portable to other OSes such as MS Windows (whatever version), the main offender when it comes to portability.

Note also that I tried openMosix at one point. One problem I found is that starting multiple large (i.e. memory-hungry) jobs on the master node can lead to excessive paging, since the master node tries to start all the jobs in its own memory space at the same time before migrating them off to other nodes in the cluster. So if your job requires 1 GB of memory to hold the scene, and you want to run 10 jobs on 5 dual-processor nodes each having 2 GB of memory, starting all the jobs on one node asks that node for 10 GB at once, and you are hosed. If you start them on individual nodes instead, then you should be using a different clustering solution, since this completely negates the value of the migration algorithms in openMosix. It has been a while since I used openMosix, so perhaps things are different now...

Note also that named pipes do not work on openMosix (at least as of mid-2003; see my brief inquiry to the openMosix list and Moshe Bar's even briefer reply in April 2003). So if you want to do memory sharing on multiprocessor nodes, you have to roll your own batch job distributor.

-Jack de Valpine
