Linux,multi-cpu,NFS,Mosix,PVM,MPI

Peter_Apian-Bennewi1 · November 22, 2005, 11:10pm

Dear Nick,

following your question about Radiance on multi-CPU systems:
You've probably found the many postings on radiance-online (search for 'mosix' or 'parallel'). My two cents on this, from experience over the years at an institute (50 Linux machines) and at my office (6 machines):

    * Having the CPUs on one board is a lot easier than going over the
      network. Dual Pentium or AMD boards are easy to get and relatively
      cheap; quad boards are a bit more expensive, but might be worth it
      if you're really digging into it.
      Sharing the scene and ambient data in RAM is built into rpict
      (rather crude and simple, but it works). Both animations and large
      images use the CPUs in parallel. Parametric studies with different
      octrees require multiple copies of the geometry, of course, but I
      haven't found a scene that is limited by RAM yet (typically other
      constraints such as ambient rendering times, function files, etc.
      limit the rendering before that).
      With a few GB of RAM, RAID-1 or RAID-5 disks, a dual power supply
      and a UPS, the machine makes a reliable and fast production
      system.
    * Distributing over the network via NFS works too.
      http://www.ise.fhg.de/alt-aber-aktiv/radiance/animation/ shows an
      animation from my Fhg-ISE days that was rendered on the 50 or so
      Linux machines we had at that time. They shared all data via NFS,
      which works, contrary to what seems to be a common gut feeling
      (well, you have to use NFSv3 and the kernel NFS server on Linux to
      get file locking, which is essential for sharing ambient data).
      Distribution across the machines was done by a small but
      effective and failsafe job distribution system I wrote.
      We had tried Open-Mosix, but never used it for production. The
      main drawback in my view is that a process 'swims' between
      non-homogeneous machines, and this adds an extra layer to keep
      track of. E.g. rpict's logfile will not tell you where the process
      has run or is running. If some machine has faulty hardware, the
      fault gets increasingly hard to track down, and that's not really
      what one wants or needs when rendering a few thousand images under
      time constraints. Your mileage may vary; maybe folks out there do
      use Mosix happily for production now.
    * PVM (Parallel Virtual Machine) is a library and system to
      distribute parts of a program across machines (MPI, the Message
      Passing Interface, is similar, with different concepts). It
      requires modifying the source (as far as I recall from the times
      when porting to LBNL's Cray-T3 had been a pending idea). Carsten
      Bauer ported Radiance to PVM, and others checked out MPI/PVM as
      early as 1997 (see
      http://radsite.lbl.gov/radiance/pub/digest/no_Z/v3n2 and search for
      PVM).
      If anyone uses an MPI/PVM-enhanced Radiance for commercial or
      research-grade production, I (and others) would be delighted and
      enlightened to hear about it.
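
To make the job distribution idea a bit more concrete, here is a minimal sketch in Python. It is NOT the system mentioned above, only an illustration of one common approach: every worker sees the same job directory (e.g. via NFS) and claims a frame by atomically creating a claim file. The names claim_frame, run_worker and the frameNNNN.claim file layout are made up for this example, and the render callback stands in for whatever actually starts rpict. The claim file records host name and process ID, so one can later tell which machine rendered which frame.

```python
import os
import socket
import tempfile


def claim_frame(job_dir, frame):
    """Try to claim one frame; return True if this process now owns it.

    Hypothetical helper: the claim-file naming is an assumption made for
    this sketch, not part of Radiance.
    """
    path = os.path.join(job_dir, "frame%04d.claim" % frame)
    try:
        # O_CREAT | O_EXCL makes the open fail if the file already exists,
        # so at most one worker can create a given claim file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        # Record where the frame is rendered, for later fault tracking.
        f.write("%s %d\n" % (socket.gethostname(), os.getpid()))
    return True


def run_worker(job_dir, frames, render):
    """Render every frame this worker manages to claim; return that list."""
    done = []
    for frame in frames:
        if claim_frame(job_dir, frame):
            render(frame)  # assumption: the caller starts the renderer here
            done.append(frame)
    return done


if __name__ == "__main__":
    job_dir = tempfile.mkdtemp()  # stands in for a shared NFS directory
    first = run_worker(job_dir, range(1, 4), lambda n: print("render", n))
    second = run_worker(job_dir, range(1, 6), lambda n: print("render", n))
    print(first, second)  # the second worker only gets the unclaimed frames
```

Start the same script on each machine with the same job directory and frame range. Two caveats: exclusive file creation over NFS is, as far as I know, only dependable with NFSv3 or later, and a worker that dies leaves its claim file behind, so a really failsafe system needs some timeout or cleanup on top of this.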

open to all new facts and insights-
cheers
Peter

--
pab-opto, Freiburg, Germany, http://www.pab-opto.de
[see web page to check digital email signature]