OS X G5 performance [was: No more Mr. Nice Guy]

This is a correction to my own whining about poor Radiance
performance on my iMac G5, as measured by the bench4 test at

     http://mark.technolope.org/pages/rad_bench.html

I completed my tests on my G4 PowerBook, and the difference is even greater
than with the G5 desktop. These are the results:

Radiance Speed Comparisons

               Highest   Automatic   Reduced
G5, nice=0        90.9        91.1     182.2
G5, nice=+6       90.9       166.1     182.4
G4, nice=0       182.8       184.6     366.7
G4, nice=+6      184.3       365.6     367.7
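
To make the effect of the nice value easier to see, here is a small Python sketch that divides each nice=+6 timing by the matching nice=0 timing. The numbers are copied from the table above; the dictionary layout and the function name `nice_slowdown` are only illustrative, and the ratios are unitless, so the sketch makes no assumption about the unit of the timings.

```python
# Timings copied from the table above (unit as reported by the benchmark).
# Keys are (machine, nice value); inner keys are the processor performance setting.
timings = {
    ("G5", 0): {"Highest": 90.9, "Automatic": 91.1, "Reduced": 182.2},
    ("G5", 6): {"Highest": 90.9, "Automatic": 166.1, "Reduced": 182.4},
    ("G4", 0): {"Highest": 182.8, "Automatic": 184.6, "Reduced": 366.7},
    ("G4", 6): {"Highest": 184.3, "Automatic": 365.6, "Reduced": 367.7},
}


def nice_slowdown(machine, setting):
    """Ratio of the nice=+6 time to the nice=0 time for one machine and setting."""
    return timings[(machine, 6)][setting] / timings[(machine, 0)][setting]


for machine in ("G5", "G4"):
    for setting in ("Highest", "Automatic", "Reduced"):
        ratio = nice_slowdown(machine, setting)
        print(f"{machine} {setting:<9} {ratio:.2f}x")
```

Only the "Automatic" column shows a real penalty (roughly 1.8x on the G5 and 2.0x on the G4), which suggests that a niced process does not trigger the switch to full clock speed in that mode.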

I've recompiled my binaries from the new source code with heavily
optimized compiler options. With processor performance set to "Highest"
I get the following result for Mark's benchmark test (details for Mark
and other compiler switch freaks out there):

rpict time: 6403.98 user
processor: G5
num procs: 1
clock speed: 2 GHz
cache: 512 kB
OS: OS X Tiger 10.4.3
Radiance vers: 3.8a Patch 2 (?)
compiler: GCC 4.0.0 20041026 (build 4061)
rays: 693777405
options: -O3 -fgcse-sm -funroll-loops -fstrict-aliasing
                 -fsched-interblock
                 -falign-loops=16 -falign-jumps=16 -falign-functions=16
                 -falign-jumps-max-skip=15 -falign-loops-max-skip=15
                 -ffast-math -freorder-blocks -freorder-blocks-and-partition
                 -finline-floor -mdynamic-no-pic -mpowerpc-gpopt
                 -force_cpusubtype_ALL -mtune=G5 -mcpu=G5 -mpowerpc64

Note: These options are equivalent to gcc's "-fast" optimization switch
       for the G5, except that "-malign-natural" is left out because it
       breaks rpict and probably a lot of other programs.

With these settings the iMac is finally on par with the Athlon XP on
Mark's page, which is acceptable for a real "desktop" machine where fan
noise is much more important than render speed.

Happy holiday season, and I hope everyone gets a PowerMac G5 Quad under
their Christmas tree,

Thomas

···

On 16.12.2005, at 22:10, Greg Ward wrote:

Hi Thomas,

Thanks for the hint as to what options to set on the G5! I had played around with these some time ago -- using gcc 3.3, I think -- and never found a combination that both sped up the process and avoided problems. I'm sure I tried removing one at a time, but not with the latest compiler. Indeed, this combo seems to be a winner, at least for the G5 -- about 20% faster than -O3 by itself.

Running Mark Stock's bench4 on the HEAD release on my 2.5GHz G5, I get a user time total of 4956.56 seconds, which is maybe a few percent better than what one might expect just from the clock speed difference.
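
The clock-speed comparison can be checked with a few lines of Python. This is only a rough sketch: it assumes render time scales inversely with clock frequency, takes the 2 GHz reference time (6403.98 s) from Thomas's report, and ignores that the two runs used different Radiance versions and machines. The function name `scaled_time` is made up for the example.

```python
def scaled_time(ref_seconds, ref_ghz, target_ghz):
    """Time expected at target_ghz if run time scaled purely with clock speed."""
    return ref_seconds * ref_ghz / target_ghz


thomas_time = 6403.98  # user seconds on the 2 GHz G5 (Thomas's report)
greg_time = 4956.56    # user seconds on the 2.5 GHz G5 (this message)

expected = scaled_time(thomas_time, 2.0, 2.5)
gain = 1.0 - greg_time / expected

print(f"expected from clock speed: {expected:.2f} s")
print(f"measured:                  {greg_time:.2f} s")
print(f"better than expected by:   {gain:.1%}")
```

Under that assumption the expected time is about 5123 s, so the measured 4956.56 s is roughly 3% better than clock scaling alone would predict.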

I also ran the new smp test, and got about 95% linearity (a 3.8 times speedup) using all four processors. This was good news to me, as I had some concern that the dual core processors would have a bottleneck with memory access. Apparently, the enlarged caches deal with this pretty gracefully, and the fact that rpiece shares memory can't hurt. It's too bad Apple is going with Intel instead of AMD -- I might have waited if that had been the case. As it is, I don't think they'll better the G5 quad for some time.
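
For reference, the "95% linearity" figure is just the speedup divided by the processor count. The sketch below computes that, plus the Karp-Flatt estimate of the serial fraction, which is an extra metric not mentioned in the message and is included only as an illustration. Function names are hypothetical.

```python
def parallel_efficiency(speedup, n_procs):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_procs


def karp_flatt(speedup, n_procs):
    """Karp-Flatt metric: experimentally determined serial fraction."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)


speedup, n_procs = 3.8, 4  # figures quoted in the message above

print(f"efficiency:      {parallel_efficiency(speedup, n_procs):.0%}")
print(f"serial fraction: {karp_flatt(speedup, n_procs):.4f}")
```

A 3.8 times speedup on four processors corresponds to 95% efficiency and an estimated serial fraction of a little under 2%.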

-Greg

···

From: Thomas Bleicher <[email protected]>
Date: December 18, 2005 7:39:15 AM PST
...