Radiance on the new Apple M1 processors

Lars_Grobe · May 12, 2021, 10:24am

Dear Radiance folks,

Finally, I managed to get my MacBook, which I had been waiting for since early January, and can share my first experiences with Radiance on this platform.

INSTALLATION

Some of this is generic, but it may be helpful for others who have not built Radiance from source before.

To get decent support for the new processor, I am relying on Apple’s clang rather than gcc here, so I first installed the Developer Tools with their Command Line Tools. To get X11 and OpenGL support, I installed macports, and within macports the packages xorg, libGLU, and mesa (I hope I did not miss a package here). I then added the following line to my ~/.zprofile:

export CPATH=`xcrun --show-sdk-path`/usr/include:$CPATH

This tells the compiler to look for include files not in the (Unix-default) /usr tree, but in the software development kit (basically, this allows building for different environments if more than one SDK is installed).

I recommend logging out and back in at this point to ensure that xorg is functional and CPATH is set. Now the compiler environment should be set up properly.

I keep self-compiled software (that is not updated by some package manager) in my home directory, e.g. sources in ~/src and binaries in ~/opt. I download the Radiance head release and the library files to ~/src/radiance and uncompress them there, giving me a source tree ~/src/radiance/ray. I also get the latest libtiff sources from osgeo.org (I took 4.3.0) and uncompress them to ~/src/tiff-4.3.0. I delete ~/src/radiance/ray/src/px/tiff and replace it with a symlink to my libtiff (in ~/src/radiance/ray/src/px I type ln -s ~/src/tiff-4.3.0 tiff).
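The libtiff replacement described above can be sketched as a small shell function. The function name link_tiff and its argument order are my own choice for illustration, not part of the Radiance build scripts; the paths are the ones from the text. Pass absolute paths, since a relative target would make the symlink dangle.

```shell
# Replace the bundled src/px/tiff directory with a symlink to an external
# libtiff source tree. link_tiff is a hypothetical helper, not a Radiance tool.
# Arguments: <Radiance source root (the "ray" directory)> <libtiff source dir>
link_tiff() {
    ray_dir=$1
    tiff_dir=$2
    # Refuse to touch anything if either tree is not where we expect it.
    [ -d "$ray_dir/src/px" ] || { echo "no src/px under $ray_dir" >&2; return 1; }
    [ -d "$tiff_dir" ] || { echo "libtiff sources not found: $tiff_dir" >&2; return 1; }
    # Remove the bundled copy (or a link left by an earlier run), then link.
    rm -rf "$ray_dir/src/px/tiff"
    ln -s "$tiff_dir" "$ray_dir/src/px/tiff"
}

# Example call with the layout used in this post:
# link_tiff "$HOME/src/radiance/ray" "$HOME/src/tiff-4.3.0"
```

Running it a second time simply recreates the link, so it is safe to repeat after unpacking a fresh source tree.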

Now, in the Radiance source tree ~/src/radiance/ray, I first type ./makeall clean to make sure that no build artefacts are left. I then start the build process by typing ./makeall install, go through the license and agree to it. When asked for the editor of my choice, I type nano (just a habit). The destinations for binaries and libraries / auxiliary files are ~/opt/radiance/bin and ~/opt/radiance/lib respectively. When the makeall script displays the build command and asks whether I want to modify it, I enter “y” and adjust it:

Second line: I add “-j 4” behind “make”. This tells make to start 4 processes in parallel, accelerating the building of the binaries.

Third line: I replace “-O2” by “-Ofast”.

Fourth line should read: “MACH=-DBSD -DNOSTEREO -Dfreebsd -mmacosx-version-min=11.2 -I/opt/local/include -L/opt/local/lib” \

I am not sure why I need to specify the minimum version of macos here, and this line may have to be adjusted. My guess is that it is to support linking against the macports binaries.

I save by pressing Ctrl-O and exit with Ctrl-X. Now the binaries will be built (you will see the effect of the -j 4 option now, this goes really fast), and you should get a final “Done.” before the script ends.
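The value 4 for -j matches the four performance cores of the M1. On other machines, a rough way to pick the number could look like the sketch below. The function name job_count is made up for this example, and I am assuming that the sysctl key hw.perflevel0.physicalcpu (performance cores) is available on recent macOS releases for Apple silicon; where it is not, the sketch falls back to the total core count.

```shell
# Suggest a value for "make -j". job_count is a hypothetical helper.
# Tries, in order: performance cores on Apple silicon (assumed sysctl key),
# all cores on macOS/BSD, online processors via getconf, and finally 1.
job_count() {
    n=$(sysctl -n hw.perflevel0.physicalcpu 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=1
    # Guard against empty or non-numeric output from any of the tools.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}

echo "suggested: make -j $(job_count)"
```

Whether it pays off to include the efficiency cores in the build is something I have not measured; the number printed here is only a starting point.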

Now, I add the following two lines to ~/.zprofile:

export PATH=$PATH:~/opt/radiance/bin

export RAYPATH=$RAYPATH:~/opt/radiance/lib

I also comment out the line added by macports to set the display variable here since from my experience it causes trouble and is not necessary.
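After logging in again, a quick check that the new binaries are actually picked up from PATH can be sketched as follows. The helper name check_tools is my own; it only relies on the shell built-in command -v.

```shell
# Report where each named program is found on PATH.
# Returns 1 if at least one of them is missing.
# check_tools is a hypothetical helper, not part of Radiance.
check_tools() {
    status=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found    $tool -> $path"
        else
            echo "missing  $tool"
            status=1
        fi
    done
    return $status
}

# Example: a few of the Radiance programs, then the library search path.
# check_tools oconv rpict rtrace rvu
# echo "RAYPATH=$RAYPATH"
```

If the programs are reported from ~/opt/radiance/bin and RAYPATH ends in ~/opt/radiance/lib (expanded to your home directory), the two lines in ~/.zprofile took effect.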

BENCHMARK

To test the performance of the new machine, I get Mark Stock’s “bench4” benchmark from github (GitHub - markstock/Radiance-Benchmark4: A well-used benchmark scene for the Radiance pseudo-radiosity renderer). Knowing that the current M1 has 4 “performance” and 4 “efficiency” cores, I am doing 3 tests. One with one process (1 proc, “make”), one keeping all performance cores busy (4 proc, “NCPU=4 make smp”), and one including the efficiency cores (8 proc, “NCPU=8 make smp”). The results are:

1 proc: 503 sec (best so far: Ryzen 9 3950X with 593 sec)

4 proc: 143 sec (best so far: Ryzen 9 3950X with 186 sec)

8 proc: 118 sec (best: Ryzen 9 3950X with 114 sec)
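To put the three M1 timings above in relation to each other, the parallel speedup (serial time divided by parallel time) and the efficiency per process (speedup divided by the number of processes) can be computed with a few lines of awk. The function name speedup is just for this sketch.

```shell
# Print parallel speedup and per-process efficiency relative to a serial run.
# Arguments: <serial seconds> <parallel seconds> <number of processes>
speedup() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN {
        printf "%d proc: speedup %.2f, efficiency %.0f%%\n", n, s / p, 100 * s / (p * n)
    }'
}

speedup 503 143 4    # prints: 4 proc: speedup 3.52, efficiency 88%
speedup 503 118 8    # prints: 8 proc: speedup 4.26, efficiency 53%
```

The drop in efficiency from 4 to 8 processes reflects that the four additional cores are the slower “efficiency” cores.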

If you compare this to the existing entries at the benchmark web-pages, the little laptop appears to achieve by far the highest performance per “performance” core - and is competitive even as an 8-core machine despite the lower performance of the “efficiency” cores. Fun fact: The fan was not even running during the benchmarks.

CONCLUSION

The M1-based devices, despite their very low energy demand, give astonishing performance based on their “performance” cores. The cores outperform most (all?) available x64-cores even on workstations with significantly higher power demand. As can be expected, the “efficiency” cores cannot compete with this, but even in the case when all 8 cores (including the weaker “efficiency” cores) are included, the overall result is similar to a recent 8-core x64 system.

Since the currently available systems are very compact, I would personally not use them as computing nodes - but for those of us who are doing their everyday work on a laptop on battery or a quiet desktop, and want enough power to run a simulation from time to time, the platform offers great performance. Since this is a rather new CPU, I expect better optimization in compilers in the near future. It is also important to note that this is the first generation of the platform with a clear focus on energy demand, while configurations tuned for computational performance are yet to come.

I hope this is somehow helpful and not too much of an advertising report. I am also curious to see the next generations of Arm64 systems to come. Samsung seems to be working on something now, and testing the Arm64 systems in use by the big cloud providers might be worth a try for those in need of scalability.

Best, Lars.

Rob_Guglielmetti · May 12, 2021, 12:19pm

Thanks, Lars! This is definitely intriguing, and helpful. Still plenty of
unknowns regarding compatibility with other elements of the tools we need
to use day-to-day, but great to see Radiance running on the hardware (and
love the fan-free-yet-blazing performance anecdote!).

Rob

Randolph_Fritz4 · May 12, 2021, 9:40pm

There is now XQuartz support on Apple M1 Macs, for people who need the X11 apps. You can download it at XQuartz - Releases.

Rob_Guglielmetti · June 15, 2021, 9:18pm

Hey @Lars_Grobe, any more insights with Radiance and the new M1 Mac? I’m in the market for a new laptop, as my personal lappie is NINE years old this month, and I am wondering if it’s now “safe” to get an M1-based dealio. I was thinking it would be great to play around with it and get some Radiance packages together for the community. And given that my last Mac lasted nine years, I feel like getting another one for the long haul.

Lars_Grobe · June 16, 2021, 10:33am

Hi @Rob_Guglielmetti - I am perfectly happy with mine so far. If you
need something with more cores, it may be worth waiting for the next
configurations of the processor, but for having a mobile, lightweight
yet powerful laptop there is little to argue against what is available.

Best, Lars.

Rob_Guglielmetti · June 17, 2021, 12:32am

Thanks, Lars! Yes, I see they are planning a new round of M1 MacBook Pros with something like 4x the cores now?! I’ll keep an eye out for these, as well.

John_Mardaljevic · October 2, 2021, 1:56pm

A few quick tests running Radiance on a new M1 iMac. As expected, very similar to Lars’ M1 Macbook.

Scene: a fairly detailed chunk of a city model (~288 Mb octree). Elapsed time and %cpu usage shown. Single and 8 core results shown.

iMac 5K 27" 2014, 4 GHz Intel Core i7, 32 Gb RAM (Samsung 2TB SSD)
macOS 10.14.6

Radiance v5.2

time rpict -vf 1.vf -ad 4096 -ab 2 test.oct > /dev/null
15:16.65 99.9%

time rtpict -n 8 -vf 1.vf -ad 4096 -ab 2 -af test-8.af test.oct > /dev/null
4:06.62 785.8%

15:16.65/4:06.62 = 3.7

=================================================

iMac 24" M1, 2021, 16 GB RAM (Apple 1TB SSD)
macOS 11.6

Radiance v5.2 x86_64 (copied over from Intel iMac, i.e. via Rosetta 2)

time rpict -vf 1.vf -ad 4096 -ab 2 test.oct > /dev/null
12:25.49 99.8%

time rtpict -n 8 -vf 1.vf -ad 4096 -ab 2 -af test-8i.af -w test.oct > /dev/null
2:43.31 774.5%

12:25.49/2:43.31 = 4.6

-------------------------------------------------

Radiance v5.4a arm64 (compiled on M1)

time rpict -vf 1.vf -ad 4096 -ab 2 test.oct > /dev/null
10:12.52 100.0%

time rtpict -n 8 -vf 1.vf -ad 4096 -ab 2 -af test-8a.af -w test.oct > /dev/null
2:15.29 772.7%

10:12.52/2:15.29 = 4.5

=================================================

Observations:

The 2014 iMac is still a very capable desktop. It was given a new lease of life when the fusion drive was replaced with an SSD (Samsung 860 EVO 2TB). I’ll be sticking with it (and Mojave) for the next few years, unless there’s a major failure which isn’t worth repairing. All the same, the entry level M1 iMac is evidently a powerful number cruncher. Note, the 2014 iMac was top of the range at the time (and I maxed out the CPU spec and RAM capacity, both of which were well worth it). The expected (but not yet announced) higher-end iMacs with Apple silicon are likely to be bigger (32"?) and faster (M1X?).

These were one-off tests. But unless other processes are soaking the CPU (which wasn’t the case), I’d expect repeatability. So, maybe the small hike in the ratio of the 1 vs 8 core times for the M1 vs Intel speeds is real.
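For anyone who wants to redo the ratios with their own timings, here is a small sketch that converts elapsed times of the form mm:ss.ss to seconds and divides them. The function names to_seconds and time_ratio are made up for this example; the input values are the ones reported above.

```shell
# Convert an elapsed time of the form [h:]mm:ss.ss to seconds.
to_seconds() {
    echo "$1" | awk -F: '{
        t = 0
        for (i = 1; i <= NF; i++) t = t * 60 + $i
        printf "%.2f\n", t
    }'
}

# Ratio of single-process to multi-process elapsed time, one decimal place.
time_ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

time_ratio 15:16.65 4:06.62    # Intel iMac:               prints 3.7
time_ratio 12:25.49 2:43.31    # M1, x86_64 via Rosetta 2: prints 4.6
time_ratio 10:12.52 2:15.29    # M1, native arm64:         prints 4.5
```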

Cheers
John

PS. It’s a green one…

Daylight Experts Ltd.
Expert Witness | Simulation | Measurement | Conservation

Associate Editor Lighting Research & Technology

Stephen_Wasilewski · February 28, 2023, 1:32pm

Update on this for those who are interested.

I just got a new 14" M2 Pro (8 performance + 4 efficiency cores). I ran Mark Stock’s benchmarks (Radiance Benchmark Test) using both the LBNL precompiled binaries (Release Radiance 5.4a (2023-02-26) · LBNL-ETA/Radiance · GitHub) and binaries compiled according to Lars’ instructions:

Note that because of the Discourse auto-formatting, Lars’ instructions got a little garbled:

the line in .zprofile should be:

export CPATH=`xcrun --show-sdk-path`/include:$CPATH

the first 4 lines of the rmake file that worked for me are:

#!/bin/sh
exec make -j 12 "SPECIAL=ogl" \
	"OPT=-Ofast" \
	"MACH=-DBSD -DNOSTEREO -Dfreebsd -mmacosx-version-min=11.2 -I/opt/local/include -L/opt/local/lib" \

my results:

precompiled (Radiance 5.4a (2023-02-26) e0d019e):

Serial: 747.68
smp (NCPU=12): 109.49

Local Compile (same version, clang-1400.0.29.202, target: arm64-apple-darwin22.3.0, flags: -Ofast)

Serial: 475.40
smp (NCPU=12): 80.05
smp (NCPU=8): 83.48

Three conclusions:

You need binaries compiled for the new Macs to get the expected performance.
This little Mac is on par with the AMD Ryzen 9 3950X atop the benchmark list (with half the cores).
The serial calculation shows only an incremental improvement over the M1; the big difference is the higher CPU count.
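Regarding the first conclusion, one way to tell which kind of binary is installed is to look at what the file utility reports for it. The sketch below only classifies that description string; the function name classify_arch is hypothetical, and the example strings are what I would expect file to print for Mach-O binaries, so treat them as an assumption.

```shell
# Classify the description printed by "file -b <binary>" by architecture.
# classify_arch is a hypothetical helper for illustration.
classify_arch() {
    case "$1" in
        *universal*) echo "universal" ;;   # fat binary with several architectures
        *arm64*)     echo "arm64" ;;       # native on Apple silicon
        *x86_64*)    echo "x86_64" ;;      # runs via Rosetta 2 on Apple silicon
        *)           echo "unknown" ;;
    esac
}

classify_arch "Mach-O 64-bit executable arm64"
classify_arch "Mach-O 64-bit executable x86_64"

# On a machine with Radiance installed:
# classify_arch "$(file -b "$(command -v rpict)")"
```

If the last command reports x86_64 on an M1 or M2 machine, the binary runs under Rosetta 2, and a local build along the lines of Lars’ instructions may be worth trying.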