Mesh Rendering Performance

Greg,

You mentioned that the Moller-Trumbore method required two divisions. In the code that I implemented, I only needed one division at the end to determine the ray length. Here is the page with the various versions of the source code.

http://www.cs.lth.se/home/Tomas_Akenine_Moller/raytri/raytri.c

I used the one without early division. If I am correct, the version without early division supports two-sided objects.

Regards,

Marcus
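[Editor's note: for readers following along, the "no early division" variant Marcus mentions can be sketched in C roughly as follows. This is a paraphrase of the idea behind Moller and Trumbore's raytri.c, not the exact published code; names and the epsilon value are illustrative.]

```c
#include <math.h>

#define EPSILON 1e-8

/* Moller-Trumbore ray/triangle test, "no early division" variant.
   Returns 1 on a hit and fills *t (distance) and *u, *v (barycentric
   coordinates); returns 0 on a miss.  It handles two-sided triangles
   because the determinant's sign is tested, not used for culling,
   and the single division is deferred to the very end. */
static int
intersect_triangle(const double orig[3], const double dir[3],
                   const double v0[3], const double v1[3], const double v2[3],
                   double *t, double *u, double *v)
{
    double e1[3], e2[3], p[3], s[3], q[3], det, inv_det;
    int i;

    for (i = 0; i < 3; i++) {
        e1[i] = v1[i] - v0[i];      /* triangle edge vectors */
        e2[i] = v2[i] - v0[i];
    }
    /* p = dir x e2 */
    p[0] = dir[1]*e2[2] - dir[2]*e2[1];
    p[1] = dir[2]*e2[0] - dir[0]*e2[2];
    p[2] = dir[0]*e2[1] - dir[1]*e2[0];

    det = e1[0]*p[0] + e1[1]*p[1] + e1[2]*p[2];
    if (fabs(det) < EPSILON)        /* ray parallel to triangle plane */
        return 0;

    for (i = 0; i < 3; i++)
        s[i] = orig[i] - v0[i];

    *u = s[0]*p[0] + s[1]*p[1] + s[2]*p[2];     /* still scaled by det */
    if (det > 0) {
        if (*u < 0.0 || *u > det) return 0;
    } else {
        if (*u > 0.0 || *u < det) return 0;
    }
    /* q = s x e1 */
    q[0] = s[1]*e1[2] - s[2]*e1[1];
    q[1] = s[2]*e1[0] - s[0]*e1[2];
    q[2] = s[0]*e1[1] - s[1]*e1[0];

    *v = dir[0]*q[0] + dir[1]*q[1] + dir[2]*q[2];
    if (det > 0) {
        if (*v < 0.0 || *u + *v > det) return 0;
    } else {
        if (*v > 0.0 || *u + *v < det) return 0;
    }
    /* the single division, deferred until we know we have a hit */
    inv_det = 1.0 / det;
    *t = (e2[0]*q[0] + e2[1]*q[1] + e2[2]*q[2]) * inv_det;
    *u *= inv_det;
    *v *= inv_det;
    return 1;
}
```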

Hi Marcus,

I believe the version with one division is the one where you have stored the surface normal, which is fine for ordinary polygons in Radiance, but not for the mesh code. For the mesh code, I would be using the original version, which doesn't require any additional storage.

-Greg
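[Editor's note: to make the distinction concrete, here is a rough sketch of why a stored plane equation needs only one divide for the hit distance. This is illustrative only; the function name is made up, and the real polygon code lives in Radiance's src/rt/o_face.c.]

```c
#include <math.h>

/* With a stored plane equation (unit normal n and offset d, where
   n . p = d for points p on the plane), the hit distance t needs
   just one division; the point-in-polygon test can then be done in
   2D with no further divides. */
static int
plane_hit(const double n[3], double d,
          const double orig[3], const double dir[3], double *t)
{
    double denom = n[0]*dir[0] + n[1]*dir[1] + n[2]*dir[2];
    if (fabs(denom) < 1e-12)        /* ray parallel to the plane */
        return 0;
    *t = (d - (n[0]*orig[0] + n[1]*orig[1] + n[2]*orig[2])) / denom;
    return *t > 1e-12;              /* hit must lie in front of the ray */
}
```

Without the stored normal, the mesh code must derive the plane from the three vertices on every test, which is where the extra operations come from.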

···

From: "Marcus Jacobs" <[email protected]>
Date: February 27, 2006 4:54:01 PM PST


P.S. Actually, you're right. I should have counted the two multiplications by the same reciprocal as a single divide.

Sorry.
-Greg

···

From: Greg Ward <[email protected]>
Date: February 27, 2006 5:06:43 PM PST


From: "Marcus Jacobs" <[email protected]>
Date: February 27, 2006 4:54:01 PM PST


Dear Group

I have been using obj2mesh to convert my 3D scenes to the Radiance triangular mesh (rtm) format for the last 6-8 months. Although I prefer this method of converting my scenes, a recent test confirmed that performance suffers greatly when using a mesh compared to a scene consisting of polygonal objects (i.e. void polygon triangle1 ......). Case in point: one rendering took 8.7 hours for the scene consisting of a mesh primitive and 3.7 hours for the scene consisting of triangular polygon primitives. This is with the -n option set to 6, which should yield the best performance (actually, it's about the same whether I leave -n at the default or use a lower value). A slight performance decrease (5%-10%) would be acceptable, but 135% is extreme.

On top of that, I have some modified source code for o_face that can yield a 10%-20%+ performance increase without any additional memory cost (it's just an implementation of the Moller-Trumbore ray/triangle intersection). What surprises me is that I would normally have assumed the mesh primitive would be more efficient due to its caching of the ray/edge comparisons during ray tracing, but this does not seem to be the case.

Out of curiosity, has anyone here experienced the same? Does anyone know whether Radiance sees the rtm file as one huge mesh or as several smaller meshes? If it is one mesh, would breaking each scene object out as a separate mesh primitive improve performance?

Thanks

Marcus
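[Editor's note: for anyone unfamiliar with the two representations being compared, the scene descriptions look roughly like this; identifiers, coordinates, and file names are illustrative.]

```
# Triangles as individual polygon primitives (e.g. from obj2rad):
void polygon triangle1
0
0
9
        0 0 0
        1 0 0
        0 1 0

# The same geometry compiled into a single mesh primitive (from obj2mesh):
void mesh scene_mesh
1 scene.rtm
0
0
```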

Hi Marcus,

It doesn't surprise me that much that the mesh intersection routines are slower. Many more operations are required to compute the ray-triangle intersection when the plane equation is unknown. I chose not to store the plane equation because it would approximately triple the amount of memory used per triangle during rendering, and the whole point of the mesh primitive is to minimize memory use. It's a classic time/space trade-off. The performance would be even worse if it weren't for the edge caching, which takes only a modest amount of memory.

The other thing that costs is transforming the ray before and after intersection, similar to having an octree instance (which it is). Breaking the mesh into smaller pieces may or may not help matters, depending on how compact the mesh octree is. In other words, it's a good idea to use multiple object meshes if the combined mesh is spread mostly in one or two dimensions, or is very sparsely populated by geometry. That way, you can avoid many of the voxel computations through empty space. However, just as with octree instances, it doesn't pay to have many overlapping mesh bounding cubes.

I hope this helps.
-Greg

P.S. Meshes are faster in any case when you have local (u,v) texture coordinates.
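[Editor's note: following Greg's suggestion, splitting an elongated or sparsely populated model into several object meshes would look something like this; the file names are illustrative.]

```
# Separate mesh primitives, each with its own bounding cube and octree,
# instead of one combined mesh spanning the whole scene:
void mesh walls
1 walls.rtm
0
0

void mesh furniture
1 furniture.rtm
0
0
```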

···

From: "Marcus Jacobs" <[email protected]>
Date: February 24, 2006 1:18:58 PM PST


I spent a little time looking at the Moller-Trumbore ray-triangle intersection routine. I could in principle apply their technique rather than the Segura-Feito method I'm using. In comparison, the M-T algorithm requires one less cross-product, but two more divides than S-F. Furthermore, M-T computes the barycentric coordinates and intersection distance simultaneously, which I end up having to compute following a successful test. However, I wouldn't be able to cache edges as I do with S-F, which probably saves 40% of the computation time. Since I reject between 5 and 10 triangles for every intersection I end up computing, and only calculate barycentric coordinates on at most one ray per mesh, my gut feeling is that the M-T algorithm would actually be slower if I were to replace the S-F algorithm I have in there. Of course, you are welcome to try it. There is also a fair amount of room for optimizing the code using assembler, but I never bother with that...

-Greg

Hi,

I just tried it out, replacing my meshes with obj2rad/oconv-generated instances. It may be faster, but now my rpict process takes 1.4 GB instead of 700 MB ;-) So I guess I will have to stay with those "slow" meshes...

CU Lars.

P.S.: Concerning optimizations: I think introducing assembler into the Radiance code is a bad idea, as it breaks portability, a great strength of the current code. Still, I think one should consider whether the code could be written to be more vector-friendly. Then compilers could do the rest (auto-vectorization has been in gcc since 4.0), and those who want to experiment (like me ;-)) could simply replace some functions. One idea would be to pass not just one ray but clusters of rays to the subfunctions, so that these could do their calculations in parallel. Another help would be to replace the inflexible data types with unions. I ran into trouble when I just wanted to AltiVec some routines: as mat4, fvect, etc. are simply arrays, I have to get the data into an AltiVec-suitable format, which causes overhead. If they had been e.g. unions (containing nothing but an array), I could simply have changed them to unions of array and vector; all unchanged functions would work as they do now, and I could still apply AltiVec code directly to the data. To those who think AltiVec is out because Apple builds on x86: this year we might see the first Cell-processor machines, and vector units are built into almost all platforms today.
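[Editor's note: Lars's union idea can be sketched in C as follows. FVECT4 is a hypothetical type invented for this sketch; Radiance's real FVECT in src/common/fvect.h is a plain array of doubles. SSE2 stands in here for whichever vector unit a platform provides. The point is that scalar code keeps using the array member unchanged, while vector code can alias the same storage without conversion overhead.]

```c
#ifdef __SSE2__
#include <emmintrin.h>          /* SSE2 intrinsics, as one example */
#endif

/* Hypothetical union wrapper: existing scalar code uses .a exactly
   as before, while SIMD code on capable platforms can view the same
   bytes as vector registers. */
typedef union {
    double a[4];                /* plain-array view, as current code expects */
#ifdef __SSE2__
    __m128d v[2];               /* vector view for SSE2 builds */
#endif
} FVECT4;

/* An unchanged scalar function, working through the array member: */
static double
dot4(const FVECT4 *x, const FVECT4 *y)
{
    return x->a[0]*y->a[0] + x->a[1]*y->a[1]
         + x->a[2]*y->a[2] + x->a[3]*y->a[3];
}
```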

Hi Greg, hi group.

> The other thing that costs is transforming the ray before and after intersection, similar to having an octree instance (which it is).

Hm, as we are on that topic... why do we need an extra primitive for these precompiled meshes? It would be so nice to treat them just like instances. obj2mesh is just a kind of different oconv that allows getting some information into the octree that would otherwise be lost if we used obj2rad, because the Radiance scene description has no vocabulary for it, right?

CU Lars.

Hi Lars,

The only part that is common between a Radiance triangle mesh and an octree (as used for an instance primitive) is the octree data structure itself. The mesh is stored very differently from a set of Radiance primitives -- a quick look at src/common/mesh.h should convince you of that. I had to add a function pointer to the RAY structure in order that the specialized intersection routine would be called, interpreting the OBJECT id's as indices into the mesh patch list.
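[Editor's note: the function-pointer dispatch Greg describes can be illustrated like this. The field and function names are made up for the sketch; the real structure is in src/common/ray.h and the mesh code in src/common/mesh.h.]

```c
#include <stddef.h>

typedef struct ray RAY;
struct ray {
    double rorg[3], rdir[3];    /* ray origin and direction */
    double rot;                 /* distance to the hit, if any */
    int (*intersect)(RAY *r);   /* per-object hook; mesh patches install
                                   their own specialized routine here */
};

/* A stand-in for a mesh patch's specialized intersection routine: */
static int
mesh_intersect(RAY *r)
{
    r->rot = 1.0;               /* pretend we hit at unit distance */
    return 1;
}

/* Generic code dispatches through the hook instead of switching on a
   primitive type, so the mesh code is free to interpret OBJECT ids
   as indices into its own patch list: */
static int
localhit(RAY *r)
{
    return r->intersect != NULL ? (*r->intersect)(r) : 0;
}
```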

Other than the different primitive name, you can treat meshes just like instances. Is it so inconvenient to swap the primitive name? It was a bit of work to get them to look and behave the same... Now you want them to be the same?

I'm confused.
-Greg

···

From: Lars O. Grobe <[email protected]>
Date: February 26, 2006 3:39:29 AM PST


Hi Lars,

I don't quite understand how changing the structs to unions would help in vectorizing the code -- you'll have to give me an explicit example, and this is probably one for radiance-dev.

As for clustering rays, this is not as easy as it may sound. We would have to reorder the entire ray calculation to generate bundles of dissociated rays, as otherwise you naturally get a tree with many daughter rays generated for each parent, diverging in all directions in your scene. Grouping daughters together, you get almost no coherence, and techniques for getting good ray bundles are fairly complicated. (See, for example, Matt Pharr, Craig Kolb, Reid Gershbein, and Pat Hanrahan, "Rendering Complex Scenes with Memory-Coherent Ray Tracing," Proc. SIGGRAPH 1997 <http://graphics.stanford.edu/papers/coherentrt/>.)

-Greg

···

From: Lars O. Grobe <[email protected]>
Date: February 26, 2006 1:38:23 AM PST
