vectorization data types

From: Lars O. Grobe <[email protected]>
Date: February 26, 2006 12:14:15 PM PST

Hi!

I don't quite understand how changing the structs to unions would help in vectorizing the code -- you'll have to give me an explicit example, and this is probably one for radiance-dev.

This is the AltiVec way; I am not sure about other vector processors.

A common approach to integrating AltiVec into conventional code is to keep using arrays. The problem is that data has to be aligned for AltiVec. If it isn't, one has to do quite a lot of operations before the actual computation. Since I have a "vector" type on AltiVec machines, I can enforce alignment of the data in an array with the following:

typedef union { float arrayData[4]; vector float vectorData; } myVector;

Now I can still access the elements through the array, but the compiler will align the data, and for AltiVec I can use the vector directly without any shifts.

I came across this when I found an old posting from Georg, who proposed that someone could take a look at multmat, which is perfect for AltiVec. But as mat4 and fvect are not aligned, the overhead is too large to really optimize the code without changing data outside those functions.

Finally, I feel I really HAVE to say that I am not a programmer; if anything, I am the lousiest C and C++ "scripter" out there... ;-)

CU, Lars.

OK, I see what you mean, now. I haven't profiled the code recently, but relatively little time is spent in the matrix routines. Optimizing a ray tracer is really challenging, because bottlenecks are not easy to isolate. A lot of time is spent in the various material shading routines, which are spread all over the place and not easy to simplify. The only place where you can really focus effort is in the actual octree traversal code, and this doesn't vectorize at all.

-G
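The union idea from Lars's message can be tried out with a minimal sketch like the one below. It is only an illustration under stated assumptions: the GCC/Clang `vector_size` extension stands in for AltiVec's `vector float` so that the code compiles on non-PowerPC machines, and `vec4f`, `myMat4` and `transform4` are hypothetical names, not Radiance's mat4 or multmat routines.

```c
/* Portable stand-in for AltiVec's `vector float`: a GCC/Clang vector
   extension type. This is an assumption made so the sketch compiles on
   non-PowerPC machines; it is not Radiance code. */
typedef float vec4f __attribute__((vector_size(16)));

/* The union from the message: array access plus 16-byte alignment. */
typedef union {
    float arrayData[4];
    vec4f vectorData;
} myVector;

/* Hypothetical aligned 4x4 matrix, stored as four row vectors. */
typedef struct {
    myVector row[4];
} myMat4;

/* Row vector times matrix: out = v * m, one vector multiply-add per row. */
myVector transform4(const myMat4 *m, const myVector *v)
{
    vec4f acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    myVector out;
    int i;

    for (i = 0; i < 4; i++) {
        float s = v->arrayData[i];        /* scalar element via the array */
        vec4f splat = { s, s, s, s };     /* broadcast it to all four lanes */
        acc += splat * m->row[i].vectorData;
    }
    out.vectorData = acc;
    return out;
}
```

Because the vector member forces the alignment of the whole union, every `myVector` (and every row of the hypothetical `myMat4`) can be used as a vector operand directly, while existing code can keep indexing `arrayData`.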

From: "Lars Grobe" <[email protected]>
Date: February 27, 2006 5:40:24 AM PST

I had hoped at least for some transformations (instances, oconv), maybe even color calculations. In fact, every code snippet where the same operation is applied to x, y, z (like fvect[0]=...; fvect[1]=...; fvect[2]=...;) should win, as such calculations could be applied to the whole vector at once.

In fact, the growth of a ray tree from a cluster of rays is something I simply did not consider, as I am really not a programmer, so the idea of tracing more than one ray at once (which looked so pretty when thinking about the ambient calculation) was nonsense.

But I will stop now and have a look at the code before I write another word ;-) Is there a nice schematic overview of the raytracing routines on the net? I left my "RwR" in Germany because it was a bit heavy to take on the plane ;-)))

CU Lars.

Well, it's a bit out of date, but there's the old document that describes the source tree. Probably not really what you're looking for, but it's all I know about:

  http://radsite.lbl.gov/radiance/refer/srctree.pdf

While I agree there may be some vectorization possibilities offered by the many color and vector operations in the code, I don't know that they would really pay off, as the set-up time (overhead) would probably eliminate any savings you got from a single assignment or dot product. Vectorization helps much more when you have longer vectors and arrays, and everything is 3-vectors in Radiance.

If you want to test whether short vectors can actually speed up the code, work on the known bottlenecks first, like the ray traversal code in src/rt/raytrace.c, in particular the raymove() routine. Try vectorizing:

  pos[0] += r->rdir[0]*t;
  pos[1] += r->rdir[1]*t;
  pos[2] += r->rdir[2]*t;

I suspect you will see negligible gains, because (1) the overhead will kill you and (2) the conditional code above this dominates on modern pipelined processors.

-G
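To make the overhead argument about the raymove() update concrete, here is a rough sketch of the three-line position update in scalar and in 4-lane vector form. It rests on several assumptions: single-precision data, the GCC/Clang `vector_size` extension instead of real AltiVec intrinsics, and the hypothetical function names `advance_scalar` and `advance_vec4`. It is not taken from src/rt/raytrace.c.

```c
/* GCC/Clang vector extension used as a portable stand-in for an
   AltiVec `vector float` (an assumption, not Radiance code). */
typedef float vec4f __attribute__((vector_size(16)));

/* Scalar form, following the three lines quoted from raymove(). */
void advance_scalar(float pos[3], const float rdir[3], float t)
{
    pos[0] += rdir[0]*t;
    pos[1] += rdir[1]*t;
    pos[2] += rdir[2]*t;
}

/* Vector form: the arithmetic is one multiply and one add, but the
   3-element arrays must first be packed into padded 4-lane vectors
   and unpacked again afterwards. That packing is the set-up overhead. */
void advance_vec4(float pos[3], const float rdir[3], float t)
{
    vec4f p  = { pos[0],  pos[1],  pos[2],  0.0f };
    vec4f d  = { rdir[0], rdir[1], rdir[2], 0.0f };
    vec4f tv = { t, t, t, t };

    p += d * tv;

    pos[0] = p[0];
    pos[1] = p[1];
    pos[2] = p[2];
}
```

Both functions compute the same result. Whether the vector form is any faster would have to be measured; with unaligned 3-element arrays the packing and unpacking may well cost more than the single multiply-add saves.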

I think oconv and the intersection tests (rt/o_*) would benefit as well, BUT: AltiVec supports single-precision floats at most, while vectors (fvect) in Radiance are doubles unless I define SMLFLT. That means there is really no hope at the moment.

CU Lars.
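The float-versus-double problem can be illustrated with a small sketch. The `RREAL` and `FVECT` typedefs below are assumptions modeled on the SMLFLT switch mentioned above, not a copy of the Radiance headers, and `lanes_per_register` and `fvect_to_float4` are hypothetical helpers.

```c
#include <stddef.h>

/* Assumed shape of the precision switch: float when SMLFLT is
   defined, double otherwise. */
#ifdef SMLFLT
typedef float RREAL;
#else
typedef double RREAL;
#endif
typedef RREAL FVECT[3];

/* Number of RREAL components that fit in one 128-bit vector register. */
size_t lanes_per_register(void)
{
    return 16 / sizeof(RREAL);
}

/* Narrow an FVECT into four padded float lanes. When RREAL is double
   this costs three conversions per vector and discards precision. */
void fvect_to_float4(const FVECT v, float out[4])
{
    out[0] = (float)v[0];
    out[1] = (float)v[1];
    out[2] = (float)v[2];
    out[3] = 0.0f;   /* padding lane */
}
```

With double components, only two values fit in 16 bytes, so a three-component vector does not fit in a single 128-bit register, and converting to float for every operation would add overhead on top of the alignment problem.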