You could be suffering from memory allocation costs, not so much the
operations themselves. The rmtxop program uses doubles and 3 components per
matrix entry, so that's (3 x 2048 x 2305 x 8 bytes) or 108 MB for each of
your matrices. When you multiply one such matrix by a sky vector, you only
have the additional memory needed by the vector (54 KB). When you add
matrices, rmtxop keeps two of them in memory at a time, or 216 MB of memory.
That's not a lot for most PCs these days, but the allocation and freeing of
that much space may take some time if malloc is not efficient.
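As a quick check on the arithmetic above, here is a minimal C sketch. The
name matrix_bytes is my own, for illustration only, and is not part of
rmtxop; it simply assumes one double per component.

```c
#include <stddef.h>

/* Hypothetical helper (not part of rmtxop): bytes needed to hold a
 * rows x cols matrix of doubles with `comps` components per entry. */
size_t matrix_bytes(size_t rows, size_t cols, size_t comps)
{
    return rows * cols * comps * sizeof(double);
}
```

With 8-byte doubles, matrix_bytes(2048, 2305, 3) comes to 113,295,360 bytes,
which is about 108 MB when 1 MB is counted as 2^20 bytes. A 2305-entry sky
vector with 3 components is 55,320 bytes, or about 54 KB.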
The problem here is not the amount of memory, but the CPU's access to it.
When the CPU is accessing the arrays, the data is stored in a hierarchy of
caches. For a modern Intel Core i7, for example, there are typically four
L2 caches of 256 KB each and a slower L3 cache of 8 MB that is shared by the
cores.
If the arrays can be stored in the L2 cache, the processor can usually run
at full speed. If not, then the CPU will typically have to wait while the
data is retrieved from the slower L3 cache.
The caches are on-chip. If the arrays exceed the L3 cache capacity, then the
data will need to be retrieved from the much slower main memory and
transferred over the memory bus. With 216 MB of array data to contend with,
this is most likely the culprit.
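To make the comparison concrete, here is a rough C sketch that checks a
working-set size against the cache sizes quoted above. The sizes are the
Core i7 figures from this message, not values queried from the hardware, and
the helper name is hypothetical. It is a crude model: it ignores the L1
cache and everything else competing for cache space.

```c
#include <stddef.h>

/* Assumed cache sizes, taken from the Core i7 example in the text.
 * Real values vary by processor model. */
#define L2_BYTES ((size_t)256 * 1024)
#define L3_BYTES ((size_t)8 * 1024 * 1024)

/* Hypothetical helper: returns 2 if the working set fits in L2,
 * 3 if it fits in L3, and 0 if it must spill to main memory. */
int smallest_cache_level(size_t working_set_bytes)
{
    if (working_set_bytes <= L2_BYTES)
        return 2;
    if (working_set_bytes <= L3_BYTES)
        return 3;
    return 0;
}
```

Under these assumptions a 54 KB sky vector fits in L2, while the 216 MB
needed for two matrices is far beyond L3, so most accesses during a matrix
addition go out to main memory.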
Performance optimization typically involves arranging the array data in
memory such that it can be loaded in cache lines, organizing the array
stride, splitting the arrays into subarrays for multithreaded processing,
and so on. However, these are mostly processor-family specific. (For Intel
CPUs, SSE and AVX instructions are also available to improve parallelism.
Optimizing compilers can help here, but hand coding may be needed for
optimal performance on specific processors.)
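As an illustration of the array stride point, here is a minimal C sketch
with two functions that compute the same sum over a row-major matrix. The
function names are my own, and this is not how rmtxop is written; it only
shows the access pattern.

```c
#include <stddef.h>

/* Sum a rows x cols matrix stored in row-major order, walking memory
 * contiguously (stride of 1 double), so each cache line that is loaded
 * gets fully used before moving on. */
double sum_by_rows(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += m[r * cols + c];
    return sum;
}

/* Same matrix and same elements, but walking down each column
 * (stride of `cols` doubles), so consecutive accesses usually land
 * on different cache lines. */
double sum_by_cols(const double *m, size_t rows, size_t cols)
{
    double sum = 0.0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += m[r * cols + c];
    return sum;
}
```

Both functions visit every element exactly once. For matrices much larger
than the caches, the column-order version is typically the slower of the
two, though how much slower depends on the processor and compiler.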
If the arrays are stored in ASCII rather than binary, you will typically see
a performance hit of several hundred times as the CPU spends most of its
time parsing the strings into floating-point data.
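The difference between the two input paths can be sketched in C as follows.
Both function names are hypothetical and this is not how rmtxop reads its
files. The binary reader assumes the data was written with the same byte
order and double format as the machine reading it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse up to n whitespace-separated numbers from a text buffer.
 * Every value goes through strtod, which is where the CPU time goes.
 * Returns the number of values parsed. */
size_t parse_ascii(const char *text, double *out, size_t n)
{
    size_t count = 0;
    while (count < n) {
        char *end;
        double v = strtod(text, &end);
        if (end == text)        /* no further number found */
            break;
        out[count++] = v;
        text = end;
    }
    return count;
}

/* Read up to n raw doubles from a binary stream in one call.
 * No parsing is involved; the bytes are copied as they are.
 * Returns the number of values read. */
size_t read_binary(FILE *fp, double *out, size_t n)
{
    return fread(out, sizeof(double), n, fp);
}
```

For example, parse_ascii("1.5 2.5 -3", buf, 3) fills buf with three values
one conversion at a time, whereas read_binary fetches the whole block with a
single fread.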
Ian Ashdown, P. Eng. (Ret.), FIES
SunTracker Technologies Ltd.