Increasing Data Reuse of Sparse Algebra Codes on Simultaneous Multithreading Architectures

In this paper the problem of the locality of sparse algebra codes on simultaneous multithreading (SMT) architectures is studied. In these kind of architectures many hardware structures are dynamically shared among the running threads. This puts a lot of stress on the memory hierarchy, and a poor locality, both inter-thread and intra-thread, may become a major bottleneck in the performance of a code. This behavior is even more pronounced when the code is irregular, which is the case of sparse matrix ones. Therefore, techniques that increase the locality of irregular codes on SMT architectures are important to achieve high performance. This paper proposes a data reordering technique specially tuned for these kind of architectures and codes. It is based on a locality model developed by the authors in previous works. The technique has been tested, first, using a simulator of a SMT architecture, and subsequently, on a real architecture as Intel's Hyper-Threading. Important reductions in the number of cache misses have been achieved, even when the number of running threads grows. When applying the locality improvement technique, we also decrease the total execution time and improve the scalability of the code.

Increasing Data Reuse of Sparse Algebra Codes on Simultaneous Multithreading Architectures Articles