We develop a hardware-software co-designed solution to accelerate sparse matrix-vector multiplication on a TPU-like systolic array architecture.
Invited journal article detailing our sparse matrix-matrix multiplication accelerator, which executes an outer-product-based algorithm and uses cache-scratchpad reconfiguration to deliver an order-of-magnitude improvement in energy and bandwidth efficiency over general-purpose processors (GPPs).
We prototype, in 40 nm CMOS, a sparse matrix-matrix multiplication accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy; the chip offers significant energy- and bandwidth-efficiency improvements over state-of-the-art CPUs and GPUs.
We architect a novel hardware-software co-designed accelerator for high-performance, energy-efficient sparse matrix multiplication targeting graph analytics and scientific computation.
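For readers unfamiliar with the outer-product formulation mentioned in the SpGEMM journal article: it computes C = AB as the sum over k of the outer product of column k of A with row k of B, so A is naturally traversed by columns (CSC-like) and B by rows (CSR-like). The sketch below is a minimal software illustration of that idea only, split into a multiply phase that generates partial products and a merge phase that accumulates them. It is not the accelerator's implementation; the function names and the dict-of-lists sparse representation are hypothetical choices made for clarity.

```python
from collections import defaultdict


def to_csc_columns(dense):
    """Hypothetical helper: group nonzeros by column, {k: [(i, a_ik), ...]}."""
    cols = defaultdict(list)
    for i, row in enumerate(dense):
        for k, v in enumerate(row):
            if v != 0:
                cols[k].append((i, v))
    return cols


def to_csr_rows(dense):
    """Hypothetical helper: group nonzeros by row, {k: [(j, b_kj), ...]}."""
    rows = defaultdict(list)
    for k, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows[k].append((j, v))
    return rows


def outer_product_spgemm(a_cols, b_rows):
    """Compute C = A @ B as a sum of outer products; returns {(i, j): c_ij}."""
    # Multiply phase: column k of A times row k of B yields partial products.
    partials = []
    for k, col in a_cols.items():
        row = b_rows.get(k)  # .get does not insert keys into a defaultdict
        if not row:
            continue  # no nonzeros in row k of B, so no contribution
        for i, a in col:
            for j, b in row:
                partials.append((i, j, a * b))

    # Merge phase: accumulate partial products that land on the same (i, j).
    c = {}
    for i, j, v in partials:
        c[(i, j)] = c.get((i, j), 0) + v
    return c


A = [[1, 0, 2],
     [0, 3, 0]]
B = [[4, 0],
     [0, 5],
     [6, 0]]
print(outer_product_spgemm(to_csc_columns(A), to_csr_rows(B)))
# {(0, 0): 16, (1, 1): 15}
```

Separating the two phases mirrors why the formulation is attractive for hardware: the multiply phase streams each column/row pair exactly once, while all irregular accumulation is deferred to the merge phase.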