hardware accelerators

Versa: A Dataflow-Centric Multiprocessor with 36 Systolic ARM Cortex-M4F Cores and a Reconfigurable Crossbar-Memory Hierarchy in 28nm

We present Versa, an energy-efficient processor with 36 systolic ARM Cortex-M4F cores and a runtime-reconfigurable memory hierarchy.

Efficient Management of Scratch-Pad Memories in Deep Learning Accelerators

We propose a compiler extension to efficiently manage the scratch-pad memories in modern deep learning accelerators.

Transmuter: Bridging the Efficiency Gap using Memory and Dataflow Reconfiguration

We propose a reconfigurable accelerator for parallel workloads called Transmuter with a software stack called TransPy.

Sparse-TPU: Adapting Systolic Arrays for Sparse Matrices

We develop a hardware-software co-designed solution to accelerate sparse matrix-vector multiplication on a TPU-like systolic array architecture.

A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix-Matrix Multiplication Accelerator

Invited journal article detailing our sparse matrix-matrix multiplication accelerator, which executes an outer-product-based algorithm and uses cache-scratchpad reconfiguration to deliver an order of magnitude better energy and bandwidth efficiency than general-purpose processors (GPPs).

A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

We prototype a sparse matrix-matrix multiplication accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy in 40 nm CMOS that offers significant energy- and bandwidth-efficiency improvements over state-of-the-art CPUs and GPUs.

OuterSPACE: An Outer Product based Sparse Matrix Multiplication Accelerator

We architect a novel hardware-software co-designed accelerator for high-performance, energy-efficient sparse matrix multiplication targeting graph analytics and scientific computation.
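
For readers unfamiliar with the outer-product formulation named in the titles above, the following is a minimal software sketch of the general idea: the product C = A·B is accumulated as a sum of outer products of column k of A with row k of B, so only non-zero pairs are ever multiplied. It is not the accelerators' implementation. The function name `spgemm_outer`, the dictionary-based storage, and the hash-map accumulation are all assumptions made for illustration.

```python
from collections import defaultdict


def spgemm_outer(a_cols, b_rows):
    """Sparse C = A @ B via the outer-product formulation (illustrative only).

    a_cols[k] lists the (row_index, value) non-zeros in column k of A.
    b_rows[k] lists the (col_index, value) non-zeros in row k of B.
    Returns C as a dict mapping (row, col) to the accumulated value.
    """
    c = defaultdict(float)
    for k, col in a_cols.items():
        row = b_rows.get(k)
        if not row:
            continue  # row k of B is empty, so this outer product is all zeros
        # Outer product of column k of A with row k of B,
        # accumulated into the partial sums of C.
        for i, a in col:
            for j, b in row:
                c[(i, j)] += a * b
    return dict(c)


# A = [[1, 0], [2, 3]] stored by column, B = [[0, 4], [5, 0]] stored by row.
a_cols = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
b_rows = {0: [(1, 4.0)], 1: [(0, 5.0)]}
result = spgemm_outer(a_cols, b_rows)
```

Here `result` holds the three non-zeros of the product: 4 at (0, 1), 15 at (1, 0), and 8 at (1, 1). Storing A by column and B by row means each index k is visited once and no multiplication involves a zero operand. The cost is that partial products for the same output element arrive at different times and must be merged.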