Subhankar Pal

I am a fourth-year PhD student at the University of Michigan, Ann Arbor. I work on the architecture and design of hardware accelerators in the Circuits and Architecture Design Research (CADRe) group within the Computer Science and Engineering Department.

I am advised by Prof. Ronald Dreslinski Jr. I have interned at NVIDIA, AMD Research and IBM Research, and I worked full-time at NVIDIA for two years before joining graduate school. I completed my Bachelor's degree in Electrical and Electronics Engineering at BITS Pilani, Hyderabad Campus, India.

Email  /  CV  /  Google Scholar  /  LinkedIn  /  Twitter


I'm interested in developing novel hardware for specialized applications, such as graph processing, scientific computing and virtual reality. I have broad experience across the layers of the computing stack, including operating systems, compilers, computer architecture, VLSI design, analog electronics and electronic devices.

Much of my current research is about creating reconfigurable hardware that balances programmability and specialization.

R2D3: A Reliability Engine for 3D Parallel Systems
Javad Bagherzadeh, Aporva Amarnath, Jielun Tan, Subhankar Pal, Ronald Dreslinski
Design Automation Conference (DAC), 2020 (to appear)
Paper / Slides / BibTeX

We propose a holistic reliability solution for parallel 3D architectures that provides concurrent single-replay detection and diagnosis, fault-mitigating repair and aging-aware lifetime management.

A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix-Matrix Multiplication Accelerator
Dong-hyeon Park, Subhankar Pal, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan-Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun-Seok Kim, Ronald Dreslinski
Journal of Solid-State Circuits (JSSC), 2019
Paper / BibTeX

Invited journal article detailing our sparse matrix-matrix multiplication accelerator, which executes an outer-product-based algorithm and uses cache-scratchpad reconfiguration to deliver an order of magnitude better energy and bandwidth efficiency than general-purpose processors (GPPs).

A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
Subhankar Pal, Dong-hyeon Park, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan-Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun-Seok Kim, Ronald Dreslinski
Symposia on VLSI Technology and Circuits (VLSI), 2019
Paper / Slides / BibTeX

We prototyped a Sparse Matrix-Matrix Multiplication accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy in 40 nm CMOS that offers significant energy- and bandwidth-efficiency improvements over state-of-the-art CPUs and GPUs.

Parallelism Analysis of Prominent Desktop Applications: An 18-Year Perspective
Siying Feng, Subhankar Pal, Yichen Yang, Ronald Dreslinski
International Symposium on Performance Analysis of Systems and Software (ISPASS), 2019
Paper / Slides / BibTeX

We performed an extensive analysis of the parallelism exploited by modern software on a state-of-the-art desktop machine and compared it against analyses from 2000 and 2010.

OuterSPACE: An Outer Product based Sparse Matrix Multiplication Accelerator
Subhankar Pal, Jonathan Beaumont, Dong-hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David Blaauw, Trevor Mudge, Ronald Dreslinski
International Symposium on High Performance Computer Architecture (HPCA), 2018
Paper / Slides / BibTeX

We architected a novel hardware-software co-designed accelerator for high-performance, energy-efficient sparse matrix multiplication targeting graph analytics and scientific computation.

A Carbon Nanotube Transistor based RISC-V Processor using Pass Transistor Logic
Aporva Amarnath, Siying Feng, Subhankar Pal, Tutu Ajayi, Austin Rovinski, Ronald Dreslinski
International Symposium on Low Power Electronics and Design (ISLPED), 2017
Paper / Slides / BibTeX

We explored various architectural design choices using CNTFET-based pass transistor logic (PTL) and created an energy-efficient RISC-V processor that uses PTL for critical-path components, demonstrating a lower energy-delay product than traditional silicon CMOS and silicon pass transistor logic.

A New Design of an n-bit Reversible Arithmetic Logic Unit
Subhankar Pal, Chetan Kumar Vudadha, P. Sai Phaneendra, V. Sreehari, Srinivas Mandalika
International Symposium on Electronic System Design (ISED), 2014
Paper / Poster / BibTeX

We designed a low-cost quantum ripple-carry adder and enhanced it into an ALU using a combination of the NCV and NCV-|v1> quantum gate libraries, trading delay for lower cost, measured as the number of quantum gates.


IBM Research, Yorktown Heights, USA
May 2019 - Aug 2019

I worked as a research intern in Dr. Viji Srinivasan's System Software group on three different projects. The first involved adding feedback loops for DEEPTOOLS, the compiler runtime for the RAPID DNN accelerator. The second was a unified framework for scratchpad management in DL accelerators. The third involved exploration of different algorithms for selective quantization of mixed-precision neural networks.


AMD Research, Boxborough, USA
May 2017 - Sep 2017

I worked as a summer co-op at AMD's Boston Design Center with Dr. John Kalamatianos on the Path-Forward Project for power-efficient acceleration of exascale workloads on CPUs. I focused on improving the performance and energy efficiency of the Branch Target Buffer in the Instruction Fetch stage.


NVIDIA, Bangalore, India
Jan 2014 - Jul 2016

My work rotated between full-chip verification of next-generation GPUs and bringing up the silicon after it came back from the fab. I also independently developed and maintained GUI utilities for debugging the PCI-Express interface of the GPU.

Course Projects

A MIPS R10K-based Superscalar Out-of-Order Processor
Subhankar Pal, Kush Goliya, Harsha Valsaraju, Sean McLaughlin, Sep 2016 - Dec 2016

We implemented a MIPS R10K-style, 2-way superscalar, 6-stage, out-of-order processor based on the Alpha ISA using SystemVerilog.


Automatic Identification of Colored Bacteria in Agar
Subhankar Pal, Shivani Gupta, Padmavathi Yenmandra, Suman Kapur, Aug 2013 - Dec 2013

I worked on the development of an Android app that detects colored bacterial colonies in agar using statistical image processing techniques. The goal was to test the effect of antibiotics on bacteria that cause urinary tract infections (UTIs) in humans.


EECS 570: Parallel Computer Architecture - Winter 2018 (GSI)
Jan 2018 - Apr 2018

I was the Graduate Student Instructor (GSI) for EECS 570, assisting Prof. Satish Narayanasamy. My responsibilities included formulating exam, assignment and project materials, assisting students with a semester-long research project, delivering a few of the lectures, maintaining the course/Canvas webpage and conducting regular office hours.


Mentorship in ASIC Verification, NVIDIA Corporation
Feb 2014 - Jul 2014

I mentored Anand Thati, an intern in the GPU ASIC Bring-Up team at NVIDIA. My responsibilities included introducing the Synopsys Verdi tool and its use for functional debug of the GPU, and assisting with Python/Tkinter coding and Perl/Shell scripting.

Last updated: March 22, 2020