About Me

I am a Ph.D. candidate in Computer Science and Engineering at the University of Michigan, where I am advised by Prof. Ron Dreslinski. Previously, I received my bachelor’s degree from Addis Ababa Institute of Technology.


My research is in computer architecture, with a focus on accelerating data-intensive applications. I design hardware and software optimizations to improve the performance and energy efficiency of data-intensive workloads on conventional and emerging architectures such as Processing-in-Memory (PIM). One of the challenges in the era of big data is the widening gap between compute performance and memory bandwidth, which in-memory and near-memory computing architectures have partially addressed. However, harnessing the high memory bandwidth of these architectures is hampered by the interconnect, which incurs costly data movement. To address this, my thesis proposes custom hardware (a memory subsystem, including the interconnect) and software optimizations to improve the performance and energy efficiency of massively parallel processors.

In my recent work [PACT'22], I have proposed fine-grained inter-GPU data movement and novel caching techniques to improve the performance and scalability of multi-GPU workloads. I have also explored ways to reduce excess data movement in PIM-based graph execution through a processing-in-network solution [ISLPED'19] and multicasting techniques [DATE'20].




[1] Leul Belayneh, Haojie Ye, Kuan-Yu Chen, David Blaauw, Trevor Mudge, Ronald Dreslinski, Nishil Talati. Locality-aware Optimizations for Improving Remote Memory Latency in Multi-GPU Systems. International Conference on Parallel Architectures and Compilation Techniques (PACT), Chicago, USA, 2022. [PDF]


[2] Nishil Talati, Haojie Ye, Yichen Yang, Leul Belayneh, Kuan-Yu Chen, David Blaauw, Trevor Mudge, Ronald Dreslinski. NDMiner: Accelerating Graph Pattern Mining Using Near Data Processing. International Symposium on Computer Architecture (ISCA), New York, USA, 2022. [PDF]


[3] Leul Belayneh, Valeria Bertacco. GraphVine: Exploiting Multicast for Scalable Graph Analytics. Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 2020. [PDF]


[4] Leul Belayneh, Abraham Addisie, Valeria Bertacco. MessageFusion: On-path Message Coalescing for Energy Efficient and Scalable Graph Analytics. International Symposium on Low Power Electronics and Design (ISLPED), Lausanne, Switzerland, 2019. [PDF]


[5] Leul Belayneh, Fitsum Assamnew Andargie, Valeria Bertacco. Archipelago: Architectural Support for Graph Analytics on GPUs. ACM-SRC at International Conference on Parallel Architectures and Compilation Techniques (PACT), 2020. [PDF]



Ph.D. Computer Science and Engineering

University of Michigan, Ann Arbor, MI

Sept 2018 - Present

B.Sc. Electrical and Computer Engineering

Addis Ababa University, Addis Ababa, Ethiopia

Sept 2012 - May 2017


  • In-depth Characterization and Architectural Support for Irregular Workloads on GPUs

    In this work, I identify and analyze sources of inefficiency in GPUs and address them through combined software-hardware optimizations. Specifically, I designed and implemented architectural enhancements to the GPU memory subsystem to make efficient use of its enormous memory bandwidth and computing power.

  • Exploiting Power-Law for Graph Prefetching

    Irregular workloads, particularly graph analytics, benefit less from conventional prefetching mechanisms. In this work, we applied software-based prefetching that targets the vertices with the most outgoing edges in power-law graphs (i.e., the top 20%). LLVM-based selective insertion of prefetches minimizes unwanted prefetching, thus alleviating cache pollution.

  • Multicast for Scalable Graph Analytics

    In most graph workloads, a source vertex sends similar vertex-update messages to its neighboring vertices. GraphVine [DATE'20] exploits multicasting to combine these similar messages into a single multicast packet, which reduces network traffic.

  • Processing in Network Solution for Scalable Graph Analytics

    Commutative and associative reduction operations in graph analytics allow computation to be distributed across compute-capable routers. MessageFusion [ISLPED'19] therefore coalesces vertex-update messages headed to the same destination while they are in transit, reducing overall network traffic.

  • Software-based Teaching Aid for Signal Processing and Digital Communication

    Delivered an easy-to-use software tool for education support in IoT at Addis Ababa Institute of Technology.


2260 Hayward St, Ann Arbor, MI 48109, USA