Eager Writeback: A Technique for Improving Bandwidth Utilization
Hsien-Hsin Lee, Gary Tyson, Matt Farrens
University of Michigan Technical Report
CSE-TR-399-99
ABSTRACT: Modern high-performance processors utilize multi-level cache
structures to help tolerate the increasing latency (measured in processor
cycles) of main memory. These caches employ either a writeback or a write-through
strategy to deal with store operations.
Write-through caches propagate data to more distant memory levels at
the time each store occurs, which requires very high bandwidth between
the memory hierarchy levels. Writeback caches can significantly reduce
the bandwidth requirements between caches and memory by marking cache lines
as dirty when stores are processed and writing those lines to the memory
system only when they are evicted. This approach works well for
many applications (e.g., SPEC95), but for graphics applications that experience
significant numbers of cache misses due to streaming data, writeback cache
designs can degrade overall system performance by clustering bus activity
when dirty lines contend with data being fetched into the cache. In this
paper we present a new technique called Eager Writeback, which re-distributes
and balances traffic by writing dirty cache lines to memory prior to their
eviction. This reduces the likelihood that writebacks of dirty lines will impede
the loading of cache miss data. Eager Writeback can be viewed as a compromise
between write-through and writeback policies, in which dirty lines are
written later than write-through, but prior to writeback. We will show
that this approach can reduce the large number of writes seen in a write-through
design, while avoiding the performance degradation caused by the clustered
bus traffic of a writeback approach.
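The contrast among the three write policies can be made concrete with a toy trace-driven model. This is a minimal sketch under stated assumptions, not the authors' simulator: it assumes an LRU set-associative cache with write-allocate and line-granularity addresses, and it uses a hypothetical eager trigger that cleans a dirty line as soon as that line sits in the LRU position of a full set, assuming the early write can be scheduled into an idle bus cycle. The function name `simulate` and the `writes_at_miss` counter (dirty evictions that coincide with a miss fill, i.e. the bus contention described above) are illustrative.

```python
from collections import OrderedDict


def simulate(trace, num_sets, ways, policy):
    """Count bus writes for a trace of ("ld" | "st", line_address) pairs.

    policy is "write-through", "writeback", or "eager" (hypothetical trigger:
    clean a dirty line once it is LRU in a full set). Returns
    (total_writes, writes_at_miss), where writes_at_miss counts dirty
    evictions that compete with a miss fill for the bus.
    """
    if policy not in ("write-through", "writeback", "eager"):
        raise ValueError(policy)
    # Each set maps line -> dirty flag; iteration order is LRU first, MRU last.
    sets = [OrderedDict() for _ in range(num_sets)]
    total_writes = writes_at_miss = 0
    for op, line in trace:
        s = sets[line % num_sets]
        if op == "st" and policy == "write-through":
            total_writes += 1  # every store is propagated immediately
        if line in s:
            s.move_to_end(line)  # hit: line becomes MRU
        else:
            if len(s) == ways:
                _, dirty = s.popitem(last=False)  # evict the LRU line
                if dirty:
                    total_writes += 1
                    writes_at_miss += 1  # this writeback collides with the fill
            s[line] = False  # write-allocate: the fetched line starts clean
        if op == "st" and policy != "write-through":
            s[line] = True  # mark dirty; assigning an existing key keeps its order
        if policy == "eager" and len(s) == ways:
            lru = next(iter(s))
            if s[lru]:
                s[lru] = False  # assumed: written back early, in an idle bus cycle
                total_writes += 1
    return total_writes, writes_at_miss


if __name__ == "__main__":
    trace = [("st", 0), ("st", 0), ("st", 0), ("st", 1), ("ld", 2), ("ld", 3)]
    for p in ("write-through", "writeback", "eager"):
        print(p, simulate(trace, num_sets=1, ways=2, policy=p))
    # write-through (4, 0), writeback (2, 2), eager (2, 0)
```

On this tiny trace, write-through issues one write per store, writeback issues fewer writes but both land at miss time, and the eager variant issues the same number of writes as writeback with none of them contending with a fill. The model ignores bus timing entirely, so it only illustrates where writes fall, not the resulting speedup.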
A Static Filter for Reducing Prefetch Traffic
Viji Srinivasan, Gary S. Tyson, and Edward S. Davidson
University of Michigan Technical Report
CSE-TR-400-99
ABSTRACT: The growing difference between processor and main memory cycle
time necessitates the use of more aggressive techniques to reduce or hide
main memory access latency. Prefetching data into higher speed memories
is one such technique. However, speculative
prefetching can significantly increase memory traffic. We present a
new technique, called Static Filtering (SF), to reduce the traffic generated
by a given hardware prefetching scheme while preserving its reduced miss
rate. SF uses profiling to select which load instructions should be marked
"enabled" to do data prefetching. This is done by identifying which load
instructions generate data references that are useful prefetch triggers.
SF enables the hardware prefetch mechanism only for the set of references
made by "enabled" loads. Our results from applying SF to two well-known
hardware prefetching techniques, Next Sequential Prefetching (NSP) and
Shadow Directory Prefetching (SDP), show that SF preserves the decrease
in misses that they achieve and reduces the prefetch traffic by 50 to 60%
for NSP and by 64 to 74% for SDP. In addition, timing analysis reveals
that when finite memory bandwidth is a limiting factor, applying SF does
in fact increase the speedup obtained by a baseline hardware prefetching
technique. The other major contribution of this paper is a complete taxonomy
which classifies individual prefetches in terms of the additional traffic
they generate and the resulting reduction (or increase) in misses. This
taxonomy provides a formal method for classifying prefetches by their usefulness.
A histogram of the prefetches by category provides a new basis for comparing
prefetch techniques.
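The profile-then-filter idea can be sketched on top of a simple Next Sequential Prefetching model. This is an illustrative sketch, not the paper's implementation: it assumes a fully associative LRU cache, a prefetch-on-miss NSP variant (a miss to line L prefetches L+1), and a hypothetical usefulness `threshold` of 0.5 for marking a load "enabled". A prefetch counts as useful here if its line is demand-referenced before eviction. The function names `run_nsp` and `profile_filter` are hypothetical, and for brevity the profile is collected on the same trace it is applied to, whereas a real profile would come from a training input.

```python
from collections import OrderedDict, defaultdict


def run_nsp(trace, capacity, enabled=None):
    """Prefetch-on-miss NSP over a trace of (load_pc, line_address) pairs.

    The cache is fully associative with LRU replacement. If `enabled` is None,
    every load may trigger a prefetch; otherwise only loads whose PC is in
    `enabled` do. Returns (misses, prefetches_issued, stats), where
    stats[pc] = [issued, useful].
    """
    cache = OrderedDict()  # line -> PC that prefetched it, or None once demand-used
    stats = defaultdict(lambda: [0, 0])
    misses = prefetches = 0

    def insert(line, trigger_pc):
        if len(cache) == capacity:
            cache.popitem(last=False)  # evict the LRU line
        cache[line] = trigger_pc

    for pc, line in trace:
        if line in cache:
            trigger = cache[line]
            if trigger is not None:
                stats[trigger][1] += 1  # first demand use: the prefetch was useful
                cache[line] = None
            cache.move_to_end(line)
            continue
        misses += 1
        insert(line, None)
        if enabled is None or pc in enabled:
            if line + 1 not in cache:
                insert(line + 1, pc)  # prefetch the next sequential line
                prefetches += 1
                stats[pc][0] += 1
    return misses, prefetches, stats


def profile_filter(trace, capacity, threshold=0.5):
    """Profiling pass: enable only loads whose prefetches were useful often enough."""
    _, _, stats = run_nsp(trace, capacity)
    return {pc for pc, (issued, useful) in stats.items() if useful / issued >= threshold}


if __name__ == "__main__":
    stream = [(0x10, i) for i in range(8)]            # sequential: NSP helps
    strided = [(0x20, 100 * k) for k in range(1, 5)]  # stride 100: NSP never helps
    trace = stream + strided
    print(run_nsp(trace, 64)[:2])                     # (8, 8)
    enabled = profile_filter(trace, 64)
    print(run_nsp(trace, 64, enabled)[:2])            # (8, 4)
```

In this toy trace the filter disables the strided load, halving prefetch traffic while leaving the miss count unchanged, which mirrors the qualitative claim in the abstract; the numbers say nothing about the 50 to 74% reductions reported for real workloads.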
Allocation By Conflict: A Simple, Effective Cache Management Scheme
Edward S. Tam, Gary S. Tyson, and Edward S. Davidson
University of Michigan Technical Report
CSE-TR-401-99
ABSTRACT: Many schemes have been proposed that incorporate an auxiliary
buffer to improve the performance of a cache of a given size. One of the
most thoroughly evaluated of these schemes, Victim caching, aims to reduce
the impact of conflict misses in direct-mapped caches. While Victim caching
has shown large performance benefits, its competitive advantage is limited
to direct-mapped caches, whereas today's caches are increasingly associative.
Furthermore, it requires a costly data path for swaps and saves between the
cache and the buffer. Several other schemes attempt to obtain the performance
improvements of Victim caching across a wide range of associativities and
without the costly data path for swaps and saves. While these schemes have
been shown to perform well overall, their performance still lags that of
Victim caching when the main cache is direct-mapped. Furthermore, they too
require costly hardware support, in this case in the form of history tables
that maintain allocation decision information. This paper introduces a new
cache management scheme, Allocation By Conflict (ABC), which generally
outperforms both Victim caching and the history-based schemes. Furthermore,
ABC has the lowest hardware requirements of any proposed scheme: only a
single additional bit per block in the main cache is required to maintain
the information used by the allocation decision process, and no swap-save
data path is needed.
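The abstract does not spell out how ABC uses its extra bit, so the following is only a hypothetical one-bit allocation policy written to illustrate the hardware claim (one bit per main-cache block, no swap-save path). It should not be read as the paper's actual rule. Assumptions: a direct-mapped main cache, a small fully associative LRU buffer, and a per-block bit that is set when the block hits; on a miss, if the resident block's bit is set, the incoming block is allocated to the buffer and the bit is cleared, otherwise the incoming block replaces the resident one. The class name `OneBitAllocCache` is hypothetical.

```python
from collections import OrderedDict


class OneBitAllocCache:
    """Direct-mapped main cache plus a small fully associative LRU buffer.

    HYPOTHETICAL allocation rule (illustrative, not the paper's exact policy):
    each main-cache block has one bit, set when the block hits. On a miss, if
    the resident block's bit is set, the new block goes to the buffer and the
    bit is cleared; otherwise the new block replaces the resident block.
    Blocks never move between main cache and buffer (no swap/save path).
    """

    def __init__(self, num_sets, buffer_size):
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        self.num_sets = num_sets
        self.tags = [None] * num_sets   # line address resident in each set
        self.bits = [False] * num_sets  # the single extra bit per block
        self.buffer = OrderedDict()     # buffered lines in LRU order
        self.buffer_size = buffer_size
        self.misses = 0

    def access(self, line):
        """Return True on a hit (main cache or buffer), False on a miss."""
        idx = line % self.num_sets
        if self.tags[idx] == line:
            self.bits[idx] = True  # resident block showed reuse
            return True
        if line in self.buffer:
            self.buffer.move_to_end(line)
            return True
        self.misses += 1
        if self.tags[idx] is not None and self.bits[idx]:
            # Protect the reused resident block; it must be reused again to stay protected.
            self.bits[idx] = False
            if len(self.buffer) == self.buffer_size:
                self.buffer.popitem(last=False)  # evict the buffer's LRU line
            self.buffer[line] = None
        else:
            self.tags[idx] = line
            self.bits[idx] = False
        return False


if __name__ == "__main__":
    cache = OneBitAllocCache(num_sets=4, buffer_size=1)
    for line in [0, 0, 4, 0, 4, 0, 4]:  # lines 0 and 4 conflict in set 0
        cache.access(line)
    print(cache.misses)  # 2
```

On the conflicting trace above, a plain direct-mapped cache of the same size would miss 6 times, while this sketch misses twice because the reused block stays in the main cache and the conflicting block lives in the buffer. A stream of blocks with no reuse never sets the bit, so such blocks simply replace one another in the main cache and never pollute the buffer.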