# **CCICheck:** Using µhb Graphs to Verify the Coherence-Consistency Interface

Yatin A. Manerkar, Daniel Lustig, Michael Pellauer\*, and Margaret Martonosi

Princeton University

\*NVIDIA

#### MICRO-48



At a high level:

- Coherence Protocols: Propagation of writes to other cores
- Consistency Models: Ordering rules for visibility of reads and writes









#### **Coherence and consistency often interwoven**

µarch. Level



#### **Coherence Verifiers**

Ignore consistency even when protocol affects consistency!

#### **Consistency Verifiers**

Assume abstract coherence instead of protocol in use!

Arch. Level









1. Invalidation before use
   - Repeated invalidation before use → livelock [Kubiatowicz et al. ASPLOS 1992]
2. <u>Livelock avoidance</u>: allow destination core to perform one operation on data when it arrives, even if already invalidated [Sorin et al. Primer]
   - Does **not** break coherence
   - Sometimes intentionally returns stale data
3. Prefetching



Individual optimization → no violation

Combination of optimizations → violation!



- Consider **mp** with the livelock-avoidance mechanism:

| Core 0                 | Core 1                 |
|------------------------|------------------------|
| x: Shared, y: Modified | x: Invalid, y: Invalid |
| [x] $\leftarrow$ 1     | r1 $\leftarrow$ [y]    |
| [y] $\leftarrow$ 1     | r2 $\leftarrow$ [x]    |











































































### Our Work: CCICheck

Static CCI-aware consistency verification












Microarchitectural happens-before (µhb) graph

### Background: PipeCheck



- Exhaustive enumeration of executions using µhb graphs
- Cyclic graph → forbidden by µarch
- Acyclic graph → allowed by µarch

[Lustig et al. MICRO-47]
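The cyclic/acyclic rule above can be sketched as a plain cycle check over a set of happens-before edges. This is a minimal illustration rather than PipeCheck's implementation: the node names (`i1.Fetch`, `i1.Commit`, and so on) and the function name `is_allowed` are hypothetical, and the graph is assumed to be given as `(earlier, later)` pairs.

```python
from graphlib import TopologicalSorter, CycleError

def is_allowed(edges):
    """Return True if the µhb graph (a list of (earlier, later) edges) is acyclic."""
    sorter = TopologicalSorter()
    for earlier, later in edges:
        # 'later' has 'earlier' as a predecessor in the happens-before order.
        sorter.add(later, earlier)
    try:
        sorter.prepare()  # raises CycleError if the graph contains a cycle
    except CycleError:
        return False
    return True

# Hypothetical node names, chosen only for illustration.
acyclic = [("i1.Fetch", "i1.Commit"), ("i1.Commit", "i2.Commit")]
cyclic = acyclic + [("i2.Commit", "i1.Fetch")]

print(is_allowed(acyclic))  # True
print(is_allowed(cyclic))   # False
```

A cycle means some event would have to happen before itself, which is why the corresponding execution cannot occur on the modelled microarchitecture.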








## Prior techniques cannot model CCI events!



[Lustig et al. MICRO-47]



Litmus Test mp

| Core 0                     | Core 1                      |
|----------------------------|-----------------------------|
| (i1) St [x] $\leftarrow$ 1 | (i3) Ld r1 $\leftarrow$ [y] |
| (i2) St [y] $\leftarrow$ 1 | (i4) Ld r2 $\leftarrow$ [x] |

Under TSO: Forbid r1=1, r2=0

#### Modelling CCI Events

- Need to model per-cache occupancy
- Lazy coherence and partial incoherence (e.g. GPUs)
- Need to model coherence transitions that relate to consistency (e.g. Peekaboo)







- A ViCL (Value in Cache Lifetime) is a 4-tuple: (cache\_id, address, data\_value, generation\_id)
- cache\_id and generation\_id together uniquely identify each cache line
- A ViCL 4-tuple maps onto the period of time over which the cache line serves the data value for the address
- ViCLs start at a ViCL Create event and end at a ViCL Expire event
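As a rough sketch of the 4-tuple described above, a ViCL can be written as an immutable record. The field names follow the slide; the method names (`line_key`, `create_event`, `expire_event`) and the example cache name `core1.L1` are assumptions made for illustration and are not taken from the CCICheck code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViCL:
    cache_id: str       # which cache holds the line
    address: str        # memory address served
    data_value: int     # value served during this lifetime
    generation_id: int  # distinguishes successive lifetimes in the same cache

    def line_key(self):
        # Per the slide, cache_id and generation_id identify the cache line.
        return (self.cache_id, self.generation_id)

    def create_event(self):
        # Start of the period during which the line serves data_value.
        return ("ViCL Create", self)

    def expire_event(self):
        # End of that period (e.g. on eviction or invalidation).
        return ("ViCL Expire", self)

# Two successive lifetimes of address x in the same (hypothetical) cache.
old = ViCL("core1.L1", "x", 0, 0)
new = ViCL("core1.L1", "x", 1, 1)
print(old.line_key() != new.line_key())  # True
```

In a µhb graph, each ViCL would contribute its Create and Expire events as nodes, with an edge from Create to Expire.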



- 4-tuple:
  (cache\_id) address, data\_value(generation\_id)
- cache\_id and generation\_id uniquely identify each cache line
- A ViCL 4-tuple maps on to the period of time over which the cache line serves the data value for the address
- ViCLs start at a ViCL Create event and end at a ViCL Expire event



- 4-tuple:
  (cache\_id, address, data\_value) generation\_id)
- cache\_id and generation\_id uniquely identify each cache line
- A ViCL 4-tuple maps on to the period of time over which the cache line serves the data value for the address
- ViCLs start at a ViCL Create event and end at a ViCL Expire event



• 4-tuple:

(cache\_id, address, data\_value, generation\_id)

- cache\_id and generation\_id uniquely identify each cache line
- A ViCL 4-tuple maps on to the period of time over which the cache line serves the data value for the address
- ViCLs start at a ViCL Create event and end at a ViCL Expire event











































#### Can model requests, downgrades, etc.












Litmus Test co-mp

| Core 0                     | Core 1                      |
|----------------------------|-----------------------------|
| (i1) St [x] $\leftarrow$ 1 | (i3) Ld r1 $\leftarrow$ [x] |
| (i2) St [x] $\leftarrow$ 2 | (i4) Ld r2 $\leftarrow$ [x] |

In TSO: r1=2, r2=2 Allowed
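To see why r1=2, r2=2 is a legal outcome of co-mp, one can enumerate every interleaving of the two cores that respects program order. The sketch below does this under sequential consistency, which permits a subset of what TSO permits, so any outcome it reaches is also allowed under TSO. The initial value x = 0 and the names `sc_outcomes`, `CORE0`, and `CORE1` are assumptions; this is not how CCICheck itself enumerates executions.

```python
from itertools import permutations

# co-mp; the initial value x = 0 is an assumption.
CORE0 = [("st", "x", 1), ("st", "x", 2)]        # (i1), (i2)
CORE1 = [("ld", "x", "r1"), ("ld", "x", "r2")]  # (i3), (i4)

def sc_outcomes(threads, init):
    """Collect (r1, r2) over all interleavings that respect program order."""
    ops = [(t, i) for t, prog in enumerate(threads) for i in range(len(prog))]
    outcomes = set()
    for order in permutations(ops):
        pos = {op: n for n, op in enumerate(order)}
        # Skip orders that swap two instructions of the same core.
        if any(pos[(t, i)] > pos[(t, i + 1)]
               for t, prog in enumerate(threads)
               for i in range(len(prog) - 1)):
            continue
        mem, regs = dict(init), {}
        for t, i in order:
            kind, addr, arg = threads[t][i]
            if kind == "st":
                mem[addr] = arg        # store writes the value to memory
            else:
                regs[arg] = mem[addr]  # load reads the current value
        outcomes.add((regs["r1"], regs["r2"]))
    return outcomes

outcomes = sc_outcomes([CORE0, CORE1], {"x": 0})
print((2, 2) in outcomes)  # True
print((2, 1) in outcomes)  # False
```

The outcome r1=2, r2=1 never appears because it would require the second load to observe an older store than the first load did.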




























#### **Path Enumeration**



#### Constraint Satisfaction































# **Case Studies and Results**





- Livelock prevention mechanism allows use of stale data
- "Peekaboo" edge completes cycle → outcome forbidden
- Consistency maintained




## Partial Incoherence: GPUs



e.g. **mp** with membar fences [Alglave et al. ASPLOS15]

If the fence does not enforce InvCache ordering => no cycle, so the outcome is observable












## **Verification Times**



- Runtimes remain reasonable due to intelligent pruning and unsatisfiable constraint detection
- Subsequent research has used SMT solver-based techniques to run most tests in just seconds! [ASPLOS 2016]






## Conclusion

- CCI verification is critical to correct operation of complex parallel systems
- **CCICheck:** static CCI-aware microarchitectural consistency verification
  - Partial incoherence (GPUs), lazy coherence, and more!
- µhb graphs, ViCLs, and constraint-based enumeration
  - Comprehensive and intuitive µarch modelling
- Allows designers to build correct systems with greater ease and confidence




Code available at https://github.com/ymanerka/ccicheck





- No eager invalidation of sharers, but "InvCache" edges model the invalidation of a core's private cache on an L1 miss
- Thus, TSO is maintained













- i3 needs a source for its value
- The source must be an L1 ViCL with the same address and data
- => Two possibilities enumerated
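The sourcing constraint above can be sketched as a filter over the ViCLs that exist in an execution: a load may only take its value from a ViCL in its own L1 with a matching address and data value. Which two possibilities the slide's figure shows is not recoverable from the text, so the example below simply sets up two matching candidates. The function name `candidate_sources`, the cache names, and the loaded value are all hypothetical.

```python
from collections import namedtuple

ViCL = namedtuple("ViCL", ["cache_id", "address", "data_value", "generation_id"])

def candidate_sources(cache_id, address, value, vicls):
    """Return the ViCLs in the given cache that could supply this load's value."""
    return [v for v in vicls
            if v.cache_id == cache_id
            and v.address == address
            and v.data_value == value]

# Hypothetical execution: i3 runs on core 1 and observes x = 1.
vicls = [
    ViCL("core1.L1", "x", 1, 0),  # matches
    ViCL("core1.L1", "x", 1, 1),  # matches (a later lifetime, same value)
    ViCL("core1.L1", "x", 2, 2),  # wrong data value
    ViCL("core0.L1", "x", 1, 0),  # wrong cache
]

sources = candidate_sources("core1.L1", "x", 1, vicls)
print(len(sources))  # 2
```

Each candidate would correspond to one branch of the enumeration, with its own µhb graph to check for cycles.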






