Predictive Multiprocessor Caching Techniques Based on Cache Interference
and Working Set Change
Investigators: Jih-Kwon Peir
Sponsor: NSF/EIA
Abstract:
High-performance computer servers based on shared-memory
multiprocessing technology continue to receive great attention
due to demand from the booming Internet market.
Parallel programs running in cache-coherent, shared-memory
multiprocessor environments, such as transaction processing workloads,
incur performance penalties due to cache interference caused by data
sharing. This interference causes cache lines to be relinquished
involuntarily, before an LRU replacement policy would evict them. The
problem is especially serious for modified lines, which not only account
for a large percentage of the total cache misses but also incur a higher
miss penalty. As the number of processors increases, cache interference
can become dominant and hinder any performance improvement.
Therefore, the main objective of this project is to investigate and
evaluate innovative hardware-based approaches to reducing sharing
misses in multiprocessor caches. The fundamental idea is to record
the lines that a cache has given up involuntarily. Such lines become
potential prefetching targets because, by locality of reference, they
are likely to be used again in the near future.
There are two general approaches to prefetching
the early-invalidated lines. The first, and more intuitive, is to take
advantage of normal coherence transactions. This opportunity arises
when the modified copy of a line is transferred from the
owner's cache to the requester in response to a read miss, or
when the line is evicted from the owner's cache.
The second, more aggressive approach is to give up ownership
early by predicting the owner's last modification of the line.
The early-relinquished line can then be selectively broadcast to the
processors where it was recently invalidated.
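The first approach above can be illustrated with a small software sketch. The following is a hypothetical, simplified simulation (not the project's actual hardware design): each cache keeps a record of lines it relinquished involuntarily, and when a coherence transfer for such a line later appears on the bus, the cache grabs ("snarfs") a copy instead of waiting to miss on that line again. All class and method names here are illustrative.

```python
class Cache:
    """A snooping cache with a record of involuntarily relinquished lines."""
    def __init__(self, name):
        self.name = name
        self.lines = {}                 # address -> data currently held
        self.early_invalidated = set()  # lines given up due to invalidation

    def invalidate(self, addr):
        # Another processor wrote the line: give it up and remember it,
        # making it a potential prefetching (snarfing) target.
        if addr in self.lines:
            del self.lines[addr]
            self.early_invalidated.add(addr)

    def snoop_transfer(self, addr, data):
        # A cache-to-cache transfer is on the bus; snarf a copy if this
        # cache was invalidated on the line recently.
        if addr in self.early_invalidated:
            self.lines[addr] = data
            self.early_invalidated.discard(addr)


class Bus:
    """A shared bus connecting the caches; carries coherence transactions."""
    def __init__(self, caches):
        self.caches = caches

    def write(self, writer, addr, data):
        # Writer gains exclusive ownership: invalidate all other copies.
        for c in self.caches:
            if c is not writer:
                c.invalidate(addr)
        writer.lines[addr] = data

    def read_miss(self, requester, addr):
        # The owner supplies the data; every other cache observes the
        # transfer on the bus and may snarf it.
        data = next(c.lines[addr] for c in self.caches if addr in c.lines)
        for c in self.caches:
            if c is not requester:
                c.snoop_transfer(addr, data)
        requester.lines[addr] = data
        requester.early_invalidated.discard(addr)
        return data


p0, p1, p2 = Cache("P0"), Cache("P1"), Cache("P2")
bus = Bus([p0, p1, p2])

bus.write(p0, 0x100, "v0")   # P0 owns line 0x100
bus.read_miss(p1, 0x100)     # P1 and P2 obtain shared copies
bus.read_miss(p2, 0x100)
bus.write(p0, 0x100, "v1")   # P0 writes again: P1 and P2 invalidated early
bus.read_miss(p1, 0x100)     # P1 misses; the transfer is on the bus, and
                             # P2 snarfs a copy, avoiding a future miss
```

In this toy model, P2 regains a valid copy of line 0x100 without issuing its own read miss, which is the sharing-miss reduction the abstract describes.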
Papers and Presentations:
-
L. Peng, J-K. Peir, and K. Lai, A New Address-Free Memory Hierarchy Layer for Zero-Cycle
Load, Journal of Instruction-Level Parallelism, Vol. 6, Sep. 2004.
-
L. Peng, J-K. Peir, and K. Lai,
Signature Buffer: Bridging Performance Gap between Registers
and Caches,
10th Int'l Symp. on High Performance Computer Architecture,
(HPCA-10), Feb. 2004.
-
L. Peng, J-K. Peir, Q. Ma, and K. Lai,
Address-Free Memory Access Based on Program Syntax Correlation of
Loads and Stores,
IEEE Transactions on VLSI Systems,
Vol. 11(3), June 2003.
-
J-K. Peir, S. Lai, S. Lu, J. Stark, and K. Lai,
Bloom Filtering Cache Miss for Accurate Data Speculation and
Prefetching,
Int'l Conf. on Supercomputing,
New York, NY, June 2002.
-
S. Lai, S. Lu, K. Lai, and J-K. Peir,
Ditto Processor,
Int'l Conf. on Dependable Systems and Networks,
Washington DC, June 2002.
-
B. Chung, J. Zhang, J-K. Peir, S. Lai, K. Lai,
Direct Load: Dependence-Linked Dataflow Resolution
of Load Address and Cache Coordinate,
34th Int'l Symp. on Microarchitecture,
Austin, TX, Nov. 2001.
-
Q. Ma, J-K. Peir, L. Peng, and K. Lai,
Symbolic Cache: Fast Memory Access Based on Program
Syntax Correlation of Loads and Stores,
Best Paper Award ,
IEEE 2001 Int'l Conf. on Computer Design,
Austin, TX, Sep. 2001.
-
B. Chung, Y. Lee, J.-K. Peir, and K. Lai,
Two-Phase Write-Posting on Symmetric Multiprocessors,
2001 Int'l Conf. on Parallel and Distributed Processing
Techniques and Applications, June 2001.
-
J-K. Peir, J. Zhang, S. Zhang, S. Robinson, K. Lai, and W. Wang,
Predictive Multiprocessor Caching: Read/Write Snarfing, Preown,
and Selective Write Broadcast,
9th Workshop on Scalable Shared Memory Multiprocessors,
Vancouver, CA, June 2000.
-
J-K. Peir, W. W. Hsu, H. Young, and S. Ong,
Improving Cache Performance with Full-Map Block Directory,
Journal of Systems Architecture, Vol. 46, 2000, pp. 439-454.