Chip Multiprocessors (CMPs) have become the industry standard for achieving higher chip-level IPC (Instructions Per Cycle). Recently, Intel's Tera-scale computing project pushed the number of on-die cores to tens or even hundreds. In addition, many new memory technologies are looming on the horizon that may reshape the memory hierarchy organization of many-core CMPs. Among these new technologies, evolving high-density memories such as Thyristor-RAM, Ferroelectric RAM, and Resistive RAM could potentially be embedded in the CPU die to provide much larger on-chip storage. Key design issues in future many-core CMPs include an intelligent on-die memory hierarchy organization along with efficient data communication and coherence mechanisms among the many cores and storage modules. Furthermore, it is essential to incorporate these new memory technologies into future CMPs to build more reliable and scalable memory systems.
In this project, our first proposed research topic is to investigate solutions for using large on-die storage as a cache and/or an addressable memory unit. In addition, designing a scalable cache coherence mechanism with large on-die storage is very challenging and opens many new research fronts that can have a substantial impact on CMP performance. Our second proposed topic is to integrate new memory technology as the main memory in future CMPs. Main memory built with these new technologies is known to incur substantially longer latency, so we will study data prefetching techniques to hide that latency. To be effective, a prefetching method must overcome four serious challenges: accuracy, miss coverage, timeliness, and space overhead. Existing prefetching methods exploit two general behaviors of the missing block addresses: regularity and correlation. We will investigate solutions in both directions and identify those best suited to the new memory technologies. We will evaluate the different data prefetching methods using the MARSS whole-system simulation environment.
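As a concrete illustration of the regularity-based direction, the sketch below shows a per-PC stride prefetcher: it tracks the last miss address and stride observed for each load instruction, and issues a prefetch once the same stride repeats. This is a minimal, generic sketch for illustration only, not the project's actual design; the class and method names are invented here.

```python
class StridePrefetcher:
    """Minimal per-PC stride prefetcher sketch (illustrative, not a real design).

    For each load PC it remembers the last miss address and the last
    observed stride; a prefetch is issued only after the same nonzero
    stride is seen twice in a row, which filters out random misses.
    """

    def __init__(self):
        # pc -> (last_miss_addr, last_stride, confident)
        self.table = {}

    def observe_miss(self, pc, addr):
        """Record a cache miss; return a predicted prefetch address or None."""
        entry = self.table.get(pc)
        if entry is None:
            # First miss from this PC: no stride information yet.
            self.table[pc] = (addr, 0, False)
            return None
        last_addr, stride, _ = entry
        new_stride = addr - last_addr
        # Gain confidence only when the same nonzero stride repeats.
        confident = (new_stride == stride and new_stride != 0)
        self.table[pc] = (addr, new_stride, confident)
        return addr + new_stride if confident else None
```

For example, misses at 0x100, 0x140, and 0x180 from the same PC establish a 0x40 stride, and the third miss triggers a prefetch of 0x1C0. Correlation-based prefetchers instead record pairs of miss addresses that follow one another, trading a much larger table (the space-overhead challenge above) for coverage of irregular access patterns.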
Related Publications:
1. Xi Tao, Qi Zeng, Jih-Kwon Peir, and Shih-Lien Lu, "Small Cache Lookaside Table for Fast DRAM Cache Access," 35th IEEE International Performance Computing and Communications Conference (IPCCC), Las Vegas, NV, Dec. 2016.
2. Xi Tao, Qi Zeng, Jih-Kwon Peir, and Shih-Lien Lu, "Runahead Cache Misses Using Bloom Filter," 17th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Guangzhou, China, Dec. 2016.
3. Xi Tao, Qi Zeng, and Jih-Kwon Peir, "Hot Row Identification of DRAM Memory in a Multicore System," 2016 High Performance Computing and Cluster Technologies Conference (HPCCT), Chengdu, China, Dec. 2016.
4. Xudong Shi, Feiqi Su, and Jih-Kwon Peir, "Directory Lookaside Table: Enabling Scalable, Low-Conflict, Many-Core Cache Coherence Directory," 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS), Hsinchu, Taiwan, Dec. 2014.
5. Jianmin Chen, Xi Tao, Zhen Yang, Jih-Kwon Peir, Xiaoyuan Li, and Shih-Lien Lu, "Guided Region-Based GPU Scheduling: Utilizing Multi-thread Parallelism to Hide Memory Latency," 27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Boston, MA, May 2013.
6. Gang Liu, Jih-Kwon Peir, and Victor Lee, "Miss-Correlation Folding: Encoding Per-Block Miss Correlations in Compressed DRAM for Data Prefetching," 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Shanghai, China, May 2012.