Many recent proposals combine cc-NUMA, COMA, and S-COMA architectures for future scalable multiprocessor systems. The general goal is to reduce the hardware complexity of COMA while maximizing both the local memory access rate and the effective physical memory size of the entire system. In this proposed project, we will investigate and evaluate different architectural alternatives. Our evaluation will use a mid-range (e.g., 64-way), high-performance, highly scalable system as the baseline. Following current technology trends, we will also consider processor dies that each include a large (32-megabyte) DRAM. Using both trace-based and execution-based simulation techniques, we will carry out a complete evaluation to assess the various design tradeoffs.