GIT-CERCS-07-09
Dong Hyuk Woo, Joshua B. Fryman, Allan D. Knies, Marsha Eng, Hsien-Hsin S. Lee
POD: A Parallel-On-Die Architecture
As power constraints, complexity, and design verification
cost make it difficult to improve single-stream performance, the parallel computing paradigm is taking its place among mainstream high-volume architectures. Most
current commercial designs focus on MIMD-style CMPs built with rather
complex single cores. While such designs provide a degree of generality, they
may not be the most efficient way to build processors for applications with
inherently scalable parallelism. These designs have been proven to work well
for certain classes of applications such as transaction processing, but they
have driven the development of new languages and complex architectural
features.
Instead of building MIMD-CMPs for all workloads, we propose an alternative
parallel on-die many-core architecture called POD based on a large SIMD PE
array. POD helps to address the key challenges of on-chip communication
bandwidth, area limitations, and energy consumed by routers by factoring out
features necessary for MIMD machines and focusing on architectures that match
many scalable workloads. In this paper, we evaluate and quantify the
advantages of the POD architecture, whose ISA is based on a commercially relevant CISC architecture, and show that it can be as efficient as more specialized array
processors based on one-off ISAs. Our single-chip POD is capable of
best-in-class scalar performance up to 1.5 TFLOPS of single-precision
floating-point arithmetic. Our experimental results show that in some application domains, our architecture can achieve nearly linear speedup on a large number of SIMD PEs, and that this speedup is much larger than the maximum speedup that MIMD-CMPs of the same die size can achieve. Furthermore, we show that, owing to its synchronized computation and communication, POD can efficiently suppress energy consumption in the novel communication method of our interconnection network.