GIT-CERCS-08-01
    Sudhakar Yalamanchili, Jeff Young, Jose Duato, Federico Silla,
    A Dynamic, Partitioned Global Address Space Model for High Performance Clusters

    Memory-to-memory latency is a critical performance determinant of scalable computing systems. Modern interconnect fabrics that are tightly coupled to the processor-memory hierarchy, such as AMD’s HyperTransport™ (HT), have the potential to provide the lowest end-to-end transfer latency for systems composed of tens to thousands of multicore nodes. However, to harness this raw capability productively, it must be exercised in the context of a global system model that defines how the system-wide address space is deployed and utilized. To this end, we advocate a Partitioned Global Address Space (PGAS) model for scalable cluster systems and explore its implications and implementation. We propose a prototype implementation based on HT-Over-Ethernet (HToE) that is suitable for experimentation and measurement. In particular, we are concerned with the portability of the model and its software implementations across future generations of processors with increasing physical address ranges. The paper concludes by identifying several potential directions for future research.