Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

12:10 - 12:30
Hardware Specialization: Estimating Monte Carlo Cross-Section Lookup Kernel Performance and Area

Kazutomo Yoshii, John R. Tramm, Bryce Allen, Andrew Siegel, Pete Beckman
Argonne National Laboratory, Lemont, IL, USA

Tomohiro Ueno, Kentaro Sano
RIKEN Center for Computational Science, Kobe, Hyogo, Japan

Hardware specialization is one of the promising directions in the post-Moore era, and it is imperative to understand how hardware specialization paradigms can benefit HPC. An essential question is how to estimate the theoretical performance of an optimally specialized architecture without requiring extensive hardware development expertise and effort. Focusing on the Monte Carlo cross-section lookup kernel, known for its notably low resource utilization, we develop a workflow that leverages open-source hardware tools to simulate a specialized architecture’s timing and estimate its resource usage. We implement building blocks of the kernel pipeline in the Chisel hardware construction language and generate Verilog code for resource estimation. Our late-breaking results show that the kernel latency is 46 cycles per lookup, while the optimized CPU code takes 680 cycles, and that potentially 15k pipeline copies could fit within a 698 mm² die, an area reflective of the Intel Xeon Platinum 8180.
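
As a rough reading aid, the figures quoted in the abstract can be turned into two derived quantities: the ratio of CPU cycles to specialized-pipeline cycles per lookup, and the die area available per pipeline copy. The sketch below is not part of the authors' workflow. It assumes that "15k" means exactly 15,000 copies and that the whole die is given over to pipeline copies, and it compares raw cycle counts only, ignoring any difference in clock frequency between the two designs. The function names are illustrative.

```python
# Figures quoted in the abstract.
SPECIALIZED_CYCLES_PER_LOOKUP = 46
CPU_CYCLES_PER_LOOKUP = 680
DIE_AREA_MM2 = 698.0
PIPELINE_COPIES = 15_000  # assumption: "15k" read as exactly 15,000


def cycle_ratio(cpu_cycles: float, specialized_cycles: float) -> float:
    """Ratio of cycle counts per lookup (not a wall-clock speedup)."""
    return cpu_cycles / specialized_cycles


def area_per_copy_mm2(die_area_mm2: float, copies: int) -> float:
    """Average die area per pipeline copy, assuming the full die is used."""
    return die_area_mm2 / copies


if __name__ == "__main__":
    ratio = cycle_ratio(CPU_CYCLES_PER_LOOKUP, SPECIALIZED_CYCLES_PER_LOOKUP)
    area = area_per_copy_mm2(DIE_AREA_MM2, PIPELINE_COPIES)
    print(f"cycle-count ratio: {ratio:.1f}x")                # 14.8x
    print(f"area per pipeline copy: {area:.4f} mm^2")        # 0.0465 mm^2
```

Under these assumptions, a single lookup takes roughly 15 times fewer cycles on the specialized pipeline, and each pipeline copy has an area budget of roughly 0.047 mm².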

14th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems

held in conjunction with SC23: The International Conference for High Performance Computing, Networking, Storage and Analysis