AMD Milan

Both Mangi and Agate have nodes with two AMD processors with 64 cores each. Applications running on standard Agate CPU nodes have a pool of up to 512 GB of memory and 128 cores available. You can run applications on these nodes without understanding the details, but it can be beneficial to understand their architecture when trying to optimize performance.

Each AMD 7702 or 7763 processor has 8 chiplets, each with 8 processor cores. All Agate nodes use AMD 7763 processors, in which the eight cores on a chiplet share a single 32 MB L3 cache.

The slightly older Mangi nodes have AMD 7702 processors, in which each chiplet is divided into two groups of four cores, and each group shares a 16 MB L3 cache. On both processors, each core has its own 512 KB L2 cache.
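
To check which cores actually share an L3 cache on the node where a job lands, the Linux sysfs cache files can be read directly. The following is a minimal sketch, not an official tool: the function names parse_cpu_list and l3_groups are made up for illustration, and it assumes a Linux node that exposes /sys/devices/system/cpu/cpu*/cache/index*/ with "level" and "shared_cpu_list" files.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-3,64-67" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct groups of logical CPUs that share an L3 cache.

    Returns an empty list when the sysfs cache files are not available
    (for example on a non-Linux machine).
    """
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index*", "level")
    groups = set()
    for level_file in glob.glob(pattern):
        try:
            with open(level_file) as handle:
                if handle.read().strip() != "3":
                    continue  # not an L3 cache entry
            shared_file = os.path.join(os.path.dirname(level_file), "shared_cpu_list")
            with open(shared_file) as handle:
                groups.add(tuple(parse_cpu_list(handle.read())))
        except OSError:
            continue  # entry vanished or is unreadable; skip it
    return sorted(groups)


if __name__ == "__main__":
    for group in l3_groups():
        print(f"{len(group)} logical CPUs share one L3 cache: {list(group)}")
```

If the layout described above holds, each group should correspond to 8 cores on an Agate node and 4 cores on a Mangi node. The count of logical CPUs per group may be higher if simultaneous multithreading is enabled.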
 
When memory is reserved for an application, it is generally in the same NUMA zone as the processors assigned to that application. However, this is not always the case. If you request 8 cores and 300 GB of memory (on a 512 GB node), the memory will be split across multiple NUMA zones, and the latency to some memory addresses will be higher than to others. As a very rough guide, the table below gives some idea of the expected latency for a processor to reach memory in different locations.
 
Memory                               Approximate Latency (CPU cycles)
L1 cache                             4
L2 cache                             12
L3 cache                             40
Main memory (same NUMA zone)         120
Main memory (different NUMA zone)    240
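
The 300 GB example can be worked through with the main-memory figures from the table above. The sketch below is only a back-of-the-envelope estimate under stated assumptions. The number of NUMA zones per node (numa_zones) is an assumption and is not given here. The local zone is assumed to be filled first, accesses are assumed to be spread uniformly over the allocation, and cache hits are ignored. The function names are illustrative.

```python
import math

# Main-memory latencies in CPU cycles, from the table above (very rough guide).
LOCAL_LATENCY = 120
REMOTE_LATENCY = 240


def zones_spanned(request_gb, node_gb=512, numa_zones=2):
    """Minimum number of NUMA zones needed to hold request_gb of memory.

    numa_zones is an ASSUMPTION; check the real node layout before relying on it.
    """
    zone_gb = node_gb / numa_zones
    return math.ceil(request_gb / zone_gb)


def mean_memory_latency(request_gb, node_gb=512, numa_zones=2):
    """Rough mean main-memory latency in cycles for an allocation.

    Assumes the local zone is filled first and that accesses are spread
    uniformly over the whole allocation (both are simplifications).
    """
    zone_gb = node_gb / numa_zones
    local_fraction = min(request_gb, zone_gb) / request_gb
    return local_fraction * LOCAL_LATENCY + (1 - local_fraction) * REMOTE_LATENCY


if __name__ == "__main__":
    for zones in (2, 8):
        print(zones, zones_spanned(300, numa_zones=zones),
              round(mean_memory_latency(300, numa_zones=zones), 1))
```

Under these assumptions, a 300 GB request on a node with two 256 GB zones spans both zones, and the estimated mean latency is about 138 cycles rather than 120. With eight 64 GB zones, the same request spans five zones and the estimate rises to about 214 cycles.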

If you need predictable timings for your application, you will need to reserve an entire node (128 cores). Otherwise, your application will run alongside other applications that compete for the same memory bandwidth.

Similarly, you can expect additional variability in the execution time of your code if you request fewer than 8 cores. On Agate, jobs requesting fewer than 8 cores will share a chiplet and an L3 cache with other applications.
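
One way to act on this is to round a core request up to a whole number of L3 groups. The helper below is a hypothetical sketch (whole_l3_request is not part of any scheduler or site tool). It assumes 8 cores per L3 cache as on Agate; for Mangi the group size would be 4. It does not guarantee that the scheduler places the cores on a single chiplet, since that placement behavior is not described here.

```python
import math


def whole_l3_request(cores, cores_per_l3=8, node_cores=128):
    """Round a core count up to a multiple of the L3 group size.

    cores_per_l3=8 matches the Agate description; use 4 for Mangi.
    The result is capped at the size of one node.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    rounded = math.ceil(cores / cores_per_l3) * cores_per_l3
    return min(rounded, node_cores)


if __name__ == "__main__":
    for wanted in (5, 8, 9, 128):
        print(wanted, "->", whole_l3_request(wanted))
```

For example, a job that needs 5 cores would request 8 under this rule. Memory bandwidth is still shared with other jobs unless the entire node is reserved.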