Raptor Computing Systems Community Forums (BETA)

General OpenPOWER Hardware => General CPU Discussion => Topic started by: bobpaul on February 21, 2025, 11:46:58 am

Title: Performance of HPT vs Radix?
Post by: bobpaul on February 21, 2025, 11:46:58 am: I know the Radix MMU is one of the newer features on the Power9 series vs older hardware. And I know that which MMU you use affects which KVM modules are available when running qemu. (https://qemu.readthedocs.io/en/v9.2.0/system/ppc/pseries.html#modules-support) It's also my understanding that radix trees are commonly used for the page table on other architectures.

But I've been struggling to find much information on the benefits/tradeoffs. Has anyone posted benchmarks demonstrating the differences? What would be the worst-case scenario for HPT? (I assume it's when memory is very full, as that could result in more hash collisions.) Are there scenarios where HPT would theoretically perform better than Radix?
Title: Re: Performance of HPT vs Radix?
Post by: ClassicHasClass on February 21, 2025, 06:37:52 pm: This may be helpful: https://www.researchgate.net/publication/325937212_IBM_POWER9_system_software

"The HPT has the advantage that a translation can be performed (i.e., a translation lookaside buffer [TLB] miss can be serviced) by reading up to two cache lines from memory. This characteristic should enable the HPT to provide good performance for applications with very large memory footprints and low locality of references (i.e., with essentially random or quasi-random access patterns).

"The disadvantage of the HPT structure is that it does not cache well. The hashing algorithm used in the POWER Memory Management Unit (MMU) tends to put each page table entry (PTE) for a process into a separate cache line. Thus, most TLB misses cause a cache miss and need to read main memory, particularly in processes with large working sets. ...

"Because radix page tables place information about adjacent addresses into adjacent doublewords in memory, a program with high locality of references in its address access pattern will also have high locality of references in the pattern of accesses performed by the MMU to service TLB misses. Thus, the CPU caches work efficiently to cache the page-table entries (PTEs), and so radix tree translation is expected to be more efficient than HPT translation for workloads with high locality of references."

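To make the cache-line argument in the quoted passage concrete, here is a rough Python sketch, not a model of the real hardware. It assumes 128-byte cache lines and 8-byte (doubleword) radix leaf PTEs, ignores the upper levels of the radix tree, and uses BLAKE2 as a stand-in hash (the actual POWER hash function is different). `HPT_GROUPS` and both function names are made up for illustration.

```python
import hashlib

CACHE_LINE = 128      # bytes; assumed cache-line size for this toy model
RADIX_PTE_SIZE = 8    # bytes; one doubleword per radix leaf PTE
HPT_GROUPS = 1 << 16  # hypothetical number of hash buckets, one cache line each


def radix_leaf_lines(first_page, n_pages):
    """Cache lines holding the leaf PTEs of n_pages consecutive virtual pages."""
    return {
        (vpn * RADIX_PTE_SIZE) // CACHE_LINE
        for vpn in range(first_page, first_page + n_pages)
    }


def hpt_lines(first_page, n_pages):
    """Buckets (cache lines) touched when every page number is hashed on its own."""
    lines = set()
    for vpn in range(first_page, first_page + n_pages):
        # Stand-in hash only; NOT the hash the POWER MMU uses.
        digest = hashlib.blake2b(vpn.to_bytes(8, "little"), digest_size=8).digest()
        lines.add(int.from_bytes(digest, "little") % HPT_GROUPS)
    return lines


if __name__ == "__main__":
    n = 1024
    print("radix leaf cache lines:", len(radix_leaf_lines(0, n)))
    print("hashed-table cache lines:", len(hpt_lines(0, n)))
```

For 1024 consecutive pages, the radix leaf PTEs in this model fit in 64 cache lines (16 PTEs per line), while the hashed lookup touches close to one line per page. With a random access pattern instead of consecutive pages, the radix advantage shrinks, which matches the paper's point about low locality of references.
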
IME, for many workloads, the performance difference between the HPT and radix MMUs is imperceptible, though the choice does have some impact on what you can virtualize. On the other hand, for large working sets typical of IBM midrange and "big iron," radix does have real benefits.
Title: Re: Performance of HPT vs Radix?
Post by: bobpaul on February 24, 2025, 04:19:24 pm: Oh, awesome! I also found this paper which cites the one you shared:
http://www.cs.cmu.edu/~dskarlat/publications/meecpt_hpca23.pdf

I guess I have some reading to do.