General OpenPOWER Hardware > General CPU Discussion

Asymmetric CPUs with the Same Core Count


amock:

--- Quote from: ejfluhr on September 20, 2022, 08:15:01 pm ---If you don't want the bonus cache, I am sure someone would trade CPUs with you.  ;-)

--- End quote ---
I'm very happy with my Talos II :D


--- Quote from: ejfluhr on September 20, 2022, 08:30:16 pm ---How do you read the purple and orange plot??

--- End quote ---

The darkest purple is fastest, rising through lighter purple to red, then orange, then yellow.  There's a legend on the right side of the image, but you might have to scroll to see it.

ejfluhr:
Does that indicate the caching benefit?  E.g., does the darkest purple show runs hitting the local L2, hence the lowest latency?  Then the light-but-still-solid purple shows runs from the shared 10 MB L3, so it has the next-lowest latency.  The red/purple mix goes out to DIMM memory attached to the same die, so it has yet greater latency, and finally the orange is DIMM memory on the other die, which hops across the socket-to-socket link and hence has the largest latency.

I don't get why some dark-purple boxes are bigger than others, though.  Also, why do some have the solid light purple around them but not others?  It doesn't all line up.

amock:
The darkest purple is a core pair, which shares L2 and L3 cache.  I'm guessing that the lighter purple is neighboring cores, but I don't know enough about the CPU to say for sure.  The graph measures how long it takes to send a message between cores, so I don't think it would ever go out to RAM within a single CPU; the other purplish area is just cores in the same CPU that are far away, and the orange is cores on the other CPU.  There's a die picture at https://web.archive.org/web/20190325062541/https://www.ibm.com/developerworks/community/wikis/form/anonymous/api/wiki/61ad9cf2-c6a3-4d2c-b779-61ff0266d32a/page/1cb956e8-4160-4bea-a956-e51490c2b920/attachment/56cea2a9-a574-4fbb-8b2c-675432367250/media/POWER9-VUG.pdf that I'm using to inform my speculation.

The small dark-purple boxes just down and right of center are cores that have their own L2 and L3 cache instead of being paired, so those show just the 4 threads of a single core instead of the 8 that the paired cores have.  If you ran this on a 4-core or 8-core machine it should have all small boxes.  Also, some of the cores might not have a neighboring core, since only 18 of the potential 24 cores are enabled.  There's also some error in the measurements, since it doesn't use a cycle-counting mechanism like on x86.

ejfluhr:
> The graph is a measurement of how long it takes to send a message between cores,

Aha, that was a major piece of info I did not grasp.   Your other explanations all coincide nicely now with the picture.  Pretty interesting measurement!

Do you, or anyone, know if anyone has made a similar type of measurement on memory latency, including caching effects, from a single core to all memory in the system?

amock:

--- Quote from: ejfluhr on September 30, 2022, 06:11:02 pm ---Do you, or anyone, know if anyone has made similar type of measurement on memory latency, including caching effects, from a single core to all memory in the system?

--- End quote ---

I don't know of any, but I see it done regularly for x86 systems, so there might be something out there that's easily adaptable to other systems.  It seems like on x86 many benchmarking tools use
--- Code: ---RDTSC
--- End code ---
and I haven't found an exact equivalent in the Power ISA, but there is a time base register (a 512 MHz counter on POWER9, at least) that I might try if I can find some simple code.  It's something I've thought about but haven't gotten around to yet.
