Show Posts



Messages - ejfluhr

Pages: 1 [2] 3
16
Firmware / Re: Messing with WOF Tables
« on: October 07, 2022, 01:55:37 pm »
Too bad this isn't publicly available:
   https://research.ibm.com/publications/deterministic-frequency-boost-and-voltage-enhancements-on-the-power10tm-processor

Abstract
Digital droop sensors with core throttling mitigate microprocessor voltage droops and enable a voltage control loop (undervolting) to offset loadline uplift plus noise effects, protecting reliability VDDMAX.  These combine with a runtime algorithm for Workload Optimized Frequency (WOF) that deterministically maximizes core frequency.  The combined effect is demonstrated across a range of workloads including SPEC™, and provides up to a 15% frequency boost and a 10% reduction in core voltage.


17
Firmware / Re: Messing with WOF Tables
« on: October 07, 2022, 01:19:54 pm »
Can you read out the processor frequencies and tell if it is boosting properly?    Do you see ~160W when at idle?   What about when running some heavy workload on max threads (e.g. a Mersenne prime search is a good one)?
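In case it helps with the check, here is a minimal sketch in plain C (it assumes a Linux kernel exposing the cpufreq sysfs interface; paths and availability vary by system) that prints each core's current frequency so you can compare idle versus loaded:

    /* freq_check.c - sketch: print each CPU's current cpufreq frequency
     * so you can see whether the cores are actually boosting under load.
     * Assumes the Linux cpufreq sysfs interface is present.
     *
     * Build:  gcc -O2 -o freq_check freq_check.c
     */
    #include <stdio.h>

    int main(void)
    {
        for (int cpu = 0; ; cpu++) {
            char path[128];
            snprintf(path, sizeof path,
                     "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq",
                     cpu);
            FILE *f = fopen(path, "r");
            if (!f)
                break;              /* ran out of CPUs (or no cpufreq) */
            long khz = 0;
            if (fscanf(f, "%ld", &khz) == 1)
                printf("cpu%-3d  %.3f GHz\n", cpu, khz / 1e6);
            fclose(f);
        }
        return 0;
    }

Running it once at idle and once mid-workload should show whether the boost is kicking in; the idle-power question is easier to answer from the BMC/OCC sensor readouts than from userspace.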

18
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: October 07, 2022, 01:05:39 pm »
500MHz affords 1/500MHz = 2ns resolution of latency?   Sounds cool!   What is typical same-die memory latency, ~70ns?   Local L3 would have to run on the order of ~10ns?

There could be sensitivity to core frequency, presuming the L3 cache and some of the internal transfer network are pipelined.  E.g. running at the 3.8GHz turbo will give lower latencies than the 2.8GHz base.
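As a side note, a minimal sketch of where that ~2ns figure comes from on POWER (assuming a PowerPC Linux target with glibc's sys/platform/ppc.h timebase helpers; POWER9's timebase runs at 512MHz, so one tick is just under 2ns):

    /* tb_res.c - sketch: print the timebase frequency and the latency
     * resolution one tick gives you.  Assumes a PowerPC Linux target with
     * glibc's sys/platform/ppc.h helpers.
     *
     * Build:  gcc -O2 -o tb_res tb_res.c
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <sys/platform/ppc.h>

    int main(void)
    {
        uint64_t freq = __ppc_get_timebase_freq();   /* ticks per second */
        printf("timebase: %llu Hz -> %.2f ns per tick\n",
               (unsigned long long)freq, 1e9 / (double)freq);

        /* Back-to-back reads show the measurement overhead itself. */
        uint64_t t0 = __ppc_get_timebase();
        uint64_t t1 = __ppc_get_timebase();
        printf("back-to-back timebase reads: %llu tick(s)\n",
               (unsigned long long)(t1 - t0));
        return 0;
    }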

19
Talos II / Re: Temperatures and rotational speed of fans
« on: October 07, 2022, 12:59:25 pm »
The processor OCC firmware can call for fan speed increases if the on-die Tj rises.

Where does that "Temperature Pcie" come from?   Presuming it is from the PCIe card(s), it would seem sensible for the BMC to monitor it and adjust accordingly.   Hard to imagine there is such a glaring bug in the BMC firmware, but worse has happened, I guess.

Maybe Fan5 is supposed to cool the PCIe cards, but isn't working?


20
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 30, 2022, 06:11:02 pm »
> The graph is a measurement of how long it takes to send a message between cores,

Aha, that was a major piece of info I did not grasp.   Your other explanations all line up nicely with the picture now.  Pretty interesting measurement!

Do you, or anyone else, know if a similar type of measurement has been made of memory latency, including caching effects, from a single core to all the memory in the system?
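In case nobody has, a dependent pointer-chase is the usual way to measure it.  Here is a minimal sketch in plain C (pin the core and memory node externally with taskset/numactl, assumed to be installed, and sweep the buffer size to expose the cache steps):

    /* chase.c - sketch: single-core load-to-use latency via dependent
     * pointer chasing.  Sweep the size through L2/L3/DRAM capacities and
     * run under numactl/taskset to pick which core and memory node you hit.
     *
     * Build:  gcc -O2 -o chase chase.c
     * Run:    numactl --physcpubind=0 --membind=1 ./chase 268435456
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static double now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e9 + ts.tv_nsec;
    }

    int main(int argc, char **argv)
    {
        size_t bytes = (argc > 1) ? strtoull(argv[1], NULL, 0) : (64u << 20);
        size_t n     = bytes / sizeof(void *);
        void **buf   = malloc(n * sizeof(void *));
        size_t *idx  = malloc(n * sizeof(size_t));
        if (!buf || !idx)
            return 1;

        /* Build a random cyclic permutation so the prefetcher cannot guess
         * the next address; every load depends on the previous one. */
        for (size_t i = 0; i < n; i++) idx[i] = i;
        srand(12345);
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i < n; i++)
            buf[idx[i]] = &buf[idx[(i + 1) % n]];
        free(idx);

        /* Chase the pointers; each iteration is one serialized load. */
        const long iters = 20 * 1000 * 1000;
        void **p = buf;
        double t0 = now_ns();
        for (long i = 0; i < iters; i++)
            p = (void **)*p;
        double t1 = now_ns();

        printf("%.1f MB: %.1f ns/load (end %p)\n",
               bytes / 1048576.0, (t1 - t0) / iters, (void *)p);
        free(buf);
        return 0;
    }

Sweeping the size from a few hundred KB up through a few hundred MB, and repeating with --membind pointed at the other socket's node, should reproduce the L2 / L3 / local-DIMM / remote-DIMM steps.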

21
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 29, 2022, 05:18:26 pm »
Does that indicate the caching benefit?   E.g. are the darkest-purple boxes runs from the local L2, hence the lowest latency?  Then the light-but-still-solid purple would be runs from the shared 10MB L3, so the next lowest latency.  The red/purple mix is going out to DIMM memory attached to the same die, so it has yet greater latency, and finally the orange is DIMM memory on the other die, which hops across the socket-to-socket link and has the largest latency.

I don't get why some dark-purple boxes are bigger than others, though.  Also, why do some have the solid light purple around them but not others?  It doesn't all line up.

22
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 20, 2022, 08:30:16 pm »
How do you read the purple and orange plot??

23
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 20, 2022, 08:15:01 pm »
Bonus cache!    Presumably 1 core in each pair that shares the L2/L3 was bad in some way, and that die did not have enough paired-cache cores to make a matched set of 18, so some of the remaining unpaired cores were used to make up the difference.  Given the difficulty of yielding big die on a modern technology node, reducing core count is common industry practice.  It's a little unusual here that 2 cores share the cache, so when one gets knocked out, the other retains access to the full amount.  If the cache had been reduced at all, that core might perform worse than the same workload running on just 1 of the 2 cores in a paired-cache config, so it gets to keep the full local cache without having to share any...yay!

Raptor identifies this config specifically as "unpaired" in their 4-core and 8-core CPU descriptions, e.g.:  https://raptorcs.com/content/CP9M31/intro.html
   4 cores per package
       3.2GHz base / 3.8GHz turbo (WoF)
       90W TDP
       All Core Turbo capable
       32KB L1 data cache + 32KB L1 instruction cache / core
       512KB unpaired L2 cache / core
       10MB unpaired L3 cache / core

Versus the larger core-count, e.g.:  https://raptorcs.com/content/CP9M36/intro.html
   18 cores per package
       2.8GHz base / 3.8GHz turbo (WoF)
       190W TDP
       32KB L1 data cache + 32KB L1 instruction cache / core
       512KB L2 cache / core pair
       10MB L3 cache / core pair


If you don't want the bonus cache, I am sure someone would trade CPUs with you.  ;-)


24
General OpenPOWER Discussion / Re: Building a E1080 Power10 Mini Server
« on: September 20, 2022, 07:59:37 pm »
It looks to me like a 3D-printed model of the processor drawer. 

Man, that person has talent!!

25
General CPU Discussion / Re: The point about Power 10 currently
« on: September 01, 2022, 02:54:32 am »
Seems unlikely these processors will make it to consumer boxes given the firmware blobs for DDIMMs & PCIe, but one can always dream...
https://www.nextplatform.com/2022/07/12/can-ibm-get-back-into-hpc-with-power10/



Agreed that P9 is still plenty capable; there is likely a lot of capability that hasn't been tapped yet.

26
Talos II / Re: mixing memory sizes on the same CPU?
« on: September 01, 2022, 02:38:42 am »
Useful picture here:  https://www.itjungle.com/2018/03/05/deal-power9-memory-entry-servers/



The P9-Nimbus processor has 8 memory channels.  Sforza pins out 2 per side (left/right); the other 2 on each side are unusable.

First factor:  does the system mainboard connect 2 DIMMs to the same memory channel, or 1 DIMM per memory channel for each of the 4?   1 per channel gives the best performance and the most flexibility, opening the possibility that mismatched DIMM sizes could each be used at full capacity.   If 2 DIMMs are connected on one channel, they must be fully matched (size, speed, maybe even manufacturer).

As seen here in this orientation (rotated 90 degrees from the picture above):


The DIMM interfaces on the processor die are split half on the top edge and half on the bottom; the individual memory controllers are grouped on their respective edge and must run at the same clock rate (maybe that of the slowest plugged DIMM?).  I -think- the top group and bottom group are independent enough to run different sizes and even speeds.

It is also very dependent on the firmware + software.  It is unclear to me whether those will properly set up & use what the hardware sees.

27
Talos II / Re: different memory sizes on each CPU in multi-CPU systems?
« on: September 01, 2022, 02:13:45 am »
I believe the processor hardware can support different memory configurations on different sockets in a system; I suspect whether it actually works depends on support in the firmware and/or software, not the hardware.

28
General OpenPOWER Discussion / Re: News?
« on: July 28, 2022, 07:25:35 pm »
>>This all solely due to IBM's poor decision to close parts of the POWER10 platform, it's quite sad.

Re: those blobs: based on the GitHub links from the Twitter post (listed below), they seem to be related to non-IBM components; it is very possible IBM is not allowed to make the source code available.

PCIe I/O provided by Synopsys:  "ricmata and op-jenkins Synopsys firmware 2.04p fixes DFE slicer calibration"
   https://github.com/open-power/hcode/blob/master-p10/import/chips/p10/procedures/ppe/iop/
   https://www.synopsys.com/designware-ip/interface-ip/pci-express/pci-express-5.html
   https://www.synopsys.com/dw/ipdir.php?ds=dwc_pcie5_phy

DDIMM memory buffer "Explorer" firmware provided by Microsemi:
   https://github.com/open-power/ocmb-explorer-fw/
   https://www.microsemi.com/document-portal/doc_download/1244316-smc-1000-8x25g-smart-memory-controller
   https://www.anandtech.com/show/14706/microchip-announces-dram-controller-for-opencapi-memory-interface

Disappointing for sure, and it may not be possible to change that going forward.

29
General CPU Discussion / Re: The point about Power 10 currently
« on: December 19, 2021, 10:07:17 pm »
Well, I do not believe that anything about that POWER10 module precludes its use in a home environment, just that IBM would have to sell it on the OpenPOWER market and Raptor or someone would have to build a motherboard to support it.  I think this is the HotChips presentation?
   https://regmedia.co.uk/2020/08/17/ibm_power10_summary.pdf

Those external-cable connectors would be for the PowerAXON interfaces, which let the processor scale up to lots of sockets.   The memory and PCIe interfaces wouldn't use such cables.   Much like all the POWER9 processors have lots of socket-to-socket I/O that isn't used by the Raptor systems, a bare-bones POWER10 system could ignore all the cabled interfaces and just use the ones that connect through the socket.

Of course, the likely answer is that the cost of such a design is still prohibitive, since it presumably has to use the fancy buffered memory DIMMs:
   https://fuse.wikichip.org/news/2893/ibm-adds-power9-aio-pushes-for-an-open-memory-agnostic-interface/

It isn't clear to me how the spring mid-range and low-end announcements will be more amenable to a consumer-focused POWER10 system, but here's to hoping that the landscape changes come 2022!   I would love to buy a POWER10 system!!





30
General Hardware Discussion / Re: 2u Blackbird Build with 18 cores?!
« on: December 19, 2021, 09:43:08 pm »
How do you mean "invisible?"   Are you referencing the "Current" field of the bottom chart?

That looks to me like a plotting problem, where the processor is going to a range of 125-130A on VDD, which for some reason goes off the top of the chart.  If the chart were scaled to, say, 150A, then we could see where the current really rests.

I am not sure which of the blue or green voltage curves is representative of processor VDD.   Maybe the blue is the regulator setpoint and the green is VDD at the processor after loadline loss?    If so, then given the current is 125A at processor VDD = 0.85V, the processor is consuming at least 125A * 0.85V = 106W of VDD power.   VDN is 15.6A * 0.69V = 10.8W.   I cannot tell how much VIO power is being used; let's just guess 30W?   That would put the processor in the ballpark of 106W + 10.8W + 30W ~= 147W.

When you compare that to the idle data, which shows VDD current of ~30A at about 1.0V, that would be ~30W of VDD power versus the 106W fully loaded.
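Just to put that arithmetic in one place, a quick sketch using the numbers eyeballed from the charts (the 30W VIO figure is purely my guess):

    /* pkg_power.c - back-of-envelope package power from the charted rails.
     * All inputs are eyeballed from the posted plots; VIO is a pure guess.
     *
     * Build:  gcc -O2 -o pkg_power pkg_power.c
     */
    #include <stdio.h>

    int main(void)
    {
        /* loaded case */
        double vdd_w = 125.0 * 0.85;     /* 125A  @ 0.85V -> ~106W  */
        double vdn_w = 15.6  * 0.69;     /* 15.6A @ 0.69V -> ~10.8W */
        double vio_w = 30.0;             /* unknown; guessing 30W   */
        printf("loaded: VDD %.0fW + VDN %.1fW + VIO %.0fW ~= %.0fW\n",
               vdd_w, vdn_w, vio_w, vdd_w + vdn_w + vio_w);

        /* idle case, for comparison */
        printf("idle:   VDD ~%.0fW (about 30A at ~1.0V)\n", 30.0 * 1.0);
        return 0;
    }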

One line that seems worrisome is the "VR" temp, which appears to go to 115C.  That seems pretty extreme.  I don't know what that regulator is rated for, but other server designs I know of have long-term reliability limits around 90C.  This may be an example of how the Blackbird design is not built to handle a high-core-count processor, as those tend to run much lower voltages and higher currents than a low-core-count processor at the same power.    It would be nice to have data from an 8-core processor running at max power for comparison, but I haven't seen anyone else post characterization results like yours.




