
Messages - ejfluhr

Pages: 1 [2] 3 4
16
Firmware / Re: Updating Talos II firmware to IBM PNOR V2.18?
« on: March 14, 2023, 06:05:09 pm »
I believe that all OpenPOWER/OPAL-based systems use SMT4 cores, not SMT8 (i.e. "fused") cores.    So that may mean you aren't seeing the same problem.

Do you get the same fault callout?

17
Blackbird / Re: Blackbird Cooling
« on: October 18, 2022, 01:02:27 am »
https://www.infineon.com/cms/en/product/power/dc-dc-converters/integrated-power-stages/tda21472/

It looks like that VRM is rated at 60A - 70A in typical ambient temps (e.g. < 40C).   It seems like there must be at least 2 in parallel on VDD, for a capacity of 120+A?   Efficiency falls off as the load gets that high, though.  It would be better to have 3 such stages so they draw only 30A - 40A under normal operation.
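
A quick sketch of the per-stage arithmetic, in Python. The 120A total load is just the rough figure from this post, not a measured or datasheet-verified number, and the function name is made up for illustration. It also assumes the stages share current evenly, which real multiphase regulators only approximate.

```python
def per_stage_current(total_load_a, stages):
    """Split a total rail current evenly across parallel power stages.

    Even sharing is an assumption; real stages are never perfectly balanced.
    """
    if stages <= 0:
        raise ValueError("need at least one stage")
    return total_load_a / stages


# Assumed total VDD load of ~120A, taken from the rough estimate in the post.
for n in (2, 3, 4):
    print(f"{n} stages -> {per_stage_current(120.0, n):.0f}A per stage")
```

With 3 stages the same 120A works out to 40A each, which is the top of the 30A - 40A range I'd want to see.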

18
Firmware / Re: Messing with WOF Tables
« on: October 10, 2022, 05:58:15 pm »
>Basically you have the 8-core that Raptor sells but with the paired cores still on

What a neat observation!   Something to consider is that 160W at 2.5GHz is probably achieved at much lower VDD than the 8c 160W at 3.45GHz (https://raptorcs.com/content/CP9M32/intro.html).

P=IxV  =>  160W = I_3.45 x V_3.45 = I_2.5 x V_2.5

If voltage moves 1:1 with frequency (I don't know if it does, could be 1:2 or 2:1, but got to start somewhere), then 3.45GHz/2.5GHz = 1.38x.   So V_3.45 = V_2.5 x 1.38, alternatively V_2.5 = V_3.45 / 1.38.

So if P is constant, and V_2.5 is lower than V_3.45 by that 1.38x factor, then I_2.5 has to be higher than I_3.45 by the same 1.38x.
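
Here is the same reasoning as a small Python sketch. The exponent parameter is my own knob for "what if voltage does not move 1:1 with frequency"; it is not something from the WOF tables, and the 1:1 case is only a starting guess.

```python
def current_ratio_at_constant_power(f_high_ghz, f_low_ghz, v_exponent=1.0):
    """Return I_low / I_high when both operating points burn the same power.

    Assumes V scales as f ** v_exponent (1.0 is the 1:1 guess).
    P = I x V constant  =>  I_low / I_high = V_high / V_low.
    """
    return (f_high_ghz / f_low_ghz) ** v_exponent


ratio = current_ratio_at_constant_power(3.45, 2.5)
print(f"I_2.5 / I_3.45 ~= {ratio:.2f}")
```

A weaker voltage-vs-frequency dependence (exponent below 1.0) gives a smaller current increase, a stronger one gives a larger increase.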

This is probably why the Blackbird wiki states:  Other CPUs (CPUs with a TDP greater than 160W) may operate without WoF due to power regulator limitations.

Even without WOF (i.e. at the base of 2.5GHz), you are probably pushing those regulators much harder than the 8c module does.

I couldn't find any information on how much VDD current the Blackbird regulators can support.  In the post by user deepblue, graphs indicate the processor was exceeding a VDD load of 130A at 0.89V under some load.    A crude per-core translation to 16c, just to see where that lands, is (130A / 18c) x 16c ~= 115A, presumably lower since the 18c is 190W and the 16c is 160W.   If your temps are running high, then perhaps the regs are only built to support ~100A and you are at or over that spec...
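
The crude scaling can be written out in Python as below. Both estimates are rough assumptions: the first scales current by core count only, the second scales by TDP and assumes the two parts would run at the same voltage, which I have no data for. Function names are made up for illustration.

```python
def scale_by_cores(measured_a, measured_cores, target_cores):
    """Scale a measured rail current linearly with core count."""
    return measured_a / measured_cores * target_cores


def scale_by_tdp(measured_a, measured_tdp_w, target_tdp_w):
    """Scale a measured rail current linearly with TDP (same voltage assumed)."""
    return measured_a * target_tdp_w / measured_tdp_w


# 130A on the 18c/190W part is the figure read off deepblue's graphs.
by_cores = scale_by_cores(130.0, 18, 16)
by_tdp = scale_by_tdp(130.0, 190.0, 160.0)
print(f"by core count: ~{by_cores:.1f}A, by TDP: ~{by_tdp:.1f}A")
```

Either way the estimate stays above 100A, so a regulator built for ~100A would have no margin.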


19
Firmware / Re: Messing with WOF Tables
« on: October 07, 2022, 05:58:39 pm »
I copied that table CSV from GIT and filtered it to generate plots of frequency versus "CORE_CEFF" for a couple of different "VRATIO" values.
This link declares:  https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.905&rep=rep1&type=pdf

     Power ~= VDD^2 x Fclk x Ceff
     where the effective switched capacitance, Ceff, is commonly expressed as the product of the physical capacitance CL, and the activity weighting factor α, each averaged over the N nodes.

I don't know how you can identify what "CORE_CEFF" is in your processor, but the equation shows how that correlates to power.  I.e. smaller Ceff equals lower power.
Then the plot looks meaningful since the lowest frequency is at the highest CORE_CEFF and the frequency climbs as CORE_CEFF gets smaller, up to some limit.
Since the largest value of CORE_CEFF in the table is 1.0, that would be the highest power condition presumably associated with the 160W power rating of the table/processor.
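
A tiny Python sketch of that relationship, in relative units only. This is just the textbook dynamic-power proportionality, not how the OCC or the WOF table actually computes anything, and it ignores leakage.

```python
def relative_dynamic_power(vdd, f_clk, ceff):
    """Dynamic power in arbitrary units: P ~ VDD^2 x Fclk x Ceff."""
    return vdd ** 2 * f_clk * ceff


# Normalised to the Ceff = 1.0 (highest power) condition.
worst_case = relative_dynamic_power(1.0, 1.0, 1.0)
half_ceff = relative_dynamic_power(1.0, 1.0, 0.5)
print(f"Ceff 0.5 uses {half_ceff / worst_case:.0%} of the worst-case dynamic power")
```

At the same voltage and frequency, halving Ceff halves the dynamic power, which is the headroom WOF can spend on a higher frequency.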

I could not figure out how to post an image of the graphs, nor will the forum let me post the XLSX file with base data plus graphing tab, since it is too big.   So I deleted a bunch of rows from the base data that had "NEST_CEFF" > 0.25.   This let me shrink the XLSX file enough to post it.   

The first tab is the CSV data as posted.  The second tab "Plotme" is a filter + graph that can be manipulated by the red-colored cells; one variable showing a big difference is the VRATIO which can be modified by adjusting the VRATIO_INDEX box in integer values from 0 to 23 (the table has entries for all of those).  The other 2 tabs are copies of the Plotme tab with just the values & graphs at VRATIO=1.0 and VRATIO=0.7498; this let me save and review them side-by-side.   You could probably get fancy and plot all the variations on a single graph but I didn't care to go that far.

I picked VRATIO=1.0 and VRATIO=0.7948 because the maximum frequency changes substantially between all those values, starting at 3.4GHz and climbing to 3.8GHz.  You can play with the VRATIO_INDEX in the Plotme tab and see how the frequency curve continues to increase at different CORE_CEFF values, though always capped to that 3.8GHz. 
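
For anyone who would rather script this than use the spreadsheet, here is a rough Python sketch of the same filter. CORE_CEFF and VRATIO are the column names I saw in the table; the frequency column name and the sample rows are placeholders I made up so the snippet runs, they are NOT real WOF table data. The real table also varies other columns (e.g. NEST_CEFF), so you would need to pin those too.

```python
import csv
import io


def freq_vs_core_ceff(csv_text, vratio, freq_col, tol=1e-4):
    """Return sorted (CORE_CEFF, frequency) pairs for rows matching one VRATIO."""
    points = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if abs(float(row["VRATIO"]) - vratio) <= tol:
            points.append((float(row["CORE_CEFF"]), float(row[freq_col])))
    return sorted(points)


# Made-up placeholder rows, only here to exercise the filter.
# "FREQ_MHZ" is a hypothetical column name.
SAMPLE = """CORE_CEFF,VRATIO,FREQ_MHZ
1.0,1.0,2500
0.5,1.0,3000
1.0,0.5,2800
"""

for ceff, freq in freq_vs_core_ceff(SAMPLE, vratio=1.0, freq_col="FREQ_MHZ"):
    print(ceff, freq)
```

Feed the returned pairs to whatever plotting tool you like to get one frequency-vs-CORE_CEFF curve per VRATIO.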

Raptor quotes the 190W 18-Core CPU as:  2.8GHz - 3.8GHz, so presumably you now have a 160W 16-Core CPU  of 2.5GHz - 3.8GHz.
https://raptorcs.com/content/CP9M36/intro.html
   CP9M36
   IBM POWER9 v2 CPU (18-Core)
       18 cores per package
           2.8GHz base / 3.8GHz turbo (WoF)
           190W TDP

User @deepblue was running an 18-Core CPU on a Blackbird mainboard, though with extra cooling:
   https://forums.raptorcs.com/index.php/topic,99.0.html
Hopefully you will find out if, long term, the Blackbird can handle a 16-core P9 when it matches the TDP of the supported 8-core version.

Nice work!   Please report back in a few months....


20
Firmware / Re: Messing with WOF Tables
« on: October 07, 2022, 01:55:37 pm »
Too bad this isn't publicly available:
   https://research.ibm.com/publications/deterministic-frequency-boost-and-voltage-enhancements-on-the-power10tm-processor

Abstract
Digital droop sensors with core throttling mitigate microprocessor voltage droops and enable a voltage control loop (undervolting) to offset loadline uplift plus noise effects, protecting reliability VDDMAX.  These combine with a runtime algorithm for Workload Optimized Frequency (WOF) that deterministically maximizes core frequency.  The combined effect is demonstrated across a range of workloads including SPEC™, and provides up to a 15% frequency boost and a 10% reduction in core voltage.


21
Firmware / Re: Messing with WOF Tables
« on: October 07, 2022, 01:19:54 pm »
Can you read out the processor frequencies and tell if it is boosting properly?    Do you see ~160W when at idle?   What about when running max threads off some heavy workload (e.g. Mersenne primes is a good one)?

22
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: October 07, 2022, 01:05:39 pm »
500MHz affords 1/500MHz = 2ns resolution of latency?   Sounds cool!   What is same-die typical memory latency, ~70ns??   Local L3 would have to run on the order of ~10ns??

There could be sensitivity to core frequency, presuming the L3 cache and some of the internal transfer network is pipelined.  E.g. running at 3.8GHz turbo will give a better answer than 2.8GHz base.
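
Rough numbers for the above, as a Python sketch. The ~10ns L3 figure is only my guess from this post, not a measured value, and the helper names are made up.

```python
def period_ns(freq_hz):
    """Length of one tick, in nanoseconds, for a clock at freq_hz."""
    return 1e9 / freq_hz


def core_cycles(latency_ns, core_ghz):
    """Number of core clock cycles that fit in a given latency."""
    return latency_ns * core_ghz


print(f"500MHz timer tick: {period_ns(500e6):.0f}ns")
# Guessed ~10ns L3 latency, expressed in core cycles at base and turbo.
for ghz in (2.8, 3.8):
    print(f"10ns at {ghz}GHz is about {core_cycles(10.0, ghz):.0f} core cycles")
```

If the L3 path takes a fixed number of core cycles, the wall-clock latency shrinks as the core clock goes up, and a 2ns timer tick is coarse compared to that difference.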

23
Talos II / Re: Temperatures and rotational speed of fans
« on: October 07, 2022, 12:59:25 pm »
The processor OCC firmware can call for fan speed increases if the on-die Tj rises.

Where does that "Temperature Pcie" come from?   Presuming it is from PCIe card(s), it would seem sensible that the BMC should monitor and adjust accordingly.   Hard to imagine there is such a glaring bug in the BMC firmware, but worse has happened I guess.

Maybe Fan5 is supposed to cool the PCIe cards, but isn't working?


24
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 30, 2022, 06:11:02 pm »
> The graph is a measurement of how long it takes to send a message between cores,

Aha, that was a major piece of info I did not grasp.   Your other explanations all coincide nicely now with the picture.  Pretty interesting measurement!

Do you, or anyone, know if anyone has made a similar type of measurement on memory latency, including caching effects, from a single core to all memory in the system?

25
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 29, 2022, 05:18:26 pm »
Does that indicate the caching benefit?   E.g. is the darkest purple running from the local L2, hence the lowest latency?  Then the light-but-still-solid purple runs from the shared 10MB L3, so it has the next lowest latency.  The red/purple mix is going out to DIMM memory attached to the same die, so it has yet greater latency, and finally the orange is DIMM memory on the other die, hence it hops across the socket-to-socket link and has the largest latency.

I don't get why some dark-purple boxes are bigger than others, though.  Also why do some have the solid-light purple around them but not all?  It doesn't all line up.

26
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 20, 2022, 08:30:16 pm »
How do you read the purple and orange plot??

27
General CPU Discussion / Re: Asymmetric CPUs with the Same Core Count
« on: September 20, 2022, 08:15:01 pm »
Bonus cache!    Presumably 1 core in each of the pairs that share the L2/L3 was bad in some way, and that die did not have enough paired-cache cores to make a matched set of 18, so some of the remaining unpaired cores were used to make up the difference.  Given the difficulty of yielding big die on a modern technology node, reducing core count is common industry practice.  It's a little unusual here that 2 cores share the cache, so when one gets knocked out, the other retains access to the full amount of cache.  If the cache had been reduced any, then that core might perform worse versus the same workload on only 1 of the 2 cores in a paired-cache config, so it gets to keep the full local cache without having to share any...yay!

Raptor identifies this config specifically as "unpaired" in their 4-core and 8-core cpu description, e.g.:  https://raptorcs.com/content/CP9M31/intro.html
   4 cores per package
       3.2GHz base / 3.8GHz turbo (WoF)
       90W TDP
       All Core Turbo capable
       32KB L1 data cache + 32KB L1 instruction cache / core
       512KB unpaired L2 cache / core
       10MB unpaired L3 cache / core

Versus the larger core-count, e.g.:  https://raptorcs.com/content/CP9M36/intro.html
   18 cores per package
       2.8GHz base / 3.8GHz turbo (WoF)
       190W TDP
       32KB L1 data cache + 32KB L1 instruction cache / core
       512KB L2 cache / core pair
       10MB L3 cache / core pair


If you don't want the bonus cache, I am sure someone would trade CPUs with you.  ;-)


28
General OpenPOWER Discussion / Re: Building a E1080 Power10 Mini Server
« on: September 20, 2022, 07:59:37 pm »
It looks to me like a 3D-printed model of the processor drawer. 

Man, that person has talent!!

29
General CPU Discussion / Re: The point about Power 10 currently
« on: September 01, 2022, 02:54:32 am »
Seems unlikely these processors will make it to consumer boxes given the firmware blobs for DDIMMs & PCIe, but one can always dream...
https://www.nextplatform.com/2022/07/12/can-ibm-get-back-into-hpc-with-power10/



Agreed that P9 is still more than plenty capable, likely lots of capability that hasn't yet been tapped.

30
Talos II / Re: mixing memory sizes on the same CPU?
« on: September 01, 2022, 02:38:42 am »
Useful picture here:  https://www.itjungle.com/2018/03/05/deal-power9-memory-entry-servers/



The P9-Nimbus processor has 8 channels.  Sforza pins out 2 per right/left side; the other 2 on each side are unusable.

First factor:  does the system mainboard connect 2 DIMMs to the same memory channel or 1 DIMM per memory channel for each of the 4?   If 1 per channel, that is best performance and most flexibility, opening the possibility that mismatched DIMM sizes would each be used at full capacity.   If 2 DIMMs are connected on one channel, they must be fully matched (size, speed, maybe even manufacturer).

As seen here in this orientation (rotated 90-degrees from picture above):


The DIMM interfaces on the processor die are split half-top and half-bottom; the individual memory controllers are grouped on their respective edge and must run the same (maybe slowest plugged DIMM?) clock rate.  I -think- top group vs. bottom group are independent enough to run different sizes and even speeds.

It is also very dependent on the firmware + software.  Unclear to me if that will properly set up & use what the hardware sees.
