Messages - ejfluhr

1
General OpenPOWER Discussion / Re: POWER11 on the horizon?
« on: January 23, 2025, 07:24:24 pm »
Video interview with Bill Starke, current chief architect for Power CPUs, here: https://www.youtube.com/watch?v=v9S_w_Bj3oo

Interesting points:
   - Power11 is an evolution of the Power10 architecture
   - The core implements the same Power ISA version as Power10 but will run faster; presumably that means higher frequencies
   - More cores active than Power10
   - The future CPU after Power11 (neither confirmed nor denied as Power12) will use a chiplet architecture similar to AMD's, because that was deemed better for scaling up to 16-socket systems
   - Continued use of OMI memory now and into the future, due to its performance and RAS benefits

It doesn't look like Power11 will be any different from Power10 as far as open computing goes.

Here's hoping that S1 succeeds...


2
Firmware / Re: Adventures in reverse engineering broadcom nic firmware
« on: January 03, 2024, 04:32:58 pm »
Wow, quite an entertaining talk, and a nice contribution to increasing the openness of the platform... well done, Hugo.

It would be quite interesting to hear from any Broadcom engineers on the hilarity of The Great Broadcom BitBang. I would not be surprised if that was a hack invented to solve some early problem with the design that nobody bothered to go back and fix.

3
Talos II / Re: Talos II reboots itself
« on: December 13, 2023, 06:11:40 pm »
Is the error always on c4? If so, can you disable c4?

4
Mod Zone / Re: Custom cooler mount
« on: October 20, 2023, 09:49:36 am »
That looks interesting, but
>We therefore use highly optimized assembly routines that take the specific properties of a given processor microarchitecture into account.

Is that what is being done for
>POWER9 support is planned
??


I'm running a Mersenne-prime calculator that seems to push the CPU pretty hard. At least, it runs down near the "base" frequency of 3.2GHz (I have a 4-core CPU).

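>cat mersenne.c
/* Build: gcc -O2 -o mersenne mersenne.c -lgmp */
#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>

/* Print the Mersenne number M = 2^p - 1 for the exponent p given as argv[1]. */
int main(int argc, char *argv[]) {
   if (argc != 2) {
      fprintf(stderr, "usage: %s <exponent>\n", argv[0]);
      return 1;
   }
   char *endptr;
   unsigned long int p = strtoul(argv[1], &endptr, 10);
   if (endptr == argv[1] || *endptr != '\0') {
      fprintf(stderr, "invalid exponent: %s\n", argv[1]);
      return 1;
   }
   mpz_t M;
   mpz_init(M);
   mpz_ui_pow_ui(M, 2, p);   /* M = 2^p */
   mpz_sub_ui(M, M, 1);      /* M = 2^p - 1 */
   gmp_printf("%Zd\n", M);
   mpz_clear(M);             /* release GMP's heap storage */
   return 0;
}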



Run with:

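>cat mersenne16.ksh
num=82589933
thread=0
# Launch 16 copies in the background (4 cores x SMT4 = 16 hardware threads)
while (( thread < 16 ))
do
   echo "thread $thread: ./mersenne $num > M48.$thread"
   time ./mersenne $num > M48.$thread &
   (( thread += 1 ))
done
wait   # don't exit until all background jobs have finished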



5
General OpenPOWER Discussion / Re: POWER11 on the horizon?
« on: October 20, 2023, 09:28:01 am »
>POWER8 came in 2014, POWER9 was introduced in 2017 and POWER10 ended up being introduced in 2021.

Gap before the last processor: 2021 - 2017 = 4 years
Next processor? 2021 + 4 years = 2025

6
Mod Zone / Re: Custom cooler mount
« on: October 09, 2023, 06:43:39 pm »
>with the big benefit being keeping the temps down keeps the clocks high and the power consumption down.

If you are using IBM's WOF (Workload Optimized Frequency), the algorithm does not work that way. It should boost to "the same" frequency regardless of CPU temperature until the temperature limit is exceeded, at which point it lowers frequency to protect the chip. The TDP is conservative, and very few workloads would exceed it. The bigger factor affecting the temperature-protection mechanism is ambient temperature: above 30C you are more likely to throttle than below 30C.
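To make the point concrete, here is a toy model of the behavior described above. It is a sketch only, not IBM's actual OCC/WOF firmware logic; the function name, the fixed throttle step, and every number you would feed it are hypothetical.

```c
/* Toy model of the WOF behavior described above (NOT IBM's firmware).
 * Below the temperature limit the core gets the same boost frequency
 * no matter how warm it is; only once the limit is exceeded does the
 * frequency drop, here by a fixed (hypothetical) step per evaluation. */
static int next_freq_mhz(int cur_mhz, int boost_mhz, double temp_c,
                         double limit_c, int step_mhz) {
    if (temp_c > limit_c)
        return cur_mhz - step_mhz;   /* over the limit: throttle down */
    return boost_mhz;                /* under the limit: same boost, at any temp */
}
```

With a hypothetical 3800 MHz boost, `next_freq_mhz(3800, 3800, 55.0, 85.0, 50)` and `next_freq_mhz(3800, 3800, 80.0, 85.0, 50)` both return 3800; only a temperature above the limit returns 3750. Cooler running buys nothing until you would otherwise have crossed the limit.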

What workload are you testing with? I just got a Blackbird running Ubuntu and am stress testing it to see how it responds. It's quite fun.

7
IBM rated all the POWER8 through POWER9 processors to a long-term reliability temperature of 85C. Anything above 85C may be a tad hot for long life...

8
Mod Zone / Re: Custom cooler mount
« on: October 04, 2023, 06:52:40 pm »
Seems like you have an 18c POWER9 rated at 190W? https://raptorcs.com/content/CP9M36/intro.html

The POWER9 long-term maximum temperature rating at TDP is 85C. If your system runs at ~70C, you have a decent margin to the reliability limit. How much of that is the improved cooling and how much is your workload would probably need to come from power data compared with that 190W spec.




9
Mod Zone / Re: Custom cooler mount
« on: September 11, 2023, 09:12:02 pm »
>For stability testing I ran this machine hard for multi-day cycles (one of my own tools that 100% loads the CPU cores with lots of maths).

Do you know how "hot" that workload is compared to the TDP rating of the processor? 

Regards, Eric

10
Mod Zone / Re: Custom cooler mount
« on: August 17, 2023, 11:02:38 am »
Do you have any idea how much force you are applying down through the module to the socket pins? IBM has a target pressure to ensure even and reliable contact across all pins for the life of the processor, given certain assumptions about electrical loads (e.g. amps through the pins) and thermal cycling. It would be interesting to know whether your solutions are approximately the same, or much higher or lower.
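If you can measure or estimate the clamp force, a back-of-envelope comparison could look like the sketch below. The helper names are hypothetical, the even load-sharing is an idealization, and no IBM target values are implied; you would need the real pin count, loaded area, and vendor target to compare against.

```c
/* Back-of-envelope helpers for comparing a custom cooler mount with a
 * vendor target. All inputs are hypothetical; no IBM figures are implied. */

/* Average load carried by each socket contact, assuming the clamp force
 * is shared evenly (real load distribution is never perfectly even). */
static double force_per_pin_n(double clamp_force_n, int pin_count) {
    return clamp_force_n / pin_count;
}

/* Average pressure over the loaded area in kPa.
 * 1 N/mm^2 = 1 MPa = 1000 kPa. */
static double pressure_kpa(double clamp_force_n, double area_mm2) {
    return clamp_force_n * 1000.0 / area_mm2;
}
```

For example, a hypothetical 1000 N spread over 2000 contacts is 0.5 N per pin, and 500 N over 2500 mm^2 is 200 kPa.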

11
Firmware / Re: Updating Talos II firmware to IBM PNOR V2.18?
« on: April 17, 2023, 05:44:57 pm »
The 18c and 22c parts are "paired", meaning two SMT4 cores share the same L2 and L3, unlike the 4c and 8c parts, which are "unpaired", meaning each core gets the full L2 and L3 to itself. This is not the same as "fused" (i.e. SMT8) cores, but it is quite likely that the fix will also work for "paired" cores, since presumably the issue involves shared cacheable/non-cacheable pathways. Good luck!
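The paired/unpaired distinction can be modeled as a simple core-to-cache mapping. This is a hypothetical sketch: it assumes the two cores of a pair have consecutive indices 2k and 2k+1, which is an illustrative assumption, not a documented POWER9 numbering.

```c
/* Hypothetical model of "paired" vs "unpaired" cores as described above.
 * Assumption: the two cores of a pair are numbered 2k and 2k+1. */

/* Index of the L2/L3 slice a core uses. */
static int cache_slice(int core, int paired) {
    return paired ? core / 2 : core;   /* paired: two cores per L2/L3 */
}

/* Nonzero if two distinct cores share the same L2/L3. */
static int shares_l2_l3(int core_a, int core_b, int paired) {
    return core_a != core_b &&
           cache_slice(core_a, paired) == cache_slice(core_b, paired);
}
```

Under this assumption, cores 4 and 5 share a cache slice on a paired part but not on an unpaired one, while cores 5 and 6 never share.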

12
Mod Zone / Re: Initial findings running on water cooling
« on: March 30, 2023, 07:59:11 pm »
That is awesome. Do you know what the power difference is running the same benchmark air-cooled vs. water-cooled?

Are you thinking of boosting the CPU frequency higher since it is running cooler and at lower power?

13
Firmware / Re: Updating Talos II firmware to IBM PNOR V2.18?
« on: March 14, 2023, 06:05:09 pm »
I believe that all OpenPOWER/OPAL-based systems use SMT4 cores, not SMT8 (i.e. "fused") cores. So you may not be seeing the same problem.

Do you get the same fault callout?

14
Blackbird / Re: Blackbird Cooling
« on: October 18, 2022, 01:02:27 am »
https://www.infineon.com/cms/en/product/power/dc-dc-converters/integrated-power-stages/tda21472/

It looks like that VRM is rated at 60A - 70A in typical ambient temps (e.g. < 40C).   It seems like there must be at least 2 in parallel on VDD, for a capacity of 120+A?   Efficiency falls off as the load gets that high, though.  It would be better to have 3 such stages so they draw only 30A - 40A under normal operation.
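The stage-count arithmetic above amounts to dividing the total VDD load by the number of parallel stages. A minimal sketch, assuming ideal even current sharing (real multiphase controllers only balance approximately); the helper names are hypothetical.

```c
/* Per-stage current when a total VDD load is shared evenly by n identical
 * power stages in parallel (an idealization of multiphase current sharing). */
static double per_stage_current_a(double total_a, int n_stages) {
    return total_a / n_stages;
}

/* Fraction of one stage's rating in use at that per-stage current. */
static double stage_utilization(double total_a, int n_stages, double rating_a) {
    return per_stage_current_a(total_a, n_stages) / rating_a;
}
```

A 120A load on 2 stages puts 60A on each, i.e. 100% of a 60A rating; spreading the same 120A over 3 stages drops each to 40A.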

15
Firmware / Re: Messing with WOF Tables
« on: October 10, 2022, 05:58:15 pm »
>Basically you have the 8-core that Raptor sells but with the paired cores still on

What a neat observation! Something to consider is that 160W at 2.5GHz is probably achieved at much lower VDD than the 8c part's 160W at 3.45GHz (https://raptorcs.com/content/CP9M32/intro.html).

P=IxV  =>  160W = I_3.45 x V_3.45 = I_2.5 x V_2.5

If voltage moves 1:1 with frequency (I don't know if it does; it could be 1:2 or 2:1, but we have to start somewhere), then 3.45GHz / 2.5GHz = 1.38x. So V_3.45 = V_2.5 x 1.38, or equivalently V_2.5 = V_3.45 / 1.38.

So if P is constant and V_2.5 is that much below V_3.45, then I_2.5 has to be 1.38x higher: I_2.5 = I_3.45 x 1.38.
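A quick numeric check of the constant-power argument, under the same rough assumption that VDD scales 1:1 with frequency. The helper names and the 1.0V reference voltage are hypothetical; only the ratio matters.

```c
/* P = I * V, so at a fixed power budget I = P / V. */
static double current_a(double power_w, double vdd_v) {
    return power_w / vdd_v;
}

/* VDD at a lower frequency, ASSUMING voltage scales 1:1 with frequency
 * (the starting assumption in the post, not measured data). */
static double scaled_vdd(double vdd_high_v, double f_high_ghz, double f_low_ghz) {
    return vdd_high_v * (f_low_ghz / f_high_ghz);
}
```

With a hypothetical V_3.45 = 1.0V, `scaled_vdd(1.0, 3.45, 2.5)` is about 0.725V, and at 160W the current rises from 160A to about 221A, which is the 1.38x ratio derived above.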

This is probably why the Blackbird wiki states: "Other CPUs (CPUs with a TDP greater than 160W) may operate without WoF due to power regulator limitations."

Even without WOF (i.e. at the base of 2.5GHz), you are probably pushing those regulators much harder than the 8c module does.

I couldn't find any information on how much VDD current the Blackbird regulators can support. In the post by user deepblue, graphs indicate the processor exceeded a VDD load of 130A at 0.89V under some workload. Doing a crude per-core translation to 16c, just to see where it lands: (130A / 18c) x 16c ~= 115A, presumably lower since the 18c is 190W and the 16c is 160W. If your temperatures are running high, then perhaps the regulators are only built to support ~100A and you are at or over that spec...
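The crude scaling above can be written out as below. Scaling by core count alone is the calculation from the post; the optional TDP-ratio scaling is an extra assumption (that VDD current tracks TDP) added here only to put a rough number on "presumably lower". Helper names are hypothetical.

```c
/* Naive per-core scaling of a measured VDD current, as in the post:
 * (130 A / 18 cores) * 16 cores. Ignores frequency and voltage differences. */
static double scale_by_cores(double measured_a, int measured_cores, int target_cores) {
    return measured_a / measured_cores * target_cores;
}

/* Extra ASSUMPTION: current also scales with the ratio of TDPs. */
static double scale_by_cores_and_tdp(double measured_a, int measured_cores, int target_cores,
                                     double measured_tdp_w, double target_tdp_w) {
    return scale_by_cores(measured_a, measured_cores, target_cores)
           * (target_tdp_w / measured_tdp_w);
}
```

With the numbers from the post, `scale_by_cores(130.0, 18, 16)` gives about 115.6A, and `scale_by_cores_and_tdp(130.0, 18, 16, 190.0, 160.0)` gives about 97.3A, close to the ~100A guess.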

