
Messages - ejfluhr

General CPU Discussion / Re: The point about Power 10 currently
« on: December 19, 2021, 10:07:17 pm »
Well, I do not believe that anything about that POWER10 module precludes being used in a home environment, just that IBM would have to sell it on the OpenPOWER market and Raptor or someone would have to build a motherboard to support it.  I think this is the HotChips presentation?

Those external-cable connectors would be for the PowerAXON interfaces which let the processor scale up to lots of sockets.   The memory and PCI interfaces wouldn't use such cables.   Much like all the POWER9 processors have lots of socket-to-socket I/O that aren't used by the Raptor systems, a bare-bones POWER10 system could ignore all the cabled interfaces and just use the ones that connect thru the socket. 

Of course, the likely answer is that the cost of such a design is still prohibitive, since it presumably has to use the fancy buffered memory DIMMs:

It isn't clear to me how the spring mid-range and low-end announcements will be more amenable to a consumer-focused POWER10 system, but here's to hoping that the landscape changes come 2022!   I would love to buy a POWER10 system!!

General Hardware Discussion / Re: 2u Blackbird Build with 18 cores?!
« on: December 19, 2021, 09:43:08 pm »
How do you mean "invisible?"   Are you referencing the "Current" field of the bottom chart?

That looks to me like a plotting problem: the processor is drawing 125-130A of VDD current, which for some reason goes off the top of the chart.  If the chart were scaled to, say, 150A then we could see where the current really rests.

I am not sure which of the blue or green Voltage curves is representative of processor VDD.   Maybe the blue is the regulator setpoint and the green is VDD at the processor given loadline loss?    If so, then given that current is 125A at processor-VDD=0.85V, the processor is consuming at least 125A * 0.85V = 106W of VDD power.   VDN is 15.6A * 0.69V = 10.8W.   I cannot tell how much VIO power is being used, let's just guess 30W?   That would put the processor in the ballpark of 106W + 10.8W + 30W ~= 147W.
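
For anyone who wants to redo that arithmetic with their own readings, here is a minimal Python sketch. The current and voltage numbers are the ones read off the chart above; the 30W VIO figure is only the guess from this post, not a measurement, and the function names are made up for illustration.

```python
# Rail readings as (amps, volts), taken from the chart discussed in this post.
RAILS = {
    "VDD": (125.0, 0.85),
    "VDN": (15.6, 0.69),
}

# VIO could not be read from the chart; 30 W is a guess, not a measurement.
VIO_GUESS_W = 30.0


def rail_power(amps, volts):
    """Power in watts for one rail: P = I * V."""
    return amps * volts


def total_power(rails, extra_watts=0.0):
    """Sum the power of every rail and add any separately estimated watts."""
    return sum(rail_power(a, v) for a, v in rails.values()) + extra_watts


vdd_w = rail_power(*RAILS["VDD"])
vdn_w = rail_power(*RAILS["VDN"])
total_w = total_power(RAILS, VIO_GUESS_W)

print(f"VDD {vdd_w:.1f} W, VDN {vdn_w:.1f} W, total {total_w:.1f} W")
```

Swapping in different rail readings only means editing the RAILS table.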

When you compare that to the idle data, which shows VDD current of ~30A at about 1.0V, that would be 30W of VDD power at idle versus 106W fully loaded.

One line that seems worrisome is the "VR" temp which appears to go to 115C.  That seems pretty extreme.  I don't know what that regulator is rated for but other server designs I know about have long-term reliability limits around 90C.  This may be an example of how the Blackbird design is not built to handle a high-core-count processor, as those tend to run much lower voltages and higher currents at the same power as a low-core-count processor.    It would be nice to have data from an 8-core processor running at max power for comparison, but I haven't seen anyone else post any characterization results like you have.

General Hardware Discussion / Re: 2u Blackbird Build with 18 cores?!
« on: December 11, 2021, 12:25:55 am »
Re: the xz compression behavior, can you tell if you are memory bandwidth starved?  Lots of cores vs. Blackbird's meager DIMM capacity isn't a very balanced combination. 

BTW, I looked at your awesome graphs up above more carefully and you can see how current & voltage are moving around --> that looks like the processor is indeed dynamically boosting/dropping frequency to manage within its power limit.

General CPU Discussion / Re: The point about Power 10 currently
« on: December 10, 2021, 09:57:41 am »

Pictures/info here show fancy "on the substrate" connectors for external cables, plus requirement for memory-buffer-based DIMMs.
It does seem that the expense of building around these features is pretty high, not something for an end-user-affordable system.

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 29, 2021, 08:31:18 am »
Pp. 35 of indicates this (below).   Maybe "folding" means putting cores to sleep???

  Page 35
Processor Folding in Linux
It is essential to install a daemon package based on the host OS to enable utilization-based processor
folding for Static Power Saver and Idle Power Saver modes:
Version 5.4 has the necessary user space tools required to enable CPU Folding.
Once this package is installed, the energyd daemon will monitor the system power mode and activate
processor folding when system power mode is set to "Static Power Saver" and deactivate processor
folding in all other modes.  The utilization-based CPU folding daemon will deactivate unused cores and
transition them to low power idle states until the CPU utilization increases and those cores are activated
to run a workload.
Utilization-based processor folding can be manually disabled using the following commands:
 /etc/init.d/energyd stop #Stop daemon now, activate all cores
 chkconfig energyd off #Do not restart daemon on startup
 rpm -e pseries-energy #un-install the package completely
Alternatively, CPU cores can be folded or set to low power idle state in any power mode
manually using the following command line:
 echo 0 > /sys/devices/system/cpu/cpuN/online #Where N is the logical CPU number
Please note that all active hardware threads of a core needs to be taken off-line using the above
command in order to move the core to a low power idle state.
The cores can be activated again with the following command:
 echo 1 > /sys/devices/system/cpu/cpuN/online #Where N is the logical CPU number
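
Since the doc says every hardware thread of a core has to go offline before the core can drop to a low power idle state, here is a rough Python sketch of doing that for one core at a time. It uses the same sysfs "online" files as the quoted commands. The assumption that logical CPUs are numbered consecutively per core with 4 threads each is mine and may not match a given system (check lscpu first), and the function names are made up for illustration. Writing these files needs root, and some CPUs (often cpu0) may not be allowed to go offline.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def core_thread_paths(core_index, threads_per_core=4, root=SYSFS_CPU):
    """Return the 'online' control file of every hardware thread of one core.

    Assumes logical CPUs are numbered consecutively per core, which is an
    assumption about the system layout and not something the doc guarantees.
    """
    first = core_index * threads_per_core
    return [
        Path(root) / f"cpu{n}" / "online"
        for n in range(first, first + threads_per_core)
    ]


def set_core_online(core_index, online, threads_per_core=4, root=SYSFS_CPU):
    """Write 1 (online) or 0 (offline) to each thread of the given core."""
    value = "1" if online else "0"
    for path in core_thread_paths(core_index, threads_per_core, root):
        path.write_text(value)
```

Something like set_core_online(3, False) would then fold core 3, and set_core_online(3, True) would bring it back.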

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 29, 2021, 08:08:47 am »
Is the lack of "sleep" mode something to do with Linux support?    The processor has lots of core power-down modes as indicated by this doc, p. 7:

Seems like it should be possible to use Linux tools to manage frequency.  P. 15 says:
>>#Use cpupower tool to query and set frequency
>>Available frequency steps from cpupower will list only the nominal range, but user can select full frequency range to set and it will take effect.

>Does the CPU have any of these settings in it?
I believe the CPU contains processor voltage/freq limitation as defined in the VPD poundv, hence why Raptor's code overrides that.
Once that is done, possibly the WOF boost table has to also be over-written to match the VPD values --> I saw that in a comment in one of the scripts.

On top of that, Linux should be able to provide "direction" as to what cores are active and what frequency range to target.  I've played with that a bit on x86 a few years ago so hopefully that also works on POWER9?  I don't have a POWER9 system to play with but here is what I get on my x86 laptop which shows min - max of 800MHz - 3600MHz:
>ls /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo*

>cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo*
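
The same query can be scripted. Below is a small Python sketch that reads the cpuinfo_*_freq files for one CPU and returns the values, which the kernel reports in kHz. I have only tried this sort of thing on x86, so treat the POWER9 behavior as an assumption; the function names are made up for illustration. Files that are unreadable without root (cpuinfo_cur_freq can be) are skipped.

```python
from pathlib import Path


def read_cpuinfo_freqs(cpu=0, root="/sys/devices/system/cpu"):
    """Return {file name: frequency in kHz} for the cpuinfo_*_freq files."""
    cpufreq_dir = Path(root) / f"cpu{cpu}" / "cpufreq"
    freqs = {}
    for path in sorted(cpufreq_dir.glob("cpuinfo_*_freq")):
        try:
            text = path.read_text().strip()
        except OSError:
            continue  # unreadable without root, or the file vanished
        if text.isdigit():
            freqs[path.name] = int(text)
    return freqs


def khz_to_mhz(khz):
    """Convert the kernel's kHz values to MHz for easier reading."""
    return khz / 1000.0
```

Calling read_cpuinfo_freqs() and passing each value through khz_to_mhz gives the min and max in MHz.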

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 22, 2021, 11:40:35 am »
Well, then sorry, I cannot help.  I would guess this is a pretty technical rabbit hole that likely few people have gone thru; maybe even only Raptor.   I was trying to contribute to the cause, exploring how to make it work by crawling thru all the code from Raptor & IBM in the repos.   It seems that Raptor figured it out, so the path to success is in there, somewhere.

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 20, 2021, 04:52:22 am »
So it looks like the WOF boost tables are posted here?

Lots of interesting info in this.  Seems like the 4-core target frequency is 3200 and the ultraturbo frequency is 3800.  Last column looks to be the "WOF frequency" according to the header.
Here is the header followed by a single line (of thousands) which seems to represent something about how it decides the target frequency:  (I'm not sure how to fix the column formatting, so readers will have to deconvolve that...)
OpenPOWER Raptor   95   Sforza   v7.3.3   90   108   4   3200   3800   1867   0.0409   0.0417   1   0.1   0   0   0.25   0   6   0.0409   0   0.6   4   3800
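
To make that row easier to pick apart, here is a quick Python sketch that splits it into fields. Splitting on runs of two or more spaces keeps "OpenPOWER Raptor" together as one field. The column positions I pull out (3200, 3800, and the last column) are only my reading of the row as described above, not something confirmed by the header, so treat the variable names as guesses.

```python
import re

ROW = (
    "OpenPOWER Raptor   95   Sforza   v7.3.3   90   108   4   3200   3800   "
    "1867   0.0409   0.0417   1   0.1   0   0   0.25   0   6   0.0409   0   "
    "0.6   4   3800"
)


def split_wof_row(row):
    """Split on runs of 2+ whitespace so single-space fields stay intact."""
    return re.split(r"\s{2,}", row.strip())


fields = split_wof_row(ROW)

# Positions below are guesses based on the values, not on a verified header.
target_freq_mhz = int(fields[7])      # looks like the 4-core target frequency
ultraturbo_freq_mhz = int(fields[8])  # looks like the ultraturbo frequency
wof_freq_mhz = int(fields[-1])        # last column, "WOF frequency" per header

print(len(fields), target_freq_mhz, ultraturbo_freq_mhz, wof_freq_mhz)
```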

Still cannot find how they get consumed by the boot process, though.

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 20, 2021, 04:26:22 am »
Ok, now I see that raptor-aggressive mods the CSV version of the WOF boost table to crank up the max frequency to 4.2G or 4.4G.
>cd ../wofdata
>php ../raptor-util/woferclock.php 5 3536 WOF_V7_3_3_SFORZA_4_90_3200_TM.csv.original WOF_V7_3_3_SFORZA_4_90_3200_TM.csv 1.187 4200
>php ../raptor-util/woferclock.php 5 4052 WOF_V7_3_3_SFORZA_8_160_3500_TM.csv.original WOF_V7_3_3_SFORZA_8_160_3500_TM.csv 1.05 4400
>php ../raptor-util/woferclock.php 5 3094 WOF_V7_3_3_SFORZA_18_190_2800_TM.csv.original WOF_V7_3_3_SFORZA_18_190_2800_TM.csv 1.10 4200
>php ../raptor-util/woferclock.php 5 3039 WOF_V7_3_3_SFORZA_22_190_2750_TM.csv.original WOF_V7_3_3_SFORZA_22_190_2750_TM.csv 1.10 4200

I can't find those boost tables in the Raptor git tree; maybe it is somewhere else?   Perhaps on your system already and you can just replace the old with the hacked version?

Maybe some PNOR updater here?
>if(-e $wof_binary_filename)
>    {
>        $sections{WOFDATA}{in} = "$wof_binary_filename";
>    }

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 20, 2021, 04:04:13 am »
Aha, looking further it seems that they are programming poundv for a voltage uplift of 1.142x over the existing ultraturbo value.
# Reasonable defaults
# Partly validated on initial silicon
# NOT GUARANTEED, starting point ONLY!
if [[ "$CORE_COUNT" == "4" ]]; then
if [[ "$CORE_COUNT" == "8" ]]; then

If you can read out your original poundv values, then you know how much "headroom" your parts have to the maximum allowed:
># Rated limit plus safety margin
>#max_voltage = 1098
># Absolute maximum process limit
># Hardware damage WILL occur above this value!
>max_voltage = 1150
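
As a rough way to check headroom, here is a Python sketch that applies the 1.142x uplift to an original ultraturbo voltage and compares the result to the two limits in the script. I am assuming the max_voltage values are in millivolts; the 950 mV example value is purely hypothetical and not read from any real part, and the function names are made up for illustration.

```python
UPLIFT = 1.142            # voltage uplift factor seen in the script
RATED_LIMIT_MV = 1098     # "rated limit plus safety margin" (commented out)
ABSOLUTE_MAX_MV = 1150    # "absolute maximum process limit"; units assumed mV


def uplifted_voltage_mv(ultraturbo_mv, uplift=UPLIFT):
    """Voltage after applying the uplift factor to the original value."""
    return ultraturbo_mv * uplift


def headroom_mv(ultraturbo_mv, limit_mv=ABSOLUTE_MAX_MV, uplift=UPLIFT):
    """Margin left under the limit after uplift; negative means over it."""
    return limit_mv - uplifted_voltage_mv(ultraturbo_mv, uplift)


def max_original_mv(limit_mv, uplift=UPLIFT):
    """Largest original voltage that stays at or under the limit after uplift."""
    return limit_mv / uplift


example_mv = 950.0  # hypothetical poundv ultraturbo voltage, for illustration
print(f"uplifted: {uplifted_voltage_mv(example_mv):.1f} mV")
print(f"headroom to absolute max: {headroom_mv(example_mv):.1f} mV")
print(f"headroom to rated limit: {headroom_mv(example_mv, RATED_LIMIT_MV):.1f} mV")
```

A negative headroom would mean the uplifted voltage lands above the limit being checked.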

BTW, do you want or need WOF to manage the frequency/thermals, or just playing around and can manage it yourself?   With just the VPD update, perhaps you can disable WOF and manually override frequency via OS controls up to the newly-programmed ultraturbo maximum.   I don't know specifically how to do that, just believe it possible based on what little I know of Linux freq control.

Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: November 20, 2021, 03:32:43 am »
Are you updating the VPD voltage/frequency table, or the WOF boost table, or both?    I'm not sure how it works on Linux, but the general architecture is that the VPD sets the maximum tested voltage/frequency and the WOF boost table defines how to set frequency based on (1) active workload power vs. TDP and (2) # of active cores.   The official boost table is designed to raise freq not beyond the module's AC power spec, and also prevent exceeding the RDP spec for the system VDD VRM (well, a transient spec on top of the RDP "DC" spec).

General Hardware Discussion / Re: 2u Blackbird Build with 18 cores?!
« on: November 12, 2021, 11:42:44 pm »
Nice update...tx!  Do you know if the xz compression uses the in-processor compression accelerator, or does it just run on the cores?

Do you know if WOF (dynamic frequency boosting) is active?   Since you are so power limited, perhaps it is lowering the core frequency when you add more threads than 1 per core?

General CPU Discussion / Re: IBM POWER9+ ?
« on: September 24, 2020, 02:06:37 am »
Even "simple" die shrinks are very, very costly, and POWER sales volumes cannot justify such an expense.  IBM succeeds due to its system sales, and OpenPOWER support is an offshoot of whatever the system roadmap drives for processor development.

Blackbird / Re: using bigger CPUs with some cores disabled on Blackbird?
« on: September 09, 2020, 05:50:08 pm »
Can you tell what Stop state the cores are put into before and after your command?    Technically the cores can be power-gated hence you can make larger-core-count modules behave similarly to the small-core-count variants.   The frequencies won't be identical, as the boost tables may adjust frequencies commensurate with extra regulator headroom but it won't line up identically to lower-core-count-tuned modules.  Higher frequency could affect your power readings, but taking cores offline should reduce power far faster than raising voltage/frequency, especially with 16->2 cores.

I'm not sure how well Linux supports all of the power management features of POWER9, though.   Are all cores active when at idle, or has it already power-gated many such that formally disabling them doesn't actually change their state?
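
One way to check this might be the kernel's cpuidle counters in sysfs, sampled before and after the command. This Python sketch reads the name, usage count, and residency time of each idle state for one CPU. I don't have a POWER9 system to try it on, so whether the state names map cleanly onto Stop levels there is an assumption; the function name is made up for illustration.

```python
from pathlib import Path


def read_idle_states(cpu, root="/sys/devices/system/cpu"):
    """Return a list of idle-state counters for one logical CPU.

    Each entry holds the state's name, how many times it was entered
    ('usage'), and the total time spent in it in microseconds ('time').
    """
    cpuidle_dir = Path(root) / f"cpu{cpu}" / "cpuidle"
    state_dirs = sorted(
        cpuidle_dir.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        try:
            states.append({
                "name": (state_dir / "name").read_text().strip(),
                "usage": int((state_dir / "usage").read_text()),
                "time_us": int((state_dir / "time").read_text()),
            })
        except (OSError, ValueError):
            continue  # skip states with missing or unreadable files
    return states
```

Comparing two samples of time_us taken a few seconds apart should show which state a core is actually sitting in.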

May be worth posting your power/current readings if you have them.  VDD current is what matters, if you can separate that out.

Regards, Eric
