Author Topic: Trying to overclock, what is raptor-aggressive  (Read 7448 times)

vmlinuz

Trying to overclock, what is raptor-aggressive
« on: November 19, 2021, 04:17:16 pm »
I am trying to get my dual quad-core CPUs up to 4.2 GHz. So far I have run the woferclock script to update the module VPD, but how do I load the tables into the PNOR? There is little or no documentation on how to generate the appropriate files or get them into the firmware. The wiki mentions raptor-aggressive, but I can't find it in the firmware source. Also, how can I restore the old module VPD from a backup, just in case?
« Last Edit: November 19, 2021, 05:01:00 pm by vmlinuz »

ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #1 on: November 20, 2021, 03:32:43 am »
Are you updating the VPD voltage/frequency table, the WOF boost table, or both? I'm not sure how it works on Linux, but the general architecture is that the VPD sets the maximum tested voltage/frequency, and the WOF boost table defines how to set frequency based on (1) active workload power vs. TDP and (2) the number of active cores. The official boost table is designed to raise frequency without exceeding the module's AC power spec, and also to avoid exceeding the RDP spec for the system VDD VRM (well, a transient spec on top of the RDP "DC" spec).

ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #2 on: November 20, 2021, 04:04:13 am »
Aha, looking further, it seems that they are programming poundv for a voltage uplift of 1.142x over the existing ultraturbo value.
https://git.raptorcs.com/git/vpdtools/plain/woferclock/woferclock_cpu
# Reasonable defaults
# Partly validated on initial silicon
# NOT GUARANTEED, starting point ONLY!
if [[ "$CORE_COUNT" == "4" ]]; then
   NEW_ULTRATURBO_MHZ=4200
   VOLTAGE_MULTIPLIER=1.142
fi
if [[ "$CORE_COUNT" == "8" ]]; then
   NEW_ULTRATURBO_MHZ=4400
   VOLTAGE_MULTIPLIER=1.142
fi

If you can read out your original poundv values, then you know how much "headroom" your parts have to the maximum allowed:
https://git.raptorcs.com/git/vpdtools/plain/woferclock/update_poundv_buckets
># Rated limit plus safety margin
>#max_voltage = 1098
>
># Absolute maximum process limit
># Hardware damage WILL occur above this value!
>max_voltage = 1150
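
To make the "headroom" idea concrete, here is a rough sketch of the arithmetic in Python. It assumes the values are in millivolts (which the max_voltage numbers above suggest) and that the multiplier is applied directly to the original ultraturbo voltage; the function names and the example inputs are hypothetical, not taken from Raptor's scripts.

```python
# Sketch only: assumes voltages are in millivolts, as the max_voltage
# values quoted above suggest. All names here are hypothetical.

RATED_LIMIT_MV = 1098    # "Rated limit plus safety margin" (commented out in the script)
ABSOLUTE_MAX_MV = 1150   # "Absolute maximum process limit"


def uplifted_voltage_mv(original_mv: float, multiplier: float = 1.142) -> float:
    """Return the ultraturbo voltage after applying the multiplier."""
    return original_mv * multiplier


def headroom(original_mv: float, limit_mv: float = ABSOLUTE_MAX_MV) -> float:
    """Largest multiplier that keeps the voltage at or below limit_mv."""
    return limit_mv / original_mv


def check(original_mv: float, multiplier: float = 1.142) -> str:
    """Classify the uplifted voltage against the two limits."""
    new_mv = uplifted_voltage_mv(original_mv, multiplier)
    if new_mv > ABSOLUTE_MAX_MV:
        return "over absolute maximum"
    if new_mv > RATED_LIMIT_MV:
        return "over rated limit"
    return "within rated limit"
```

For example, a part whose original ultraturbo point is a made-up 950 mV would land at roughly 1085 mV with the 1.142 multiplier, while a part already at 1010 mV would exceed 1150 mV with the same multiplier. So the original poundv values matter before applying the default.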

BTW, do you want or need WOF to manage the frequency/thermals, or are you just playing around and able to manage it yourself?   With just the VPD update, perhaps you can disable WOF and manually override frequency via OS controls up to the newly-programmed ultraturbo maximum.   I don't know specifically how to do that; I just believe it is possible based on what little I know of Linux frequency control.

« Last Edit: November 20, 2021, 05:19:24 am by ejfluhr »

ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #3 on: November 20, 2021, 04:26:22 am »
Ok, now I see that raptor-aggressive mods the CSV version of the WOF boost table to crank up the max frequency to 4.2G or 4.4G.
>#!/bin/bash
>
>cd ../wofdata
>php ../raptor-util/woferclock.php 5 3536 WOF_V7_3_3_SFORZA_4_90_3200_TM.csv.original WOF_V7_3_3_SFORZA_4_90_3200_TM.csv 1.187 4200
>php ../raptor-util/woferclock.php 5 4052 WOF_V7_3_3_SFORZA_8_160_3500_TM.csv.original WOF_V7_3_3_SFORZA_8_160_3500_TM.csv 1.05 4400
>php ../raptor-util/woferclock.php 5 3094 WOF_V7_3_3_SFORZA_18_190_2800_TM.csv.original WOF_V7_3_3_SFORZA_18_190_2800_TM.csv 1.10 4200
>php ../raptor-util/woferclock.php 5 3039 WOF_V7_3_3_SFORZA_22_190_2750_TM.csv.original WOF_V7_3_3_SFORZA_22_190_2750_TM.csv 1.10 4200

I can't find those boost tables in the Raptor git tree; maybe it is somewhere else?   Perhaps on your system already and you can just replace the old with the hacked version?

Maybe some PNOR updater here?  https://git.raptorcs.com/git/pnor/tree/update_image.pl
>if(-e $wof_binary_filename)
>    {
>        $sections{WOFDATA}{in} = "$wof_binary_filename";
>    }




ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #4 on: November 20, 2021, 04:52:22 am »
So it looks like the WOF boost tables are posted here?
https://github.com/open-power/WOF-Tables

Lots of interesting info in this.  Seems like the 4-core target frequency is 3200 and the ultraturbo frequency is 3800.  Last column looks to be the "WOF frequency" according to the header.
Here is the header followed by a single line (of thousands) which seems to represent something about how it decides the target frequency, shown one field per line:
MOPT                               OpenPOWER Raptor
YIELD                              95
PACKAGE                            Sforza
VERSION                            v7.3.3
SOCKET_POWER                       90
RDP_CAPACITY                       108
CORE_COUNT                         4
PDV_SORT_POWER_TARGET_FREQ         3200
PDV_SORT_POWER_ULTRA_TURBO_FREQ    3800
NEST_FREQ                          1867
VRATIO_START                       0.0409
VRATIO_STEP                        0.0417
FRATIO_START                       1
FRATIO_STEP                        0.1
CORE_CEFF                          0
CORE_CEFF_INDEX                    0
NEST_CEFF                          0.25
NEST_CEFF_INDEX                    0
ACTIVE_QUADS                       6
VRATIO                             0.0409
VRATIO_INDEX                       0
FRATIO                             0.6
FRATIO_INDEX                       4
WOF_FREQ                           3800
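
If anyone wants to poke at these tables programmatically, here is a minimal Python sketch that pairs the header with the row above. It assumes the files are plain comma-separated CSV (I only pasted one row, rewritten here with commas), and the helper names are hypothetical. The ratio_from_index relation is only a guess that happens to fit this single row (FRATIO = FRATIO_START - FRATIO_INDEX * FRATIO_STEP, VRATIO = VRATIO_START + VRATIO_INDEX * VRATIO_STEP); I have not checked it against any spec.

```python
import csv
import io

# The header and the single row quoted above, rewritten as comma-separated text.
# Assumption: the real files in the WOF-Tables repo use this delimiter.
SAMPLE = (
    "MOPT,YIELD,PACKAGE,VERSION,SOCKET_POWER,RDP_CAPACITY,CORE_COUNT,"
    "PDV_SORT_POWER_TARGET_FREQ,PDV_SORT_POWER_ULTRA_TURBO_FREQ,NEST_FREQ,"
    "VRATIO_START,VRATIO_STEP,FRATIO_START,FRATIO_STEP,CORE_CEFF,"
    "CORE_CEFF_INDEX,NEST_CEFF,NEST_CEFF_INDEX,ACTIVE_QUADS,VRATIO,"
    "VRATIO_INDEX,FRATIO,FRATIO_INDEX,WOF_FREQ\n"
    "OpenPOWER Raptor,95,Sforza,v7.3.3,90,108,4,3200,3800,1867,0.0409,"
    "0.0417,1,0.1,0,0,0.25,0,6,0.0409,0,0.6,4,3800\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def ratio_from_index(start, step, index, descending=False):
    """Guessed relation between a *_START/*_STEP/*_INDEX triple and the ratio."""
    sign = -1.0 if descending else 1.0
    return start + sign * step * index
```

With this, read_rows(SAMPLE)[0]["WOF_FREQ"] gives the last column as a string, and the same function should work on a whole file read from disk if the delimiter assumption holds.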



Still cannot find how they get consumed by the boot process, though.
« Last Edit: November 22, 2021, 11:44:37 am by ejfluhr »

vmlinuz

Re: Trying to overclock, what is raptor-aggressive
« Reply #5 on: November 20, 2021, 08:13:53 pm »
Thank you, but this is all gibberish to me. I just want to know how to get it up to 4.2 GHz ultra turbo like the wiki describes, and then how to restore the original settings when I want better performance per watt.
« Last Edit: November 20, 2021, 08:18:45 pm by vmlinuz »

ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #6 on: November 22, 2021, 11:40:35 am »
Well, then sorry, I cannot help.  I would guess this is a pretty technical rabbit hole that likely few people have gone thru; maybe even only Raptor.   I was trying to contribute to the cause, exploring how to make it work by crawling thru all the code from Raptor & IBM in the repos.   It seems that Raptor figured it out, so the path to success is in there, somewhere.

vmlinuz

Re: Trying to overclock, what is raptor-aggressive
« Reply #7 on: November 22, 2021, 08:31:52 pm »
I meant no offense. The information you provided was just way more technical than I expected, and I was hoping for a simple answer. It looks like the "simple answer" isn't supported anymore. 3800 MHz is plenty for now.

MPC7500

Re: Trying to overclock, what is raptor-aggressive
« Reply #8 on: November 23, 2021, 09:34:12 am »
IIRC, you have to change the tables in the OpenBMC source and compile.

It would be interesting to see how high a dual 22-core POWER9 can be clocked with nitrogen cooling 8)

surf

Re: Trying to overclock, what is raptor-aggressive
« Reply #9 on: November 24, 2021, 08:23:05 am »
Does the CPU have any of these settings in it?  Or does the BMC detect the chip and then know how to set it up?

It would be nice to control this on the fly.  Since these computers don't have a "sleep" mode then maybe the next best thing would be to drop the speed/voltage or even turn off cores?


ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #10 on: November 29, 2021, 08:08:47 am »
Is the lack of "sleep" mode something to do with Linux support?    The processor has lots of core power-down modes, as indicated by this doc, p. 7:
   https://www.ibm.com/downloads/cas/6GZMODN3

Seems like it should be possible to use Linux tools to manage frequency.  p. 15 says:
>>#Use cpupower tool to query and set frequency
>>Available frequency steps from cpupower will list only the nominal range, but user can select full frequency range to set and it will take effect.

>Does the CPU have any of these settings in it?
I believe the CPU contains processor voltage/freq limitation as defined in the VPD poundv, hence why Raptor's code overrides that.
Once that is done, possibly the WOF boost table has to also be over-written to match the VPD values --> I saw that in a comment in one of the scripts.

On top of that, Linux should be able to provide "direction" as to what cores are active and what frequency range to target.  I've played with that a bit on x86 a few years ago so hopefully that also works on POWER9?  I don't have a POWER9 system to play with but here is what I get on my x86 laptop which shows min - max of 800MHz - 3600MHz:
>ls /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo*
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency

>cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo*
3600000
800000
0
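
The same values can be read from a script. Here is a small Python sketch, assuming the standard cpufreq sysfs layout shown above (the kernel reports these files in kHz); the function name and the sysfs_root parameter are my own, the latter only there so it can be pointed at a test directory. I have not run this on POWER9.

```python
from pathlib import Path


def read_cpufreq_limits_mhz(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the hardware min/max frequency of one logical CPU in MHz.

    The cpuinfo_*_freq files hold kHz values, so divide by 1000.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    limits = {}
    for name in ("cpuinfo_min_freq", "cpuinfo_max_freq"):
        khz = int((base / name).read_text().strip())
        limits[name] = khz / 1000
    return limits
```

On my laptop this would return 800.0 and 3600.0 for the two keys, matching the cat output above.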

ejfluhr

Re: Trying to overclock, what is raptor-aggressive
« Reply #11 on: November 29, 2021, 08:31:18 am »
P. 35 of https://www.ibm.com/downloads/cas/6GZMODN3 indicates this (below).   Maybe "folding" means putting cores to sleep???

Processor Folding in Linux
It is essential to install a daemon package based on the host OS to enable utilization-based processor
folding for Static Power Saver and Idle Power Saver modes:
pseries-energy-1.4.0-1.el7.ppc64.rpm
pseries-energy-1.4.0-1.el6.ppc64.rpm
pseries-energy-1.4.0-1.sles11.ppc64.rpm
Version 5.4 has the necessary user space tools required to enable CPU Folding.
 
Once this package is installed, the energyd daemon will monitor the system power mode and activate
processor folding when system power mode is set to "Static Power Saver" and deactivate processor
folding in all other modes.  The utilization-based CPU folding daemon will deactivate unused cores and
transition them to low power idle states until the CPU utilization increases and those cores are activated
to run a workload.
 
Utilization-based processor folding can be manually disabled using the following commands:
 
 /etc/init.d/energyd stop #Stop daemon now, activate all cores
 chkconfig energyd off #Do not restart daemon on startup
 
 -or-
 
 rpm -e pseries-energy #un-install the package completely
 
Alternatively, CPU cores can be folded or set to low power idle state in any power mode
manually using the following command line:
 
 echo 0 > /sys/devices/system/cpu/cpuN/online #Where N is the logical CPU number
 
Please note that all active hardware threads of a core needs to be taken off-line using the above
command in order to move the core to a low power idle state.
 
The cores can be activated again with the following command:
 
 echo 1 > /sys/devices/system/cpu/cpuN/online #Where N is the logical CPU number
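
Since the doc says every hardware thread of a core has to be taken offline, here is a rough Python sketch of how that could be scripted. It assumes the usual Linux sysfs files (topology/thread_siblings_list and online); the function names and the sysfs_root parameter are hypothetical, and it would need root on a real system. I have not tried it on POWER9.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-3" or "0,2,4-5" into integers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def core_threads(cpu, sysfs_root="/sys/devices/system/cpu"):
    """List the logical CPUs sharing a core with `cpu`.

    Call this while the core is still online; the topology files may not be
    available for offline CPUs.
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


def set_threads_online(threads, online, sysfs_root="/sys/devices/system/cpu"):
    """Write 1 or 0 to the 'online' file of every listed logical CPU."""
    for thread in threads:
        online_file = Path(sysfs_root) / f"cpu{thread}" / "online"
        if online_file.exists():  # some systems expose no 'online' file for cpu0
            online_file.write_text("1" if online else "0")
```

The idea would be to save the result of core_threads(N) first, pass it to set_threads_online(threads, False) to fold the core, and later pass the same saved list with True to bring it back.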

bobpaul

Re: Trying to overclock, what is raptor-aggressive
« Reply #12 on: December 17, 2024, 11:43:22 am »
Oh wow. This thread is great. I've been looking into the overclock stuff with the intent of underclocking to attempt to lower idle power on my Talos. I was hoping that maybe I can reduce the base frequency without reducing the max turbo too much. But getting power savings without underclocking would obviously be ideal...

>P. 35 of https://www.ibm.com/downloads/cas/6GZMODN3 indicates this (below).   Maybe "folding" means putting cores to sleep???

In that PDF, page 7 describes Processor Sleep, Core Nap, and Folding. It says sleep will happen when "a user de-configures a core" or (on non-AIX systems) whenever there are long periods of idleness.

Core Nap is not as low power, but it's fast enough that the OS can just use it for general purpose idle.

It sounds like "Processor Folding" is allowing a user-space daemon to enable/disable cores during idle periods to prevent the OS from scheduling threads so that the firmware will see long enough idle periods and put cores to sleep. This should maximize the number of cores that are sleeping/napping when the overall system is rather idle.

I did previously play around with using `/sys/devices/system/cpu/cpu${N}/online` to enable/disable cores, even disabling all of the cores in my 2nd CPU and leaving only the first 4 slices (logical cores) on CPU0 enabled, but I did not notice any real difference at the AC power meter.

And looking around, I cannot find the pseries-energy-1.4.0-1.el7.ppc64.rpm file. Fedora and Debian don't seem to have any "pseries" or "energy" packages and I didn't see it on SLES, either (at least using the web-based search tool as I don't have SLES installed).

But if I'm not getting any power savings from manually disabling cores, the "folding daemon" probably won't be of much help, either, huh? Certainly I could see this helping to get higher single thread boosting on larger systems. I read somewhere else that individual CPU cores can be fully power gated, but all of the cores in a quad which are active have to be at the same frequency. I always assumed that the linux kernel scheduler probably isn't aware of this limitation, so a userspace daemon that helps manage this makes sense to me.

Also do we know how much of the behavior described in the PDF is specific to the PNOR that's flashed on the CPU? Do we know if the firmwares Raptor releases are allowing cores to actually enter the deepest sleep states? I assume IBM provides their own PNOR to their customers.

cy384

Re: Trying to overclock, what is raptor-aggressive
« Reply #13 on: December 17, 2024, 10:34:08 pm »
I'm not 100% sure, but based on what I'm reading, I think "folding" is just disabling cores temporarily, and re-enabling them when more work needs to be done.  Normally Linux likes to spread workloads across all the cores and shuffle them around, so hypothetically you could get power savings by leaving most of the cores constantly in the minimum running power state.  In practice I think this has little to no effect on Raptor machines.  Linux calls the lowest power state "snooze" which I would guess is the same as what IBM is calling "nap", but it's hard to tell for sure.

How much power is yours using?  My blackbird never reports CPU power usage of less than 30 watts (self reported, not measured with an external meter).  Considering these are seven year old server chips, I think that's pretty good.

Underclocking may be worth trying, but I don't think IBM has any published WOF tables running the CPU at less than 1.9 GHz.  I'm sure there's some real minimum due to hardware constraints, but maybe it's a little lower.

bobpaul

Re: Trying to overclock, what is raptor-aggressive
« Reply #14 on: December 19, 2024, 10:25:39 am »
I have the same understanding of "folding".

My Talos II has dual 8-core CPUs. The self-reported power usage (via sensors) shows about 100 W (~32 W per chip, an extra 3 W for each Vdd, 9 W for each Vdn, and 12 W for the single PPT). At the wall I get 140-150 W, but I'm not sure how much of that is the Radeon graphics and the HDD array. I should probably unplug everything and check.

>Linux calls the lowest power state "snooze" which I would guess is the same as what IBM is calling "nap", but it's hard to tell for sure.

That seems a reasonable interpretation. From reading the power management section of the Processor Manual, I understand that the firmware on CMEs is responsible for actually changing the frequency and voltages. Page 314 describes the "stop states". Levels 4 and higher are "reserved for the hypervisor" (which I would read as the "hypervisor" has to request these states). Levels 8-11 are the quad-level states that can power down L2 and L3 caches. Levels 4-7 sound a lot like C1e and C3 on an x86-64 system. Levels 8-11 sound a lot like C3-C7 on x86 systems.

The IBM document (EnergyScale for POWER9) that ejfluhr found really makes it sound like disabling cores is enough, but from my testing, it really doesn't seem to make any difference. Either this is already at the minimum, or more work is needed either in the kernel or the firmware.

cpupower idle-info shows idle states of snooze, stop0_lite, stop0, stop1, stop2, stop4, and stop5. When my system is idle, the duration counters for stop4, stop5, stop0_lite, and snooze are all increasing. When I use cpupower --cpu 0,1,2,... idle-info (and provide only a list of offline cores) it just says all of the cores I listed are offline.
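
The same counters can be read straight from sysfs to see which states are actually accumulating time. Below is a minimal Python sketch, assuming the standard cpuidle sysfs layout (cpuN/cpuidle/stateM/ with name, usage, and time files, where time is in microseconds); the function names and the sysfs_root parameter are hypothetical.

```python
from pathlib import Path


def read_idle_states(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return name, entry count, and total residency (us) for each idle state."""
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpuidle"
    # Sort numerically so state10 comes after state9.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in state_dirs:
        states.append({
            "name": (state_dir / "name").read_text().strip(),
            "usage": int((state_dir / "usage").read_text()),
            "time_us": int((state_dir / "time").read_text()),
        })
    return states


def residency_delta(before, after):
    """Microseconds spent in each state between two snapshots of the same CPU."""
    return {a["name"]: a["time_us"] - b["time_us"] for b, a in zip(before, after)}
```

Taking one snapshot, sleeping a few seconds, taking another, and passing both to residency_delta should show how the idle time splits between snooze and the various stop states.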

The hypervisor stuff still confuses me a bit.