Raptor Computing Systems Hardware > Talos II
Trying to overclock, what is raptor-aggressive
ejfluhr:
Is the lack of "sleep" mode something to do with Linux support? The processor has lots of core power-down modes, as indicated by this doc (p. 7):
https://www.ibm.com/downloads/cas/6GZMODN3
Seems like it should be possible to use Linux tools to manage frequency. p. 15 says:
>>#Use cpupower tool to query and set frequency
>>Available frequency steps from cpupower will list only the nominal range, but the user can select the full frequency range to set and it will take effect.
>Does the CPU have any of these settings in it?
I believe the CPU's voltage/frequency limits are defined in the VPD poundv data, which is why Raptor's code overrides them.
Once that is done, the WOF boost table possibly also has to be overwritten to match the VPD values; I saw that mentioned in a comment in one of the scripts.
On top of that, Linux should be able to provide "direction" as to which cores are active and what frequency range to target. I played with that a bit on x86 a few years ago, so hopefully it also works on POWER9. I don't have a POWER9 system to play with, but here is what I get on my x86 laptop, which shows a range of 800MHz to 3600MHz (the sysfs values are in kHz):
>ls /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo*
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency
>cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo*
3600000
800000
0
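The same query can be wrapped in a small shell function. This is a sketch, assuming the standard cpufreq sysfs layout shown above (values in kHz) and that a POWER9 host running the powernv cpufreq driver exposes the same files; the function name and the `SYSFS` override (there only so it can be pointed at a fake tree for testing) are hypothetical.

```sh
# Hypothetical helper: print the hardware frequency range of one logical CPU.
# cpuinfo_min_freq / cpuinfo_max_freq are reported in kHz.
# SYSFS defaults to /sys; override it to point at a copy of the tree.
cpufreq_range_mhz() (
    dir="${SYSFS:-/sys}/devices/system/cpu/cpu$1/cpufreq"
    if [ ! -r "$dir/cpuinfo_min_freq" ] || [ ! -r "$dir/cpuinfo_max_freq" ]; then
        echo "cpu$1: no cpufreq data" >&2
        exit 1    # exits only the ( ... ) subshell that forms the function body
    fi
    min_khz=$(cat "$dir/cpuinfo_min_freq")
    max_khz=$(cat "$dir/cpuinfo_max_freq")
    echo "$((min_khz / 1000))-$((max_khz / 1000)) MHz"
)
```

On the laptop above, `cpufreq_range_mhz 0` would print `800-3600 MHz`.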
ejfluhr:
Pp. 35 of https://www.ibm.com/downloads/cas/6GZMODN3 indicates this (below). Maybe "folding" means putting cores to sleep???
Page 35
Processor Folding in Linux
It is essential to install a daemon package based on the host OS to enable utilization-based processor
folding for Static Power Saver and Idle Power Saver modes:
pseries-energy-1.4.0-1.el7.ppc64.rpm
pseries-energy-1.4.0-1.el6.ppc64.rpm
pseries-energy-1.4.0-1.sles11.ppc64.rpm
Version 5.4 has the necessary user space tools required to enable CPU Folding.
Once this package is installed, the energyd daemon will monitor the system power mode and activate
processor folding when system power mode is set to "Static Power Saver" and deactivate processor
folding in all other modes. The utilization-based CPU folding daemon will deactivate unused cores and
transition them to low power idle states until the CPU utilization increases and those cores are activated
to run a workload.
Utilization-based processor folding can be manually disabled using the following commands:
/etc/init.d/energyd stop #Stop daemon now, activate all cores
chkconfig energyd off #Do not restart daemon on startup
-or-
rpm -e pseries-energy #un-install the package completely
Alternatively, CPU cores can be folded or set to low power idle state in any power mode
manually using the following command line:
echo 0 > /sys/devices/system/cpu/cpuN/online #Where N is the logical CPU number
Please note that all active hardware threads of a core need to be taken offline using the above
command in order to move the core to a low power idle state.
The cores can be activated again with the following command:
echo 1 > /sys/devices/system/cpu/cpuN/online #Where N is the logical CPU number
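Since every active hardware thread of a core has to go offline, a small helper can read the core's sibling list from sysfs instead of guessing thread numbers. This is a minimal sketch, assuming root privileges and the standard `topology/thread_siblings_list` file; the function names and the `SYSFS` override are hypothetical. Because the `topology/` files are generally not available for a CPU once it is offline, the helper prints the thread numbers it took down so the same list can be handed back later.

```sh
# Hypothetical helpers. SYSFS defaults to /sys; override it for testing.

# Expand a kernel CPU list such as "0-3,8,10-11" into one number per line.
expand_cpulist() (
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -n "$lo" ] || continue
        [ -n "$hi" ] || hi=$lo
        i=$lo
        while [ "$i" -le "$hi" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
)

# Take every hardware thread of the core containing logical CPU $1 offline,
# printing each thread number so the same list can be brought back later.
offline_core() (
    base="${SYSFS:-/sys}/devices/system/cpu"
    siblings=$(cat "$base/cpu$1/topology/thread_siblings_list") || exit 1
    for t in $(expand_cpulist "$siblings"); do
        echo 0 > "$base/cpu$t/online" || exit 1
        echo "$t"
    done
)

# Bring the listed logical CPUs back online, e.g. online_cpus 4 5 6 7.
online_cpus() (
    base="${SYSFS:-/sys}/devices/system/cpu"
    for t in "$@"; do
        echo 1 > "$base/cpu$t/online" || exit 1
    done
)
```

Typical use would be `threads=$(offline_core 4)` and later `online_cpus $threads` (unquoted, so the list splits into separate arguments).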
bobpaul:
Oh wow. This thread is great. I've been looking into the overclock stuff with the intent of underclocking to attempt to lower idle power on my Talos. I was hoping that maybe I can reduce the base frequency without reducing the max turbo too much. But getting power savings without underclocking would obviously be ideal...
--- Quote from: ejfluhr on November 29, 2021, 08:31:18 am ---Pp. 35 of https://www.ibm.com/downloads/cas/6GZMODN3 indicates this (below). Maybe "folding" means putting cores to sleep???
--- End quote ---
In that PDF on page 7 it describes Processor Sleep, Core Nap, and Folding. It says sleep will happen when "a user de-configures a core" or (on non-AIX systems) whenever there are long periods of idleness.
Core Nap is not as low power, but it's fast enough that the OS can just use it for general purpose idle.
It sounds like "Processor Folding" is allowing a user-space daemon to enable/disable cores during idle periods to prevent the OS from scheduling threads so that the firmware will see long enough idle periods and put cores to sleep. This should maximize the number of cores that are sleeping/napping when the overall system is rather idle.
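The decision half of such a daemon can be illustrated in a few lines: measure overall utilization and work out how many cores need to stay online. This is only a sketch of the concept, not how IBM's energyd actually works; the function names and the simple "keep enough cores to cover the busy fraction" policy are assumptions. It reads the aggregate `cpu` line of `/proc/stat`, whose first eight counters are user, nice, system, idle, iowait, irq, softirq and steal.

```sh
# Hypothetical sketch of a folding policy; not IBM's energyd.

# Print "idle total" jiffies for one aggregate "cpu ..." line of /proc/stat.
stat_totals() (
    set -- $1          # split the line into fields
    shift              # drop the "cpu" label
    n=0 idle=0 total=0
    for v in "$@"; do
        n=$((n + 1))
        [ "$n" -le 8 ] || break   # guest/guest_nice are already in user/nice
        total=$((total + v))
        # field 4 is idle, field 5 is iowait; count both as idle time
        if [ "$n" -eq 4 ] || [ "$n" -eq 5 ]; then idle=$((idle + v)); fi
    done
    echo "$idle $total"
)

# Busy percentage between two samples ($1 = earlier line, $2 = later line).
cpu_busy_pct() (
    set -- $(stat_totals "$1") $(stat_totals "$2")
    didle=$(($3 - $1))
    dtotal=$(($4 - $2))
    if [ "$dtotal" -le 0 ]; then echo 0; exit 0; fi
    echo $(( (dtotal - didle) * 100 / dtotal ))
)

# Cores to keep online ($1 = busy %, $2 = total cores): round up, at least one.
cores_wanted() (
    want=$(( ($1 * $2 + 99) / 100 ))
    [ "$want" -ge 1 ] || want=1
    [ "$want" -le "$2" ] || want=$2
    echo "$want"
)
```

For example, sampling `grep '^cpu ' /proc/stat` a few seconds apart, passing both lines to `cpu_busy_pct`, then calling `cores_wanted "$busy" 16` on a dual 8-core Talos II gives a target; a real daemon would then offline the surplus cores through the per-CPU `online` files.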
I did previously play around with using `/sys/devices/system/cpu/cpu${N}/online` to enable/disable cores, even disabling all of the cores in my 2nd CPU and leaving only the first 4 slices (logical cores) on CPU0 enabled, but I did not notice any real difference at the AC power meter.
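Offlining a whole socket like this can be scripted by matching each CPU's `topology/physical_package_id`, which on powerpc should report the chip ID. A sketch under those assumptions (root privileges, standard sysfs layout); the function names and `SYSFS` override are hypothetical, and CPUs without a writable `online` file are simply skipped.

```sh
# Hypothetical helpers. SYSFS defaults to /sys; override it for testing.

# List online logical CPUs whose physical_package_id equals $1.
cpus_on_chip() (
    base="${SYSFS:-/sys}/devices/system/cpu"
    for d in "$base"/cpu[0-9]*; do
        f="$d/topology/physical_package_id"
        [ -r "$f" ] || continue     # typically missing for offline CPUs
        if [ "$(cat "$f")" = "$1" ]; then
            echo "${d##*/cpu}"
        fi
    done
)

# Offline every CPU on chip $1 that can be offlined, printing each one.
offline_chip() (
    base="${SYSFS:-/sys}/devices/system/cpu"
    for n in $(cpus_on_chip "$1"); do
        [ -w "$base/cpu$n/online" ] || continue
        echo 0 > "$base/cpu$n/online" || exit 1
        echo "$n"
    done
)
```

The chip IDs present on a given board can be listed with `cat /sys/devices/system/cpu/cpu*/topology/physical_package_id`.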
And looking around, I cannot find the pseries-energy-1.4.0-1.el7.ppc64.rpm file. Fedora and Debian don't seem to have any "pseries" or "energy" packages and I didn't see it on SLES, either (at least using the web-based search tool as I don't have SLES installed).
But if I'm not getting any power savings from manually disabling cores, the "folding daemon" probably won't be of much help, either, huh? Certainly I could see this helping to get higher single thread boosting on larger systems. I read somewhere else that individual CPU cores can be fully power gated, but all of the active cores in a quad have to run at the same frequency. I always assumed that the Linux kernel scheduler probably isn't aware of this limitation, so a userspace daemon that helps manage this makes sense to me.
Also, do we know how much of the behavior described in the PDF is specific to the PNOR firmware that's flashed on the system? Do we know whether the firmware Raptor releases allows cores to actually enter the deepest sleep states? I assume IBM provides its own PNOR to its customers.
cy384:
I'm not 100% sure, but based on what I'm reading, I think "folding" is just disabling cores temporarily, and re-enabling them when more work needs to be done. Normally Linux likes to spread workloads across all the cores and shuffle them around, so hypothetically you could get power savings by leaving most of the cores constantly in the minimum running power state. In practice I think this has little to no effect on Raptor machines. Linux calls the lowest power state "snooze", which I would guess is the same as what IBM is calling "nap", but it's hard to tell for sure.
How much power is yours using? My Blackbird never reports CPU power usage of less than 30 watts (self-reported, not measured with an external meter). Considering these are seven-year-old server chips, I think that's pretty good.
Underclocking may be worth trying, but I don't think IBM has any published WOF tables running the CPU at less than 1.9 GHz. I'm sure there's some real minimum due to hardware constraints, but maybe it's a little lower.
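For underclocking experiments, the usual Linux knob is each cpufreq policy's `scaling_max_freq` (in kHz), which is what `cpupower frequency-set -u` writes as well. A minimal sketch, assuming root and the standard `cpufreq/policyN` layout; the function name and `SYSFS` override are hypothetical. The kernel should clamp the request to the `cpuinfo_min_freq` to `cpuinfo_max_freq` range that firmware advertises, so this cannot go below the lowest operating point firmware provides; it only stops the OS from requesting higher ones.

```sh
# Hypothetical helper: cap every cpufreq policy at $1 MHz.
# scaling_max_freq takes kHz; SYSFS defaults to /sys (override for testing).
cap_max_mhz() (
    khz=$(($1 * 1000))
    for p in "${SYSFS:-/sys}"/devices/system/cpu/cpufreq/policy[0-9]*; do
        [ -w "$p/scaling_max_freq" ] || continue
        echo "$khz" > "$p/scaling_max_freq" || exit 1
    done
)
```

After `cap_max_mhz 2000`, reading `scaling_max_freq` back shows what the kernel actually accepted.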
bobpaul:
I have the same understanding of "folding".
My Talos II has dual 8-core CPUs. The self-reported power usage (via sensors) shows about 100 W (~32 W per chip, an extra 3 W for each Vdd, 9 W for each Vdn, and 12 W for the single PPT). At the wall I get 140-150 W, but I'm not sure how much of that is the Radeon graphics card and the HDD array. I should probably unplug everything and check again.
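As a sanity check on those figures, assuming "each Vdd" and "each Vdn" mean one rail per chip, counted on top of the per-chip number:

$$
\begin{aligned}
P_{\text{sensors}} &\approx \underbrace{2 \times 32}_{\text{chips}} + \underbrace{2 \times 3}_{\text{Vdd}} + \underbrace{2 \times 9}_{\text{Vdn}} + \underbrace{12}_{\text{PPT}} = 64 + 6 + 18 + 12 = 100\ \text{W} \\
P_{\text{wall}} - P_{\text{sensors}} &\approx (140 \text{ to } 150) - 100 = 40 \text{ to } 50\ \text{W}
\end{aligned}
$$

That remaining 40 to 50 W would cover everything the sensors don't report, such as the GPU, the drives, fans and power-supply conversion losses.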
--- Quote from: cy384 on December 17, 2024, 10:34:08 pm ---Linux calls the lowest power state "snooze" which I would guess is the same as what IBM is calling "nap", but it's hard to tell for sure.
--- End quote ---
That seems a reasonable interpretation. From reading the power management section of the Processor Manual, I understand that the firmware on the CMEs is responsible for actually changing the frequency and voltages. Page 314 describes the "stop states". Levels 4 and higher are "reserved for the hypervisor" (which I would read as the "hypervisor" having to request these states). Levels 8-11 are the quad-level states that can power down the L2 and L3 caches. Levels 4-7 sound a lot like C1E and C3 on an x86-64 system. Levels 8-11 sound a lot like C3-C7 on x86 systems.
The IBM document (EnergyScale for POWER9) that ejfluhr found really makes it sound like disabling cores is enough, but from my testing, it really doesn't seem to make any difference. Either this is already at the minimum, or more work is needed either in the kernel or the firmware.
`cpupower idle-info` shows idle states of snooze, stop0_lite, stop0, stop1, stop2, stop4, and stop5. When my system is idle, the duration counters for stop4, stop5, stop0_lite, and snooze are all increasing. When I use `cpupower --cpu 0,1,2,... idle-info` (providing only a list of offline cores), it just says all of the cores I listed are offline.
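The per-state counters that cpupower summarizes come from each CPU's cpuidle sysfs directory, so they can also be sampled directly. A sketch assuming the standard layout (`cpuN/cpuidle/stateK/` with `name`, `usage` and `time` files, `time` being cumulative residency in microseconds); the function name and `SYSFS` override are hypothetical. If the directory is missing, the function reports an error.

```sh
# Hypothetical helper: print "name usage time_us" for each idle state of
# logical CPU $1. usage = times entered, time = total residency in microseconds.
# SYSFS defaults to /sys; override it for testing.
idle_residency() (
    dir="${SYSFS:-/sys}/devices/system/cpu/cpu$1/cpuidle"
    if [ ! -d "$dir" ]; then
        echo "cpu$1: no cpuidle data" >&2
        exit 1
    fi
    for s in "$dir"/state[0-9]*; do
        [ -r "$s/name" ] || continue
        printf '%s %s %s\n' "$(cat "$s/name")" "$(cat "$s/usage")" "$(cat "$s/time")"
    done
)
```

Running it twice a few seconds apart on an idle CPU and comparing the `time` column shows which states (for example stop4/stop5 versus snooze) are actually absorbing the idle time.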
The hypervisor stuff still confuses me a bit.