Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - bobpaul

Pages: [1] 2
1
Firmware / Re: Messing with WOF Tables
« on: March 17, 2025, 07:51:37 pm »
>That link doesn't seem to work. Do you happen to still know how to navigate to it, or know the name of the paper?

Sorry, I don't.  As I recall, it was an educational presentation/paper on how processor power is modeled.  It is pretty well known material, maybe you can find other references.

hmm, ok. I thought it might be more specific to these processors.

It's been a long time, but I believe VRATIO just means the number of ON cores (a.k.a. active) relative to the maximum available. As the # of active cores goes down, the power from those cores is applied to the remaining cores, allowing them to boost to higher frequencies. Depending on the system & power limit, the proc could pretty quickly apply so much power credit from offline cores that it flat-lines at the maximum possible frequency, i.e. it becomes technology limited, not power limited. It sounds like, for that table, the processor is only power-limited when >= 12 cores are ON. Technically VRATIO is "voltage ratio", intended to handle quads using the internal voltage regulator at some % below the input voltage, but it ended up not getting supported and devolved to tracking # of cores. A core in a stopped state with the power headers off has a voltage of 0v, hence VRATIO=0 for that core, and you add up the # of ON vs OFF cores to get the VRATIO.

Ah-ha. So that's why VRATIO_INDEX is always 0-23 and VRATIO_STEP is always exactly 1/24 to 3 figures. VRATIO_START is always 0.0409 to account for rounding in VRATIO_STEP, ensuring that 23*VRATIO_STEP + VRATIO_START == 1. So really, VRATIO_INDEX is the easier number to follow since it just tracks the number of active cores. And for a 16 core CPU, one should be able to ignore any VRATIO_INDEX > 15, as 0-15 cover 1-16 active cores, right?
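To make that arithmetic concrete, here is a minimal sketch. The 0.0409 start is the value quoted above; the 0.0417 step is my assumption for "1/24 to 3 figures", and the function names are made up for illustration, not taken from any firmware source.

```python
# Assumed constants: STEP is 1/24 rounded to 3 significant figures,
# START is the 0.0409 value discussed above.
VRATIO_START = 0.0409
VRATIO_STEP = 0.0417


def vratio_from_index(index):
    """Return the VRATIO for a VRATIO_INDEX in the range 0-23."""
    if not 0 <= index <= 23:
        raise ValueError("VRATIO_INDEX is expected to be 0-23")
    return VRATIO_START + index * VRATIO_STEP


def active_cores(index):
    """Assumes index N corresponds to N + 1 active cores."""
    return index + 1


# Print index, assumed active core count, and VRATIO for a few indexes
for idx in (0, 11, 15, 23):
    print(idx, active_cores(idx), round(vratio_from_index(idx), 4))
```

If the index-to-core-count reading is right, a 16 core part would only ever land on indexes 0 through 15.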

I don't believe Linux ever implemented support for different quad frequencies so they all just run at the same frequency?   If true, WOF only uses FRATIO=1.0 indices.

Certainly in Linux I see different frequencies on a per-core basis, but I understood that to be simply what the kernel was requesting rather than what it's actually running at. I had thought it was the case where if 2 active cores in a quad had requested frequencies of 2200 and 1800 then the quad (and thus both cores) would run at 2200MHz. Linux knows that 4 threads are the same core, so those always show the same frequency, but I understood Linux didn't actually know about the quads and rather I thought it was the PNOR that was really in control of the quad frequencies. But I don't know of any way of checking what frequency a core (or quad) is actually running at.

All of the tables have FRATIO_STEP of 0.1, FRATIO_START of 1, and FRATIO of 0.6, 0.7, 0.8, 0.9, or 1.0. But... yeah, holding vratio constant and looking at fratio, it does not look like the fratio value has an impact, at least for this csv:
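One way to check the "FRATIO has no impact" observation without eyeballing plots is to group rows that differ only in FRATIO and look at the spread in WOF_FREQ. This is a rough sketch using only the standard library; the column names are the ones used in the plotting script elsewhere in this thread, and the demo rows are made up, not real WOF data.

```python
import csv
from collections import defaultdict

# Columns that identify a table cell once FRATIO is ignored
KEYS = ('VRATIO_INDEX', 'ACTIVE_QUADS', 'NEST_CEFF', 'CORE_CEFF')


def load_rows(path):
    """Read a WOF CSV into a list of dicts (all values stay strings)."""
    with open(path, newline='') as handle:
        return list(csv.DictReader(handle))


def freq_spread_across_fratio(rows):
    """Largest WOF_FREQ difference between rows that differ only in FRATIO."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in KEYS)].append(float(row['WOF_FREQ']))
    return max((max(f) - min(f) for f in groups.values()), default=0.0)


# Made-up rows: same cell at two FRATIO values, same frequency
demo = [
    {'FRATIO': '1.0', 'VRATIO_INDEX': '12', 'ACTIVE_QUADS': '6',
     'NEST_CEFF': '0.25', 'CORE_CEFF': '0.5', 'WOF_FREQ': '3000'},
    {'FRATIO': '0.9', 'VRATIO_INDEX': '12', 'ACTIVE_QUADS': '6',
     'NEST_CEFF': '0.25', 'CORE_CEFF': '0.5', 'WOF_FREQ': '3000'},
]
print(freq_spread_across_fratio(demo))
```

Running `freq_spread_across_fratio(load_rows('WOF_V7_4_2_SFORZA_16_160_2500_TM.csv'))` and getting zero would agree with what the plots suggest; a non-zero result would mean FRATIO matters somewhere in the table.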



For reference, CORE_CEFF is the ratio of the workload switching power relative to TDP, where TDP=1.0.   So 0.5 means the workload has half the switching power of TDP.     If the workload is using less power, the WOF table should attempt to raise frequency, up to the maximum allowed, where it flatlines.   Similarly, if there are fewer cores active, the frequency will go up, and if both are true, the frequency will go up more.   So that explains the shape of that plot.

OK, I don't understand the name, but that makes more sense than my assumption that it was a physical property related to the switching capacitance.

2
GPU Compute / Accelerators / Re: Intel Arc Support in Kernel 6.8
« on: March 06, 2025, 10:16:19 am »
it does support the current Alchemist line; the only thing we'll likely miss out on is support for the HuC microcontroller. Without that support the video encoder is rather less capable...

Kernel docs only mention reference HEVC/H.265. The media driver mentions either HuC or shader-based encoding (which is hopefully still better than CPU-based). Does encoding for H.264, VP9, and AV1 work that way?

I found this upstream bug tracking the issue. From the link in the last comment it looks like there's no accelerated decode on the A380 even with the i915 driver? That can't be right...

3
Firmware / Re: What does it take to support a new CPU?
« on: March 04, 2025, 01:24:25 pm »
Alright I got the XML dependencies installed, I also had to install a static zlib to get it compiling. I misremembered the build failing with bitbake URLs earlier, it was actually buildroot that's failing to fetch stuff.

Code: [Select]
--2025-02-27 22:50:28--  http://sources.buildroot.net/ppe42-gcc/ppe42-gcc-84a6a88e95d3b52cf4a6979a5ca47a12daa6ec49-br1.tar.gz

2 things I notice. That revision (84a6a88e95d3b5) indicates you're building a v2.1 firmware or newer. The second thing I notice is ppe42-gcc should come from one of raptor's gitlab repos, not from sources.buildroot.net. Did you make local changes, fail to run `git submodule update`, or switch revisions without clearing your `output/` folder? What does git status show? Which git revision are you on? Also I think buildroot is sensitive to environment variables; if you're not using `bash` as your shell, it might not work.

Basically I would only expect it to try to fetch from sources.buildroot.net if the PPE42_GCC_SITE variable were messed with, and I doubt you edited openpower/package/ppe42-gcc/ppe42-gcc.mk and changed it. It seems something is just messed up in your build environment...



You don't have to check out my copy of raptor's firmware repo to do so, but I would recommend following my README instructions and using the Dockerfile.stretch (for building raptor-v2.00 based pnor) and Dockerfile.debian (for building raptor-v2.10 based pnor) which I added to my repo. I plan to update the wiki eventually (and probably add a page about using docker for the build environment). The page on troubleshooting/debugging hostboot could use some expansion as well.

Both bitbake and buildroot can be rather sensitive to the local environment. Conflicting environment variables (or maybe using a shell other than bash) could cause them to do weird things. My docker files should behave the same as a properly set-up debian chroot, but since the Dockerfile acts as a script, the environment is always the same and resetting the environment is as simple as exiting and then running the specific `docker run` command again. At some point I'll probably get Fedora and Ubuntu based dockerfiles working.

4
Firmware / Re: What does it take to support a new CPU?
« on: February 27, 2025, 10:50:15 am »

I don't get anything from hostboot's serial console. I just remembered that last time I tried, I got a bootup screen on the VGA, but nothing else. I'll try again without the VGA disable pin. This is what it looks like. It just boot loops 3 or 4 times and stays on. The loading bar loads a bit but gets stuck like a 1/6th of the way. Looks like this: https://files.catbox.moe/fzskrz.jpg


VGA disable jumper doesn't actually disable the VGA output during hostboot, so that shouldn't matter. Your jpg link is showing bad cert (default traefik certificate) and 404.


I've just started to build the PNOR without the debian chroot and it is coming along pretty well. I am going to leave it compiling overnight. I think it's complaining about some XML dependency?


I added a debian Dockerfile and instructions that worked for me.


Isn't the BMC firmware the same from System Package v2.00 and v2.10? BMC v2.10 and PNOR v2.00 is what I've usually got on and it works but complains about WOF tables.


Oh, you're right. I suppose I should downgrade my PNOR and see how my machine behaves. There weren't any WOF file changes between v2.00 and v2.10. Both use the same revision of the machine-xml repo (they renamed the repo, but same rev hash). EDIT I tried v2.00 on mine and it also works for me and I also still get the WOF complaints.

Code: [Select]
$ git checkout raptor-v2.10
$ git diff raptor-v2.00 openpower/configs/talos_defconfig

....

 BR2_HOSTBOOT_CONFIG_FILE="talos.config"
-BR2_OPENPOWER_MACHINE_XML_GITHUB_PROJECT_VALUE="talos-xml"
+BR2_HOSTBOOT_USE_ALTERNATE_GCC=y
+BR2_OPENPOWER_MACHINE_XML_GITHUB_PROJECT_VALUE="machine-talos-ii/machine-xml"
 BR2_OPENPOWER_MACHINE_XML_VERSION="cbd11e9450325378043069d7e638668ea26c2074"
 BR2_OPENPOWER_MACHINE_XML_FILENAME="talos.xml"

....


EDIT: Here's a re-build of pnor v2.00 but with the extra WOF tables for 16-core and below https://gitlab.com/bobpaul/talos-op-build/-/releases/wof-4-8-12-16-core_raptor-v2.00

5
Firmware / Re: Messing with WOF Tables
« on: February 26, 2025, 10:59:58 pm »
I copied that table CSV from GIT and filtered it to generate plots of frequency versus "CORE_CEFF" for a couple of different "VRATIO" values.
This link declares:  https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.905&rep=rep1&type=pdf

     Power ~= VDD^2 x Fclk x Ceff
     where the effective switched capacitance, Ceff, is commonly expressed as the product of the physical capacitance CL, and the activity weighting factor α, each averaged over the N nodes.
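As a quick illustration of the relation quoted above, here is a small sketch. The function names and all of the numbers are placeholders I picked for the example; nothing here comes from the WOF tables or from the paper itself.

```python
def effective_capacitance(c_load, activity):
    """Ceff as the product of physical capacitance CL and activity factor alpha."""
    return c_load * activity


def switching_power(vdd, f_clk, c_eff):
    """Power ~= VDD^2 x Fclk x Ceff (proportionality, arbitrary units)."""
    return vdd ** 2 * f_clk * c_eff


# Placeholder values, chosen only to show how the terms scale
c_full = effective_capacitance(c_load=1.0e-9, activity=0.2)
c_half = effective_capacitance(c_load=1.0e-9, activity=0.1)

p_full = switching_power(vdd=1.0, f_clk=2.5e9, c_eff=c_full)
p_half = switching_power(vdd=1.0, f_clk=2.5e9, c_eff=c_half)

print("power ratio when Ceff halves:", p_half / p_full)
print("power ratio when VDD doubles:",
      switching_power(2.0, 2.5e9, c_full) / p_full)
```

The point is only the scaling: power is linear in Ceff and frequency but quadratic in voltage.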

That link doesn't seem to work. Do you happen to still know how to navigate to it, or know the name of the paper? I tried looking on doi.org using the number in the link, but I don't think that's a complete DOI number.

Trying to follow along and I'm not certain what most of the abbreviations mean, so the column titles are a bit nonsense. Fratio ... Frequency ratio? But ratio of what to what? And

But I did reproduce the graph in python, which should make it a bit easier to iteratively view a bunch of plots:

Code: [Select]
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

wof = pd.read_csv('WOF_V7_4_2_SFORZA_16_160_2500_TM.csv')
print("Unique Vratio indexes are: ", wof['VRATIO_INDEX'].unique())

# filter the table down to a plottable slice
def plotme(wf, vratio_index, nest_ceff=0.25, active_quads=6, fratio=1, plotit=True):
    # VRATIO_START and VRATIO_STEP appear to be the same on every row, so read the first row
    vratio = wf['VRATIO_START'].iloc[0] + vratio_index * wf['VRATIO_STEP'].iloc[0]
    wf = wf[
          (wf['FRATIO'] == fratio) &
          (wf['ACTIVE_QUADS'] == active_quads) &
          (wf['NEST_CEFF'] == nest_ceff) &
          (wf['VRATIO_INDEX'] == vratio_index) &
          # compare the computed float with a tolerance instead of ==
          np.isclose(wf['VRATIO'], vratio, atol=1e-4)
          ]
    wf[['WOF_FREQ', 'CORE_CEFF']].plot(x='CORE_CEFF',
            title=f'Fratio={fratio}, Vratio={vratio:.4f}, ACTIVE_QUADS={active_quads}, NEST_CEFF={nest_ceff}')
    # plt.show() blocks the thread. If you don't call it, you can render several plots in the background and plt.show() them all at once later
    if plotit:
        plt.show()

plotme(wof, 12)



This works well pasted into an ipython prompt. I see most values for Vratio_index just plot a flat line; 12 and a couple of values near 12 make a nice graph. But why 12? And somewhat surprisingly, changing the number of active quads doesn't seem to have any effect. Also, what defines a quad as active? Simply that it's not powergated and completely off, or does "active" mean high load?

And I think the number 1 question is "how does one know which CSV gets selected for a given CPU"?

Edit
Plotting all Vratio values as a 3d surface plot is interesting:

Code: [Select]
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D

def plotsurf(wf, file="", nest_ceff=0.25, active_quads=6, fratio=1, showplot=True, use_index=False):
    # the VRATIO_START column is always the same value, as far as I've noticed
    wf=wf[
          (wf['FRATIO'] == fratio) &
          (wf['ACTIVE_QUADS']==active_quads) &
          (wf['NEST_CEFF']==nest_ceff)
          ]
    vratio = wf['VRATIO_INDEX'] if use_index else wf['VRATIO']

    ax = plt.figure().add_subplot(projection='3d')
    ax.plot_trisurf(wf['CORE_CEFF'], vratio, wf['WOF_FREQ'], cmap=cm.jet, linewidth=0.2)
    ax.set_title(f'{file}\nFratio={fratio}, ACTIVE_QUADS={active_quads}, NEST_CEFF={nest_ceff}')
    ax.set_xlabel('Core_Ceff')
    ax.set_ylabel('Vratio')
    ax.set_zlabel('WOF_Hz')
    ax.view_init(15, 60, 0)
    if showplot:
        plt.show()
    return ax   

(EDIT: graphs should show vratio from 0-1. I fixed a bug in the above code which caused values greater than 1 to be shown, but I didn't take new screenshots...)



and I see from the file in hostboot where the error is generated that

Code: [Select]
  4.98699|  UserData1  Number of cores : 0x00100002000000a0
  4.98700|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x000009c400000012

means 12core, "mode=2" (turbo?) 160w, 2500MHz, with header().size=12. Looking at the SFORZA list, that's a 1800-2500MHz part with 3.8GHz turbo. Intuitively it makes sense that table matches that CPU. But, also looking at the SFORZA list, every CPU on the list supports turbo mode to at least 3.8GHz, sometimes 4.1GHz.
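Since those UserData words are hexadecimal, it can help to split them into fields programmatically rather than reading the digits by eye. The sketch below assumes a field layout that I inferred from the 160w and 2500MHz readings; it is not taken from the hostboot source, so treat the widths as a guess.

```python
def split_fields(word, widths):
    """Split a 64-bit integer into big-endian fields of the given bit widths."""
    if sum(widths) != 64:
        raise ValueError("field widths must add up to 64 bits")
    fields = []
    shift = 64
    for width in widths:
        shift -= width
        fields.append((word >> shift) & ((1 << width) - 1))
    return fields


# ASSUMED layout, not confirmed against hostboot:
#   UserData1 = cores (16 bits) | mode (16 bits) | socket power in W (32 bits)
#   UserData2 = frequency in MHz (32 bits) | trailing 32-bit field
cores, mode, watts = split_fields(0x00100002000000a0, (16, 16, 32))
mhz, last = split_fields(0x000009c400000012, (32, 32))

print("cores:", cores, "mode:", mode, "watts:", watts)
print("MHz:", mhz, "trailing field:", last)
```

Keep in mind that every field prints in decimal here, so a trailing `12` in the raw log is not decimal twelve.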

So why are so many of the tables suffixed with _NM.csv? There's a WOF_V7_4_2_SFORZA_16_140_2200_NM.csv, but I don't see any 140w 16core SFORZAs on the list. And plotting it the same as the other, it still goes to 3.8GHz despite the "2200MHz normal mode" file label.  ???
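For comparing lots of table files against a CPU, it may be easier to pull the fields out of the file names. This is a sketch that assumes the name pattern is version, family, core count, watts, MHz, and a mode suffix, which is just how the two names mentioned in this post appear to read; the real selection logic lives in hostboot and may use the file header instead.

```python
import re

# Assumed pattern: WOF_V<version>_<FAMILY>_<cores>_<watts>_<MHz>_<mode>.csv
NAME_RE = re.compile(
    r'^WOF_V(?P<version>\d+(?:_\d+)*)_(?P<family>[A-Z]+)_'
    r'(?P<cores>\d+)_(?P<watts>\d+)_(?P<mhz>\d+)_(?P<mode>[A-Z]+)\.csv$'
)


def parse_wof_name(filename):
    """Return the fields encoded in a WOF table file name, or None if it doesn't match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    for key in ('cores', 'watts', 'mhz'):
        info[key] = int(info[key])
    return info


for name in ('WOF_V7_4_2_SFORZA_16_160_2500_TM.csv',
             'WOF_V7_4_2_SFORZA_16_140_2200_NM.csv'):
    print(name, '->', parse_wof_name(name))
```

The mode suffix is returned as-is (`TM`, `NM`) since what it actually means is exactly the open question.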

6
Firmware / Re: What does it take to support a new CPU?
« on: February 26, 2025, 12:34:57 pm »
If the issue with the 20CY230 is strictly a WOF table issue, then try this: https://gitlab.com/bobpaul/talos-op-build/-/releases/wof-16-20250226

This has all of the upstream WOF tables for 16 core and below. When there were conflicts between upstream and what Raptor had, I kept Raptor's.

But my understanding (not sure if correct) is SFORZA CPUs should work without WOF, they just won't turboboost.

Not sure if you saw, but you can test the PNOR without flashing it.

Edit I just noticed I'm also including a previously unreleased change that bumps the "X frequency to 2GHz". From the commit date I had assumed that was already in v2.10, but it's not. If this doesn't work I can exclude that change. I can also try building a v2.00 pnor that has the extra WOF tables.

7
Firmware / Re: What does it take to support a new CPU?
« on: February 26, 2025, 07:33:27 am »



I've tried booting it on several firmware versions with the 20CY230 but it just doesn't IPL or give me any info from the PNOR serial interface or from the BMC.

When you say it doesn't IPL, does absolutely nothing come up on the console, or do you at least get some output from Hostboot?

There are 4 WOF CSV files meant for 16core SFORZA over on https://github.com/open-power/WOF-Tables which aren't included in Raptor's PNOR, but if you're not getting any Hostboot output at all, I think it's more than just missing WOF tables. But I can build a PNOR that simply includes all of the 16core WOF tables.

The 02AA883 boots with anything except the latest firmware, and on the second-latest firmware (v2.00) it complains about not having WOF tables in the PNOR serial's dmesg.

Try using the PNOR from v2.00 and upgrading the BMC to v2.10. My guess is the BMC firmware is unrelated to your issue.

I don't think we're quite there yet, but here's the diff between v2.00 and v2.10 pnor. There are changes to talos_config and a talos_defconfig that we'd probably want to start with.

I might try to set up some sort of archive for a lot of this talos repo stuff just in case it all becomes abandonware. Currently it's not building because it fails to fetch repositories from bitbake, dead links I suppose.

Easiest way to do this is to clone the repo, fix any broken URLs, and get it building. After it's built, commit any changes you made (hopefully only URLs) and then save your bitbake download cache. If bitbake sees a file in the download cache that matches a download, it verifies the checksum, and if the checksum is good it doesn't try to fetch from the URL. This unfortunately means project developers often don't notice when a dependency URL has changed until a new dev joins the team and tries a fresh checkout.
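The cache behaviour described above boils down to "skip the fetch if a cached file already has the expected checksum". Here is a minimal sketch of that idea; it is not bitbake's actual code, the function names are mine, and using SHA-256 is an assumption for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large tarballs don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()


def needs_fetch(cache_dir, filename, expected_sha256):
    """Return True if the file is missing from the cache or fails the checksum."""
    cached = Path(cache_dir) / filename
    if not cached.is_file():
        return True
    return sha256_of(cached) != expected_sha256
```

This is also why a dead URL can go unnoticed for years: as long as `needs_fetch` keeps returning False on everyone's machine, nothing ever touches the network.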

I haven't upgraded the FPGA with the PNOR/BMC flash so that may be causing it, looking at FPGA changes there are some minor things related to Talos II lite. I've got a flash programmer here, I just don't want to brick the FPGA.

Check your motherboard revision. If it's HW rev1.00 then don't upgrade past FPGA v1.07. If it's HW rev 1.01 then definitely upgrade it but don't downgrade it.

8
Firmware / Re: What does it take to support a new CPU?
« on: February 25, 2025, 08:17:42 pm »
I'm currently going through this process, too, except I'm hoping to get the 02AA883 working slightly better. Mine boots fine on the latest firmware, but it doesn't load the WOF table.

Have you tried booting the 02CY230 yet?


9
Firmware / Re: Updating PNOR/BMC firmware on talos II Lite
« on: February 25, 2025, 07:29:33 pm »
I built one of the v1.xx BMC firmwares about a year ago. I might still have notes on that, but it involved switching a lot of the repo urls in bitbake packages. All the source is still online, just not at the same URLs.

You can find Raptor's Repos on


and at some point in time they moved some repos from one server to another.

Quote
Maybe this is because it's a special developer system with DD2.1-stepped CPU?

Do you know what part number your CPU is?

Edit: Oh, I see from your other thread and IRC that it's the same 02AA883 CPU I have. I already mentioned this on IRC, but I'm going to repeat it here just so it's searchable by others: aside from "just" being DD2.1 stepping, this CPU uses unpaired cores. At least for adaptl and myself, all 4 of the active cores are even on the same quad (cores 0x14, 0x15, 0x16, and 0x17 are the active cores). All of the production (DD2.2+) 4 core CPUs were made for RaptorCS and only made from dies that had 4 usable cores which weren't paired (i.e., if core 0x14 and 0x15 were both good, only one was used). The big ramification is L2 and L3 cache sharing, which would worsen the spectre/meltdown concerns.

Here's the core layout in the Sforza CPUs (pg37). (core 0x00 from linux's perspective has address 0x20 on the diagram)
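To see which cores end up sharing a pair or a quad, a tiny helper is enough. This sketch assumes consecutive core numbers group into pairs of 2 and quads of 4, and that the 0x20 diagram offset mentioned above applies uniformly to every core; both are my reading of the posts, not something checked against the manual.

```python
CORES_PER_PAIR = 2      # assumption: adjacent core numbers form a pair
CORES_PER_QUAD = 4      # assumption: four consecutive cores form a quad
DIAGRAM_OFFSET = 0x20   # from the note above: core 0x00 is 0x20 on the diagram


def pair_of(core):
    """Index of the core pair this core belongs to."""
    return core // CORES_PER_PAIR


def quad_of(core):
    """Index of the quad this core belongs to."""
    return core // CORES_PER_QUAD


def diagram_address(core):
    """Address of the core as labelled on the layout diagram (assumed uniform offset)."""
    return core + DIAGRAM_OFFSET


for core in (0x14, 0x15, 0x16, 0x17):
    print(hex(core), "pair", pair_of(core), "quad", quad_of(core),
          "diagram", hex(diagram_address(core)))
```

Under these assumptions 0x14/0x15 and 0x16/0x17 come out as two pairs within a single quad, which matches the description of the 02AA883.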


I wasn't sure if all of the DD2.1 special developer systems were the same or if some had unpaired cores.

10
Operating Systems and Porting / Re: Linux 5.8 and nx-gzip
« on: February 25, 2025, 09:55:38 am »
Did you ever get this building? I just stumbled across the vas-user-space=enable option while searching the wiki for something else.

I guess where this would be most interesting would be support in filesystems that support compression (btrfs, zfs, xfs, etc). Normally gz would be a poor choice for anything but archival data.

11
General CPU Discussion / Re: Performance of HPT vs Radix?
« on: February 24, 2025, 04:19:24 pm »
Oh, awesome! I also found this paper which cites the one you shared:
http://www.cs.cmu.edu/~dskarlat/publications/meecpt_hpca23.pdf

I guess I have some reading to do

12
General CPU Discussion / Performance of HPT vs Radix?
« on: February 21, 2025, 11:46:58 am »
I know the Radix MMU is one of the newer features on the Power9 series vs older hardware. And I know that which MMU you use impacts which KVM modules are available when running qemu. It's also my understanding that radix trees are commonly used for the page table on other architectures.

But I've been struggling to find much information on the benefits/tradeoffs. Has anyone posted benchmarks demonstrating the differences? What would be the worst case scenario for HPT? (I assume memory is very full, as that could result in more hash collisions). Are there scenarios where HPT would theoretically perform better than Radix?

13
I was searching for this recently; some of the software I use on the amd64 homeserver I want to retire runs on dotnet-core 6. Microsoft still doesn't have downloads for ppc64le but the project repo does show support for ppc64le in CentOS and RHEL (but nowhere else). (The same is true in dotnet-core 9).

Has anyone here tried building from source?

14
Talos II / Re: Trying to overclock, what is raptor-aggressive
« on: December 19, 2024, 10:25:39 am »
I have the same understanding of "folding".

My Talos II has dual 8-core. The self reported power usage (via sensors) shows about 100w (~32 per chip, an extra 3w for each Vdd, 9w for each Vdn, and 12w for the single PPT). At the wall I get 140-150w, but I'm not sure how much is the radeon graphics and the HDD array. I should probably unplug everything to do a check with that.

Linux calls the lowest power state "snooze" which I would guess is the same as what IBM is calling "nap", but it's hard to tell for sure.

That seems a reasonable interpretation. From reading the power management section of the Processor Manual, I understand that the firmware on CMEs is responsible for actually changing the frequency and voltages. Page 314 describes the "stop states". Levels 4 and higher are "reserved for the hypervisor" (which I would read as the "hypervisor" has to request these states). Levels 8-11 are the quad-level states that can power down L2 and L3 caches. Levels 4-7 sound a lot like C1e and C3 on an x86-64 system. Levels 8-11 sound a lot like C3-C7 on x86 systems.

The IBM document (EnergyScale for POWER9) that ejfluhr found really makes it sound like disabling cores is enough, but from my testing, it really doesn't seem to make any difference. Either this is already at the minimum, or more work is needed either in the kernel or the firmware.

cpupower idle-states shows idle states of snooze, stop0_lite, stop0, stop1, stop2, stop4, and stop5. When my system is idle, the duration counters for stop4, stop5, stop0_lite, and snooze are all increasing. When I use cpupower --cpu 0,1,2,... idle-states (and provide only a list of offline cores) it just says all of the cores I listed are offline.
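The same counters that cpupower reports are exposed through the Linux cpuidle sysfs files (`name`, `usage`, `time` under each `stateN` directory), so they can be sampled from a script. A minimal sketch, with the function name and the `sysfs_root` parameter being my own additions so it can be pointed at a test directory:

```python
from pathlib import Path


def read_idle_states(cpu=0, sysfs_root='/sys/devices/system/cpu'):
    """Return (name, usage, time_us) tuples for one CPU's idle states."""
    cpuidle = Path(sysfs_root, f'cpu{cpu}', 'cpuidle')
    # Sort numerically so state10 doesn't land before state2
    state_dirs = sorted(cpuidle.glob('state*'),
                        key=lambda p: int(p.name[len('state'):]))
    states = []
    for state in state_dirs:
        name = (state / 'name').read_text().strip()
        usage = int((state / 'usage').read_text())     # number of entries
        time_us = int((state / 'time').read_text())    # total residency in microseconds
        states.append((name, usage, time_us))
    return states


# Prints nothing on systems without the cpuidle sysfs tree
for name, usage, time_us in read_idle_states(0):
    print(f'{name:12s} entries={usage} time_us={time_us}')
```

Sampling this twice a few seconds apart shows which states are actually accumulating time while the system idles.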

The hypervisor stuff still confuses me a bit.

15
I'm trying to figure out the best way to handle my Talos II lite. I got it with a single slot NVME card and an old Radeon graphics card.

So here's what I'm thinking right now:


I haven't yet pulled the trigger. My concern is that the NVME to SATA adapter + sata cables will be too tall and I won't be able to plug anything into the x16 slot right below it.  :(

So maybe instead of the NVME-SATA adapter, perhaps an NVME to PCIE riser is better. I could flip a standard PCIe SATA controller upside down and plug it into the riser using one of the backplane connectors in front of an unpopulated PCIe slot.

With 2 of the NVME-riser adapters maybe I could put a SATA adapter, sound card, dual nvme, and a decent graphics card in  the Talos II lite.



I really hope this gets better on the Power10 products. Having only 2 PCIe slots (neither supporting bifurcation) on the Talos II lite and Blackbird does present a big challenge. The Blackbird at least has onboard audio and SATA.

Alternating the ports on the Talos II standard also would have been nice. Then on the Talos II Lite we could have had a gap between the x8 and the x16 slots rather than having both cards right next to each other and lots of deadspace at the bottom of the board.

The problem on the Talos II standard is that all of the PCIe x16 slots are next to another slot. It's extremely common for graphics cards to cover 2 slots. It would be nice if the bottom slot was an x16 slot instead of an x8 slot and if those board-edge connectors were moved somewhere else. Then one could install a 2-slot x16 graphics card without losing any connectors. Right now one either loses access to an x8 slot or an x16 slot if they install a big graphics card.

