Messing with WOF Tables
cy384:
tl;dr: I bought an unsupported CPU, which mostly worked, and I tweaked some firmware to make it work properly.
This will be a bit of a narrative, documenting it in case anyone else is ever in the same situation:
I saw an astonishingly cheap used POWER9 CPU on eBay and knew it was finally time to buy a Raptor Blackbird. Specifically, I now have a 02CY231, which is a 16-core, 160W part (not one of the chips that Raptor sells). I figured that since the Blackbird is rated for 160W it should be fine, and it does work out of the box, except that it would only hit 90W, which I assume leaves a lot of performance on the table (for the record, I believe my BB shipped with 2.00 firmware). I spotted a section in the boot log like this:
--- Code: ---
4.94593|================================================
4.96605|Error reported by fapi2 (0x3300) EID 0x90000566
4.98696| No WOF table match found
4.98697| ModuleId 0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
4.98698| ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
4.98699| UserData1 Number of cores : 0x00100002000000a0
4.98700| UserData2 WOF Power Mode (1=Nominal, 2=Turbo) : 0x000009c400000012
4.98700|------------------------------------------------
4.98701| Callout type : Procedure Callout
4.98702| Procedure : EPUB_PRC_HB_CODE
4.98702| Priority : SRCI_PRIORITY_HIGH
4.98703|------------------------------------------------
4.98704| Callout type : Hardware Callout
4.98705| Target : Physical:/Sys0/Node0/Proc0
4.98706| Deconfig State : NO_DECONFIG
4.98706| GARD Error Type : GARD_NULL
4.98707| Priority : SRCI_PRIORITY_MED
4.98707|------------------------------------------------
--- End code ---
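As a sanity check, the two UserData words in that log seem to decode into numbers that match the chip. Here is a minimal Python sketch, assuming the fields are packed as 16 + 16 + 32 bits and 32 + 32 bits. That layout is a guess from the values themselves, not something taken from the hostboot source, and `split_fields` is just a helper made up for this example.

```python
def split_fields(word, widths):
    """Split a 64-bit integer into bit fields, most significant field first."""
    assert sum(widths) == 64
    fields = []
    shift = 64
    for width in widths:
        shift -= width
        fields.append((word >> shift) & ((1 << width) - 1))
    return fields

# Values copied from the boot log above.
user_data1 = 0x00100002000000A0
user_data2 = 0x000009C400000012

# ASSUMED layout: 16-bit, 16-bit, 32-bit fields (guessed, not from any spec).
cores, mode, power = split_fields(user_data1, [16, 16, 32])
# ASSUMED layout: two 32-bit halves (guessed, not from any spec).
upper, lower = split_fields(user_data2, [32, 32])

print(cores, mode, power)  # 16 2 160
print(upper, lower)        # 2500 18
```

Under that guess, 16 and 160 line up with the core count and wattage of the part, and 2 would be the "Turbo" power mode from the log's own legend. The 2500 looks like a frequency in MHz, but I don't know for sure what it or the trailing 18 means.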
Ok, seems suspicious, but what's a WOF (Workload Optimized Frequency) table? Apparently, it's a CSV file specifying the frequencies and voltages used to manage the CPU, which gets compiled into the PNOR image (they're named something like "WOF_V7_4_2_SFORZA_16_160_2500_TM.csv"). What's PNOR? The flash that holds the early-stage boot firmware. Fortunately(?) this is all open source and can in theory be modified to support my CPU, so I've been messing with this every evening this week. Gotta love a long day at work messing with build systems followed by a long evening of messing with build systems.
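The file names look structured enough to pick apart with a script. This is a rough Python sketch, assuming the name encodes version, socket type, core count, wattage, and a frequency in MHz, in that order. Those field meanings are inferred from the single example name above, not from any documentation, and `parse_wof_name` is a hypothetical helper.

```python
import re

# Field meanings are GUESSED from one example file name, not from a spec.
WOF_NAME = re.compile(
    r"^WOF_V(?P<version>\d+(?:_\d+)*)_(?P<socket>[A-Z]+)_"
    r"(?P<cores>\d+)_(?P<watts>\d+)_(?P<mhz>\d+)_(?P<suffix>[A-Z0-9]+)\.csv$"
)

def parse_wof_name(name):
    """Return the guessed fields of a WOF table file name, or None if it doesn't match."""
    match = WOF_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the purely numeric fields to integers for easier filtering.
    for key in ("cores", "watts", "mhz"):
        fields[key] = int(fields[key])
    return fields

info = parse_wof_name("WOF_V7_4_2_SFORZA_16_160_2500_TM.csv")
print(info["cores"], info["watts"], info["mhz"])  # 16 160 2500
```

Something like this makes it easy to list which core-count and wattage combinations a directory of tables actually covers.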
The instructions on the wiki to build the firmware are basically solid; just replace "talos" with "blackbird" in the obvious places. One gotcha is that you definitely want to compile on an older distro (I ran Ubuntu 18.04 in a VM to do this). The other gotcha I ran into was this one ( https://forums.raptorcs.com/index.php/topic,241.0.html ), but as far as I can tell, you don't need to modify OpenBMC if you're just tweaking the WOF tables in the PNOR.
Anyway, I got the firmware building. I dug around in the files it downloads and found a Raptor repository called "blackbird-xml" which contains the WOF tables; sure enough, it didn't contain any for a 16 core 160W chip. I searched around and did find a repository on github ( https://github.com/open-power/WOF-Tables ) with a bunch more, so I made a copy of "blackbird-xml" and added all the new WOF tables. I changed the address of the repository in "machine-xml.mk" to point towards mine, and added the commit hash for my changes to the "blackbird_defconfig" file. I built and got a new error, like this:
--- Code: ---
ERROR: PnorUtils::checkSpaceConstraints: Image provided (/home/cy384/blackbird-op-build/output/host/powerpc64le-buildroot-linux-gnu/sysroot/openpower_pnor_scratch//wofdata.bin.ecc) has size (6285312) which is greater than allocated space (3145728) for section=WOFDATA. Aborting! at /home/cy384/blackbird-op-build/output/host/powerpc64le-buildroot-linux-gnu/sysroot/hostboot_build_images/PnorUtils.pm line 462.
--- End code ---
I assume there's either a hard limit, or configured limit, on the size of the WOF table data in the PNOR, so I deleted all the WOF tables I didn't care about from my repository, updated the commit hash again, and it built successfully.
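For a sense of scale, here is the arithmetic on the two sizes from that error as a small Python sketch. The ECC part is an assumption: the ".ecc" suffix suggests the usual PNOR scheme of one ECC byte per eight data bytes, but I haven't confirmed that is what applies to this section.

```python
image_size = 6285312  # size of wofdata.bin.ecc, from the error message
allocated = 3145728   # space allocated to WOFDATA, from the error message

overage = image_size - allocated
ratio = image_size / allocated
print(overage)          # 3139584
print(round(ratio, 3))  # 1.998

def ecc_payload(size_with_ecc):
    """Data bytes that fit in size_with_ecc, ASSUMING 8 data bytes + 1 ECC byte."""
    return size_with_ecc // 9 * 8

print(ecc_payload(image_size))  # 5586944
print(ecc_payload(allocated))   # 2796200
```

So with every table from the GitHub repository added, the image came out at roughly twice the allocated space, which fits with having to drop a large share of the tables to get under the limit.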
I followed the instructions on the wiki page to test out the new PNOR, and my BB booted without the WOF table error! I did some load testing, and "sensors" does report power usage near 160W, so I'm calling this a success. The voltage regulators do get really spicy very quickly, but that's a subject for another post.
MPC7500:
Great! It's a pity that there is no more user-friendly way to edit the WOF tables. This also applies to the fan curves.
MauryG5:
Your problem probably stems from the fact that this POWER9 model isn't currently on the list of those directly supported by Raptor. I actually didn't know about this 16-core model; I remembered that there is a 12-core model and then the classic 18 and 22, but I knew nothing about a 16. If you still have problems, you can try asking Raptor anyway; they may be able to tell you what needs correcting to make it work fully.
ejfluhr:
Can you read out the processor frequencies and tell if it is boosting properly? Do you see ~160W when at idle? What about when running max threads off some heavy workload (e.g. Mersenne primes is a good one)?
ejfluhr:
Too bad this isn't publicly available:
https://research.ibm.com/publications/deterministic-frequency-boost-and-voltage-enhancements-on-the-power10tm-processor
Abstract
Digital droop sensors with core throttling mitigate microprocessor voltage droops and enable a voltage control loop (undervolting) to offset loadline uplift plus noise effects, protecting reliability VDDMAX. These combine with a runtime algorithm for Workload Optimized Frequency (WOF) that deterministically maximizes core frequency. The combined effect is demonstrated across a range of workloads including SPEC™, and provides up to a 15% frequency boost and a 10% reduction in core voltage.