Topics - kth5

1
General Hardware Discussion / KVM cross-endian guest on POWER8 / IBM S822L
« on: November 19, 2024, 02:08:49 am »
Hi everyone,

I recently acquired an IBM S822L dual-socket system and am now getting to the point of using KVM/libvirt to bring up guests on it. On my Blackbird, I was used to running big-endian guests on a little-endian host without much of a performance penalty at all.

However, on the POWER8 it seems that no matter what I do, I get up to 50% steal time inside a VM when both the bare-metal host and the VM are otherwise completely idle. When I put any load on the VM it gets worse and jumps to 80% or more, making any vCPU config moot. This only happens when running a big-endian guest kernel and userland, which I need for my build-bots.

A bit of background on what I'm dealing with here:
* dual POWER8E 10 core SMT8 / 256GB RAM running Opal on latest available firmware
* no entitlements beyond basic & micro LPAR and the usual AIX
* bare-metal runs ArchPOWER ppc64le with SMT switched off as required
* guests I'm testing with run ArchPOWER ppc64 or ppc64+32bit userland
* kernel version is 6.11.9 on bare-metal & guests
* Qemu 9.0.3/9.1.1 w/ libvirt 10.8.0

I've tried the following configurations, all exhibiting the same steal time issue:
* single vCPU w/o threads
* single vCPU w/o threads pinned to a physical core & NUMA zone
* dual vCPU w/ 8 threads (SMT)
* dual vCPU w/ 8 threads pinned to a physical core & NUMA zone (SMT)
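
To compare the configurations above on equal footing, steal time can be sampled inside the guest straight from /proc/stat. The following is a minimal sketch, not a definitive tool: the helper names (`cpu_times`, `steal_percent`, `sample_steal`) are made up for illustration, and it assumes the standard Linux field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice).

```python
import time


def cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of elapsed CPU time between two samples spent as steal."""
    if len(before) < 8 or len(after) < 8:
        raise ValueError("samples do not include a steal column")
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only the first eight columns: guest and guest_nice are already
    # included in user and nice, so adding them would count them twice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the steal column


def sample_steal(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, 'interval' seconds apart, and return steal %."""
    with open(path) as handle:
        before = cpu_times(handle.read())
    time.sleep(interval)
    with open(path) as handle:
        after = cpu_times(handle.read())
    return steal_percent(before, after)
```

Calling `sample_steal(5.0)` inside the guest, once idle and once under load, gives one number per configuration that can be compared between the ppc64 and ppc64le guests.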

What I did not try is running actual PowerKVM. It seems too outdated for my liking and I'd rather have a recent package base as close to upstream as possible.

Running ppc64le in the guest does not show the problem at all and performs splendidly.

I'm a bit at a loss here, and I would rather not call it a lost cause just yet.  :(

2
Blackbird / Did my Blackbird just die on me?
« on: July 14, 2022, 12:32:56 pm »
The other day I was logged in remotely and the box just went down. I could still reach the BMC and tried to power it up, but to no avail. There was no Hostboot output on serial (via BMC) and nothing in the event logs on the BMC. Just plain nothing.

Once I got home I switched the box on manually via the power switch. The fans started running at full tilt as usual, but after pretty much exactly 30s it switched off again, without leaving a trace as to why in the event log on the BMC.

Then I removed all hardware except the CPU, one piece at a time, with power-on attempts in between. Same effect.

The only thing that looks weird is a set of dmesg entries on the BMC that repeats every few seconds:

Code:
[ 1367.988668] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1367.988711] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1367.988731] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1367.988746] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1370.989477] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1370.989520] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1370.989538] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1370.989548] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1373.990267] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1373.990311] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1373.990330] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1373.990342] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206

Do these mean anything or are we just talking verbosity?
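
One way to at least sanity-check the "Want ... got ... from ..." lines is to redo the bit arithmetic they appear to describe. The sketch below assumes (this is an interpretation of the message format, not something confirmed from the driver documentation) that each line means: in register REG, the bit field selected by the mask in brackets was expected to hold the "Want" value, but held the "got" value in the raw register contents given after "from". The function name `decode_want_line` is hypothetical.

```python
import re

# Matches e.g. "Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000"
WANT_RE = re.compile(
    r"Want (?P<reg>\w+)\[0x(?P<mask>[0-9A-Fa-f]+)\]=0x(?P<want>[0-9A-Fa-f]+), "
    r"got 0x(?P<got>[0-9A-Fa-f]+) from 0x(?P<raw>[0-9A-Fa-f]+)"
)


def decode_want_line(line):
    """Recompute the masked field of one 'Want ...' line; None if no match."""
    match = WANT_RE.search(line)
    if match is None:
        return None
    mask = int(match["mask"], 16)
    want = int(match["want"], 16)
    got = int(match["got"], 16)
    raw = int(match["raw"], 16)
    # Shift by the position of the lowest set bit in the mask (assumed layout).
    shift = (mask & -mask).bit_length() - 1 if mask else 0
    field = (raw & mask) >> shift
    return {
        "reg": match["reg"],
        "mask": mask,
        "want": want,
        "got": got,
        "raw": raw,
        "field": field,
        "field_equals_got": field == got,
        "field_equals_want": field == want,
    }
```

Feeding the log lines through `decode_want_line` and checking `field_equals_got` shows whether the logged "got" value is consistent with the raw register value under this reading. It says nothing by itself about whether the mismatch is harmless verbosity or related to the power-off.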

I can upgrade PNOR etc. from the BMC without failure and read it back, so that's not it either.


Did my CPU just die and, if so, how the hell can I confirm this before I commit to another investment of hundreds of dollars? :(
