Linux 5.8 has upstream support for POWER9's Virtual Accelerator Switchboard (VAS), bringing initial support for the chip's gzip accelerators, as you may have already heard on Talospace. Of course I jumped right into trying to get it to work, which has mostly gone fine but has hit a bump in the last mile.
To begin with, you need skiboot 6.6 to enable this userspace support at all, which you can build using upstream op-build 2.5. Fortunately, Stewart Smith provides builds for the Blackbird, which I was able to get going with no problems. You'll also need to set `vas-user-space=enable` in NVRAM, as described in this skiboot commit.
Unfortunately, I haven't gotten the kernel to cooperate yet: the VAS code complains about an "Unexpected DT configuration" on boot. The offending condition (here it is in context):
	if (pdev->num_resources != 4) {
		pr_err("Unexpected DT configuration for [%s, %d]\n",
			pdev->name, vasid);
		return -ENODEV;
	}
I'm not familiar with the kernel platform device code, so I'm not sure how to tell how my DT setup differs from what it's expecting. My hunch is that there's a simple assumption somewhere that doesn't hold up, but I have no way to verify that. That said, the device tree itself looks appropriate:
# lsprop /proc/device-tree/vas@6019100000000/
ibm,chip-id      00000000
ibm,vas-port     00060100 00800000
interrupts       00000040 00000000
interrupt-parent 000000df (223)
compatible       "ibm,power9-vas"
                 "ibm,vas"
reg              00060191 00000000 00000000 02000000
                 00060190 00000000 00000001 00000000
                 00080000 00000000 00000001 00000000
                 00000000 00000020 00000000 00000010
phandle          000000fa (250)
ibm,vas-id       00000000
name             "vas"
The only other potential issue I'm aware of is that, nominally, CONFIG_PPC_VAS depends on 64k pages, and I'm running Void Linux's config, which uses 4k pages. I've heard the dependency may be based on an assumption that is no longer true, so I removed it for testing. I would imagine any issues with that would manifest after VAS is initialized, but it's worth noting.
If other kernel hackers can chime in with info, or if any of you have gotten this working on your own Raptor machines, I would appreciate the help! (Same goes if you're running a 64k-pagesize kernel and encounter this issue.) Hopefully we can sort this out and enjoy the purported performance described in this document:
The gzip accelerator compresses and decompresses data at a rate of 9 to 16 GByte per second–depending on the processor model. One gzip compress accelerator throughput is equivalent to 70 to 120 cores, and one gzip decompress accelerator throughput is equivalent to 25 to 45 cores running software gzip/zlib/deflate implementations.
(That PDF is linked from the wiki for the `power-gzip` zlib drop-in library. The other document listed there looks amusingly like something internal, including descriptions of how to disable secure boot on Witherspoon and other IBM boards. Naturally, it's fairly outdated.)