GPU Compute / Accelerators / Wx9100 on Talos II
« on: March 27, 2022, 05:54:18 pm »
Heya all.
(Thank you to MPC7500 for previous help on this.)
I have a Talos II with a WX9100 graphics card installed, direct from RaptorCS. The graphics card displays PetitBoot perfectly every time, and then almost always fails to come back up for the main OS.
I have tried many different things to fix this:
- Different kernel arguments? Nothing seems to help.
- Different distro and kernel versions? Nothing reliable.
- Different kernel page size? I thought that kernel 5.5 with 4k pages worked well, until it didn't. Otherwise there was no correlation.
- Try nomodeset for late KMS? Passed nomodeset to the OS kernel: faster boot, but amdgpu doesn’t show up in lsmod.
- Fast reset causing problems? https://wiki.raptorcs.com/wiki/Enabling_Navi_10_On_Fedora_31#Disabling_fast-reset Followed guide, no change.
- PCI needs resetting? https://wiki.raptorcs.com/wiki/File:Pcie_hot_reset.sh Ran script with 0000:03:00.01 as the argument (the AMDGPU lspci entry). It jammed after printing “Removing 0000:03:00.01…”.
- USB hub causing issues? Removed during boot, no change.
- Electrical power load issues on CPUs loading main OS? Changed to two separate wall sockets for the power supplies, no change.
- Bad DP/HDMI connector? Changed converter and cable, no change.
- No access to firmware? Tried amdgpu as a module, and compiled in with the firmware in the image. Rebuilt the initramfs after it looked like it had failed during install (the Fedora post-install script didn’t like the Petitboot version string being hex rather than decimal); it doesn’t boot.
I notice that the PetitBoot amdgpu module size is different from the main OS one in my current setup. Other than that I've got nothing.
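For anyone double-checking the address passed to the reset script above, here is a minimal Python sketch that tests whether a string has the `domain:bus:device.function` shape Linux uses for PCI addresses in sysfs. The function name `is_pci_address` is made up for illustration, and the pattern assumes the common four-digit domain form. It says nothing about what Pcie_hot_reset.sh itself accepts or how it handles a malformed argument.

```python
import re

# Assumed shape: DDDD:BB:DD.F with hex fields, e.g. 0000:03:00.0.
# The device number is 5 bits (00-1f) and the function is a single digit 0-7.
_PCI_ADDR = re.compile(r"[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]")


def is_pci_address(text: str) -> bool:
    """Return True if text looks like a full domain:bus:device.function address."""
    return _PCI_ADDR.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    # The first two are taken from the lspci listing in this post,
    # the third is the string that was passed to the script.
    for candidate in ("0000:03:00.0", "0000:03:00.1", "0000:03:00.01"):
        print(candidate, is_pci_address(candidate))
```

With this pattern, 0000:03:00.0 and 0000:03:00.1 match, while 0000:03:00.01 does not, so it may be worth re-running the script with one of the addresses exactly as lspci prints them.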
uname -a:
Code:
Linux workstation 5.15.0-60.local.fc35.ppc64le #1 SMP Sat Mar 26 17:31:31 PDT 2022 ppc64le ppc64le ppc64le GNU/Linux
lspci:
Code:
0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 Multimedia video controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0030:01:00.0 Audio device: Creative Labs EMU20k2 [Sound Blaster X-Fi Titanium Series] (rev 03)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
inxi:
Code:
System:
  Host: workstation Kernel: 5.15.0-60.local.fc35.ppc64le ppc64le bits: 64 Desktop: N/A
    Distro: Fedora release 35 (Thirty Five)
Machine:
  Type: PPC System: T2P9S01 REV 1.01 details: N/A
CPU:
  Info: 2x 18-core POWER9 altivec supported [MT MCP SMP] speed (MHz): avg: 2211
    min/max: 2166/3800
Graphics:
  Device-1: AMD Vega 10 XT [Radeon PRO WX 9100] driver: amdgpu v: kernel
  Device-2: ASPEED Graphics Family driver: N/A
  Display: server: X.org v: 1.20.14 driver: X: loaded: ati,radeon unloaded: fbdev,modesetting
    gpu: amdgpu
  Message: No GL data found on this system.
Network:
  Device-1: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
  Device-2: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
Drives:
  Local Storage: total: 1.36 TiB used: 320.2 GiB (22.9%)
Info:
  Processes: 1176 Uptime: 10m Memory: 125.41 GiB used: 2.73 GiB (2.2%) Init: systemd runlevel: 3
    Shell: Bash inxi: 3.3.13
dmesg output is attached. The interesting part seems to be after "[drm] amdgpu kernel modesetting enabled."
System config also attached.
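To make the attached dmesg easier to skim, here is a rough Python sketch that keeps only the marker line quoted above and everything after it. The helper name `lines_after_marker` and the sample lines are placeholders for illustration, not real log output. The marker string is the one quoted in this post.

```python
from typing import Iterable, List

# Marker string as quoted in the post.
MARKER = "[drm] amdgpu kernel modesetting enabled."


def lines_after_marker(lines: Iterable[str], marker: str = MARKER) -> List[str]:
    """Return the first line containing marker and all lines after it.

    Returns an empty list if the marker never appears.
    """
    result: List[str] = []
    found = False
    for line in lines:
        if not found and marker in line:
            found = True
        if found:
            result.append(line.rstrip("\n"))
    return result


if __name__ == "__main__":
    # Placeholder input only; these are not lines from the real dmesg.
    sample = [
        "placeholder line before the driver loads\n",
        "[drm] amdgpu kernel modesetting enabled.\n",
        "placeholder line after the driver loads\n",
    ]
    for kept in lines_after_marker(sample):
        print(kept)
```

To run it over real output, the same function could be given the lines of a saved dmesg file, for example via `open(path)` with a path of your choosing.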
Can anyone suggest a known good distro/kernel/firmware version/configuration that I can test against? Or is there anything I'm doing obviously wrong?
Thank you.