
Wx9100 on Talos II


rheaplex:
Heya all.

(Thank you to MPC7500 for previous help on this.)

I have a Talos II with a WX9100 graphics card installed, direct from RaptorCS. The graphics card displays PetitBoot perfectly every time, and then almost always fails to come back up for the main OS.

I have tried many different things to fix this:
- Different kernel arguments? Nothing seems to help.
- Different distro and kernel versions? Nothing reliable.
- Different kernel page size? I thought that 5.5 with 4k worked well, until it didn't. Otherwise there was no correlation.
- Try nomodeset for late KMS? Passed nomodeset to OS kernel, faster boot, can’t lsmod amdgpu though.
- Fast reset causing problems? https://wiki.raptorcs.com/wiki/Enabling_Navi_10_On_Fedora_31#Disabling_fast-reset Followed guide, no change.
- PCI needs resetting? https://wiki.raptorcs.com/wiki/File:Pcie_hot_reset.sh Ran script with 0000:03:00.01 as the argument (the AMDGPU lspci entry). It jammed after printing “Removing 0000:03:00.01…”.
- USB hub causing issues? Removed during boot, no change.
- Electrical power load issues on CPUs loading main OS? Changed to two separate wall sockets for the power supplies, no change.
- Bad DP/HDMI connector? Changed converter and cable, no change.
- No access to firmware? Tried amdgpu as a module, or compiled in with firmware in the image. Rebuilt the initramfs after it looked like it failed during install (the Fedora post-install script didn’t like the Petitboot version string being hex rather than decimal); it doesn’t boot.
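
For the PCI reset attempt in the list above, it may help to confirm which PCI address is the GPU function itself. In the lspci listing further down, 0000:03:00.0 is the VGA controller and 0000:03:00.1 is its HDMI audio function. The following is only a sketch, assuming the standard sysfs layout under /sys/bus/pci/devices; the function name list_amd_gpus and its optional directory argument are made up for illustration and are not part of the Raptor script.

```shell
#!/bin/sh
# Print the PCI addresses of AMD display controllers found in sysfs.
# list_amd_gpus is a hypothetical helper; the optional first argument
# lets it be pointed at a different directory tree (default: real sysfs).
list_amd_gpus() {
    root="${1:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        # Skip entries that do not expose class and vendor files.
        if [ ! -r "$dev/class" ] || [ ! -r "$dev/vendor" ]; then
            continue
        fi
        class=$(cat "$dev/class")
        vendor=$(cat "$dev/vendor")
        # Class 0x03xxxx is a display controller; vendor 0x1002 is AMD/ATI.
        case "$class" in
            0x03*)
                if [ "$vendor" = "0x1002" ]; then
                    basename "$dev"
                fi
                ;;
        esac
    done
    return 0
}

list_amd_gpus
```

The address it prints is the one that corresponds to the amdgpu device; the audio function (class 0x0403xx) is deliberately skipped.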

I notice that the PetitBoot amdgpu module size is different from the main OS one in my current setup. Other than that I've got nothing.  :'(

uname -a:


--- Code: ---Linux workstation 5.15.0-60.local.fc35.ppc64le #1 SMP Sat Mar 26 17:31:31 PDT 2022 ppc64le ppc64le ppc64le GNU/Linux
--- End code ---

lspci:


--- Code: ---0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 Multimedia video controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0030:01:00.0 Audio device: Creative Labs EMU20k2 [Sound Blaster X-Fi Titanium Series] (rev 03)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
--- End code ---

inxi:

--- Code: ---System:
  Host: workstation Kernel: 5.15.0-60.local.fc35.ppc64le ppc64le bits: 64 Desktop: N/A
    Distro: Fedora release 35 (Thirty Five)
Machine:
  Type: PPC System: T2P9S01 REV 1.01 details: N/A
CPU:
  Info: 2x 18-core POWER9 altivec supported [MT MCP SMP] speed (MHz): avg: 2211
    min/max: 2166/3800
Graphics:
  Device-1: AMD Vega 10 XT [Radeon PRO WX 9100] driver: amdgpu v: kernel
  Device-2: ASPEED Graphics Family driver: N/A
  Display: server: X.org v: 1.20.14 driver: X: loaded: ati,radeon unloaded: fbdev,modesetting
    gpu: amdgpu
  Message: No GL data found on this system.
Network:
  Device-1: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
  Device-2: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
Drives:
  Local Storage: total: 1.36 TiB used: 320.2 GiB (22.9%)
Info:
  Processes: 1176 Uptime: 10m Memory: 125.41 GiB used: 2.73 GiB (2.2%) Init: systemd runlevel: 3
  Shell: Bash inxi: 3.3.13
--- End code ---

dmesg output is attached. The interesting part seems to be after "[drm] amdgpu kernel modesetting enabled."

System config also attached.

Can anyone suggest a known good distro/kernel/firmware version/configuration that I can test against? Or is there anything I'm doing obviously wrong?

Thank you.

MPC7500:
I would use the AST GPU till Petitboot and then switch to the amdgpu.

xilinder:
@rheaplex

"The graphics card displays PetitBoot perfectly every time, and then almost always fails to come back up for the main OS."

If you switch OFF the main power and let the system go completely dead, does it boot differently (better)?

ClassicHasClass:
Petitboot is almost certainly using an older kernel, too (which is usually the case).

What are your kernel boot arguments?

MPC7500:
If you're on kernel 5.14+, you have to add amdgpu.aspm=0 to the kernel command line.
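
A quick way to check whether that argument is active on the running system is to look at /proc/cmdline. This is a rough sketch only: has_kernel_arg is a hypothetical helper name, and the commented grubby line assumes a Fedora install whose boot entries are managed by grubby, which may not match every Petitboot setup.

```shell
#!/bin/sh
# has_kernel_arg CMDLINE ARG: succeed if ARG appears as a whole,
# space-separated word in CMDLINE. (Hypothetical helper for illustration.)
has_kernel_arg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the command line of the currently running kernel.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

if has_kernel_arg "$cmdline" "amdgpu.aspm=0"; then
    echo "amdgpu.aspm=0 is on the running kernel's command line"
else
    echo "amdgpu.aspm=0 is not set for this boot"
    # Assumption: on Fedora, grubby can add it to all boot entries (as root):
    # grubby --update-kernel=ALL --args="amdgpu.aspm=0"
fi
```

The argument can also be typed in by hand for a single boot by editing the entry in Petitboot before booting, which is a lower-risk way to test it first.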
