Topics - rheaplex

1
GPU Compute / Accelerators / Wx9100 on Talos II
« on: March 27, 2022, 05:54:18 pm »
Heya all.

(Thank you to MPC7500 for previous help on this.)

I have a Talos II with a WX9100 graphics card installed, direct from RaptorCS. The graphics card displays Petitboot perfectly every time, and then almost always fails to come back up for the main OS.

I have tried many different things to fix this:
- Different kernel arguments? Nothing seems to help.
- Different distro and kernel versions? Nothing reliable.
- Different kernel page size? I thought that 5.5 with 4k pages worked well, until it didn't. Otherwise there was no correlation.
- Try nomodeset for late KMS? Passed nomodeset to the OS kernel: faster boot, but amdgpu no longer shows up in lsmod.
- Fast reset causing problems? Followed the guide at https://wiki.raptorcs.com/wiki/Enabling_Navi_10_On_Fedora_31#Disabling_fast-reset, no change.
- PCI needs resetting? Ran https://wiki.raptorcs.com/wiki/File:Pcie_hot_reset.sh with 0000:03:00.01 as the argument (the AMDGPU lspci entry). It hung after printing "Removing 0000:03:00.01…".
- USB hub causing issues? Removed it during boot, no change.
- Electrical load issues when the CPUs start loading the main OS? Plugged the two power supplies into two separate wall sockets, no change.
- Bad DP/HDMI connector? Changed the converter and cable, no change.
- No access to firmware? Tried amdgpu both as a module and compiled in with the firmware in the image. Rebuilt the initramfs after it looked like that step had failed during install (the Fedora post-install script didn't like the Petitboot version string being hex rather than decimal); it doesn't boot.
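Several checks in the list above (kernel arguments, whether amdgpu is loaded, page size) can be recorded the same way after every boot attempt. The sketch below is a hypothetical helper, not part of any Raptor or Fedora tooling; it assumes only the standard Linux files /proc/cmdline and /proc/modules and reports whether nomodeset was passed, whether amdgpu is listed as a loaded module, and the running kernel's page size.

```python
import os
from typing import Dict


def cmdline_has(cmdline: str, arg: str) -> bool:
    """True if `arg` appears as a bare token or as `arg=value` on the command line."""
    for token in cmdline.split():
        if token == arg or token.startswith(arg + "="):
            return True
    return False


def module_loaded(modules_text: str, name: str) -> bool:
    """True if /proc/modules-style text lists `name` in its first column."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def boot_report(cmdline: str, modules_text: str, page_size: int) -> Dict[str, object]:
    """Summarise one boot: nomodeset flag, amdgpu module presence, page size in KiB."""
    return {
        "nomodeset": cmdline_has(cmdline, "nomodeset"),
        "amdgpu_loaded": module_loaded(modules_text, "amdgpu"),
        "page_size_kib": page_size // 1024,
    }


def read_text(path: str) -> str:
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        cmdline = read_text("/proc/cmdline")
        # Note: an amdgpu compiled into the kernel (not a module) never appears in /proc/modules.
        modules = read_text("/proc/modules") if os.path.exists("/proc/modules") else ""
        print(boot_report(cmdline, modules, os.sysconf("SC_PAGE_SIZE")))
```

Because amdgpu was also tried compiled in, a False for amdgpu_loaded only means something when the driver is built as a module.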

I notice that the Petitboot amdgpu module size is different from the main OS one in my current setup. Other than that I've got nothing. :'(

uname -a:

Linux workstation 5.15.0-60.local.fc35.ppc64le #1 SMP Sat Mar 26 17:31:31 PDT 2022 ppc64le ppc64le ppc64le GNU/Linux
lspci:

0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0000:01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0000:02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
0000:03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon PRO WX 9100]
0000:03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller (rev 11)
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 Multimedia video controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0030:01:00.0 Audio device: Creative Labs EMU20k2 [Sound Blaster X-Fi Titanium Series] (rev 03)
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0031:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
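To pick the GPU's address out of a listing like the one above without retyping it, a small filter over the lspci output is enough. This is a hypothetical sketch that assumes the plain lspci line format shown here (address, class name, colon, description). It also checks the domain:bus:device.function form, in which the device number is 00 to 1f and the function is a single digit from 0 to 7, so in this listing the VGA function is 0000:03:00.0 and its HDMI audio function is 0000:03:00.1.

```python
import re
from typing import List, Tuple

# domain (4 hex) : bus (2 hex) : device (00-1f) . function (0-7)
BDF_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[01][0-9a-fA-F]\.[0-7]$")

# Class names lspci uses for display hardware (the ASPEED BMC shows up
# as "Multimedia video controller" and is deliberately not matched).
DISPLAY_CLASSES = ("VGA compatible controller", "Display controller", "3D controller")


def is_valid_bdf(addr: str) -> bool:
    """True if `addr` is a full domain:bus:device.function PCI address."""
    return BDF_RE.match(addr) is not None


def display_devices(lspci_text: str) -> List[Tuple[str, str]]:
    """Return (address, description) pairs for display-class devices in lspci output."""
    found = []
    for line in lspci_text.splitlines():
        addr, _, rest = line.partition(" ")
        if any(rest.startswith(cls + ":") for cls in DISPLAY_CLASSES):
            found.append((addr, rest))
    return found
```

The address returned for the WX 9100 can then be checked with is_valid_bdf() before it is handed to any reset script.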

inxi:
System:
  Host: workstation Kernel: 5.15.0-60.local.fc35.ppc64le ppc64le bits: 64 Desktop: N/A
    Distro: Fedora release 35 (Thirty Five)
Machine:
  Type: PPC System: T2P9S01 REV 1.01 details: N/A
CPU:
  Info: 2x 18-core POWER9 altivec supported [MT MCP SMP] speed (MHz): avg: 2211
    min/max: 2166/3800
Graphics:
  Device-1: AMD Vega 10 XT [Radeon PRO WX 9100] driver: amdgpu v: kernel
  Device-2: ASPEED Graphics Family driver: N/A
  Display: server: X.org v: 1.20.14 driver: X: loaded: ati,radeon unloaded: fbdev,modesetting
    gpu: amdgpu
  Message: No GL data found on this system.
Network:
  Device-1: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
  Device-2: Broadcom NetXtreme BCM5719 Gigabit Ethernet PCIe driver: tg3
Drives:
  Local Storage: total: 1.36 TiB used: 320.2 GiB (22.9%)
Info:
  Processes: 1176 Uptime: 10m Memory: 125.41 GiB used: 2.73 GiB (2.2%) Init: systemd runlevel: 3
  Shell: Bash inxi: 3.3.13

dmesg output is attached. The interesting part seems to be after "[drm] amdgpu kernel modesetting enabled."
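To share only the relevant part of a long dmesg capture, the log can be cut at the line mentioned above. A minimal sketch, assuming the marker text appears verbatim in the saved log; the function name and the optional line cap are illustrative choices, not an existing tool.

```python
from typing import List, Optional

MARKER = "[drm] amdgpu kernel modesetting enabled."


def lines_after_marker(log_text: str, marker: str = MARKER,
                       limit: Optional[int] = None) -> List[str]:
    """Return the first line containing `marker` plus every line after it.

    If `limit` is given, at most that many lines are returned.
    Returns an empty list when the marker never appears.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            tail = lines[i:]
            return tail if limit is None else tail[:limit]
    return []
```

For example, save the output of dmesg to a file, read it in, and pass its contents to lines_after_marker() with a limit of a few hundred lines.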

System config also attached.

Can anyone suggest a known good distro/kernel/firmware version/configuration that I can test against? Or is there anything I'm doing obviously wrong?

Thank you.

2
Talos II / Radeon Pro WX 9100 GPU on Any Kernel > 5.5
« on: March 18, 2022, 08:00:06 pm »
Heya everyone.

I have a Talos II with an AMD Radeon PRO WX 9100 (Vega 10 XT, connected to a monitor via DP to HDMI). I've tried several distros on it, but currently it has Debian Bookworm installed (I know Debian best but I would happily switch to another distro if needed).

Everything is fine if I build and use the 5.5 kernel from Raptor's git site - I'm typing this on Firefox LTS in Cinnamon/Xorg. I've set the boot console to be VGA in Petitboot and added console=tty0 to the grub command line. Otherwise there's nothing different or special about this setup.

If I try any other kernel, with 4k or 64k pages and any combination of kernel boot arguments, the screen drops out some time after the kexec message from Petitboot. I can ssh in and see that the amdgpu module is loaded, but if I try to start Xorg it complains about no matching screens (or no matching devices, if I remove the xorg conf snippet that specifies amdgpu as the default; 5.5 doesn't need this snippet, though).
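Since the module loads but Xorg finds no matching screens, one thing worth confirming over ssh is whether amdgpu actually bound to the card, not just whether the module is present. A minimal sketch, assuming the standard sysfs layout in which /sys/bus/pci/devices/ADDRESS/driver is a symlink to the bound driver; the address 0000:03:00.0 comes from the lspci listing in the first post, and the function name is hypothetical.

```python
import os
from typing import Optional


def bound_driver(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> Optional[str]:
    """Return the name of the kernel driver bound to PCI device `bdf`.

    Returns None when the device does not exist or no driver is bound.
    """
    link = os.path.join(sysfs_root, bdf, "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends in .../drivers/<name>; keep just <name>.
    return os.path.basename(os.readlink(link))
```

On a working boot, bound_driver("0000:03:00.0") would be expected to return "amdgpu"; None would suggest the driver loaded but never took over the card.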

I'm happy to use 5.5 for now but at some point I'll need to upgrade to a newer kernel.

Can anyone recommend another known-good combination of kernel and configuration for Vega 10?

Thank you.

- Rhea.
