Raptor Computing Systems Community Forums (BETA)

Third Party Hardware => GPU Compute / Accelerators => Topic started by: Borley on July 10, 2021, 06:38:14 pm

Title: AMD rendering issues with Debian Bullseye
Post by: Borley on July 10, 2021, 06:38:14 pm
I have been struggling to get a graphical session (Gnome or Sway) running under Debian Bullseye. The system is a Blackbird with an RX 560. After installing firmware-amd-graphics, disabling the AST, and adding amdgpu to the initramfs modules, the system gets a display but crashes out of the Gnome GUI.
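The initramfs step can be sketched roughly as follows. This is a minimal sketch, assuming Debian's initramfs-tools layout (/etc/initramfs-tools/modules); the helper name add_initramfs_module is made up for illustration and is not something from the wiki.

```shell
#!/bin/sh
# Hypothetical helper: append a module name to an initramfs-tools modules
# file only if it is not already listed, so it is safe to run repeatedly.
add_initramfs_module() {
    file=$1
    module=$2
    # -q: quiet, -x: match the whole line, -F: treat the name literally
    if ! grep -qxF "$module" "$file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$file"
    fi
}

# Typical use on Debian (needs root), followed by an initramfs rebuild:
#   add_initramfs_module /etc/initramfs-tools/modules amdgpu
#   update-initramfs -u
```

Checking for the line first avoids duplicate entries if the step gets repeated across reinstall attempts.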

I have tried setting a default GPU per "Select desired GPU at runtime" (https://wiki.raptorcs.com/wiki/Troubleshooting/GPU#Workaround_2:_Select_desired_GPU_at_runtime), and disabled GLAMOR using the relevant portion of this (https://wiki.raptorcs.com/wiki/Troubleshooting/GPU#Xorg_crashes_or_is_laggy_with_the_AST_VGA_GPU), since it was also erroring about GL_OUT_OF_MEMORY for glamor.

This allowed GDM to start; however, text fails to render and login fails, always returning to the GDM login prompt. Aside from the text, everything looks okay in the GUI, so I am confident that the GPU is at least working. According to the Xorg logs, it looks like the screen needs to be selected:

[     3.162] (==) No Layout section.  Using the first Screen section.
[     3.162] (**) |-->Screen "Screen0" (0)
[     3.162] (**) |   |-->Monitor "<default monitor>"
[     3.163] (**) |   |-->Device "GPU1"
[     3.163] (**) |   |-->GPUDevice "GPU1"
[     3.163] (==) No monitor specified for screen "Screen0".
        All GPUs supported by the amdgpu kernel driver
[     3.175] (EE) No devices detected.
[     3.176] (EE)
Fatal server error:
[     3.176] (EE) no screens found(EE)
[     3.176] (EE)
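
To pull only the error lines out of a log like the one above, something along these lines may help. It is a minimal sketch: the function name xorg_errors is made up, and the log path in the usage comment is an assumption (depending on the setup, Xorg may log to /var/log/Xorg.0.log or under ~/.local/share/xorg/).

```shell
#!/bin/sh
# Hypothetical helper: print only the error lines from an Xorg log.
# Xorg tags each message with a marker, and (EE) marks errors.
xorg_errors() {
    grep -F '(EE)' "$1"
}

# Example: xorg_errors /var/log/Xorg.0.log
```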

Has anybody gotten Bullseye up and running with an AMD card? And how do you determine the correct screen to select?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on July 11, 2021, 05:17:31 pm
I have reverted all my changes on /usr/share/X11/xorg.conf.d/00-noglamoregl.conf and /etc/X11/xorg.conf.d/21-gpu-driver.conf, and found that the system still boots to gdm3 with no text rendering.
I may have misattributed the cause.

syslog shows errors for gnome-session and gdm:

gnome-session-binary[]: WARNING: Failed to upload environment to systemd: GDBus.Error: org.freedesktop.DBus.Error.NameHasNoOwner: Name "org.freedesktop.systemd1" does not exist
gnome-session-binary[]: WARNING: Falling back to non-systemd startup procedure due to error: org.freedesktop.DBus.Error.NameHasNoOwner: Name "org.freedesktop.systemd1" does not exist

I have made sure that systemd-logind is running.
Title: Re: AMD rendering issues with Debian Bullseye
Post by: MPC7500 on July 11, 2021, 08:35:40 pm
Does this error occur only with Bullseye or also with other distributions? Is your Xorg config file correct?

Depending on which distribution I use, I have to use these kernel boot arguments:
modprobe.blacklist=ast video=offb:off (https://wiki.raptorcs.com/wiki/Troubleshooting/GPU#I_want_Petitboot_via_AST_but_the_subsequent_Linux_OS_console_on_a_discrete_GPU)

And I have to select the port of the AMD GPU after Petitboot is loaded, otherwise I also get the message "no screens found".
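To confirm that the two boot arguments actually reached the running kernel, a check along these lines can be used. This is only a sketch; the helper name cmdline_has_arg is made up for illustration, and it does not cover how the arguments get set in the bootloader.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line contains an
# exact, space-separated argument.
cmdline_has_arg() {
    cmdline=$1
    arg=$2
    case " $cmdline " in
        *" $arg "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report which of the two arguments the running kernel was booted with.
# /proc/cmdline holds the command line of the running kernel.
if [ -r /proc/cmdline ]; then
    for want in modprobe.blacklist=ast video=offb:off; do
        if cmdline_has_arg "$(cat /proc/cmdline)" "$want"; then
            echo "present: $want"
        else
            echo "missing: $want"
        fi
    done
fi
```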
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on July 13, 2021, 05:37:42 pm
Have not yet tried non-Debian, but I did just try a clean install with Debian Sid instead. I made no changes other than the following:

Code:
sudo update-initramfs -u
sudo update-grub
sudo apt install gnome-session
After restarting, the same issue occurs, arriving at the login screen: (https://forums.raptorcs.com/index.php?action=dlattach;topic=292.0;attach=316;image)

Login fails and seems to restart gdm3. Same with the Gnome desktop from Tasksel, or the gnome-core meta package. So I highly suspect this is a Gnome issue rather than an AMD GPU or Raptor Blackbird issue.
Title: Re: AMD rendering issues with Debian Bullseye
Post by: ClassicHasClass on July 13, 2021, 10:38:35 pm
I've found I have fewer issues if I don't use gdm at all. Can you do a text boot and startx from there?
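A text boot can be sketched as below, assuming a systemd-based install (which Debian Bullseye is by default). The run wrapper and the DRY_RUN variable are made up for illustration, so the commands can be previewed before anything is changed; startx itself typically comes from the xinit package.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Make the text console the default boot target instead of the display manager.
text_boot() {
    run systemctl set-default multi-user.target
}

# Restore the graphical login as the default boot target.
graphical_boot() {
    run systemctl set-default graphical.target
}

# Preview without changing anything:  DRY_RUN=1 text_boot
# After rebooting to a text login, start X by hand with:  startx
```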
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on July 18, 2021, 04:32:59 pm
Tried lxde + lightdm

The greeter runs okay and I can actually log into an LXDE session. However, the background is black and opened windows are not rendered properly. Moving them around produces this effect reminiscent of Windows XP:
(https://i.kym-cdn.com/entries/icons/facebook/000/030/063/solitaire.jpg)
(The black is erased revealing the actual Debian background and unrendered parts of open windows)

Tried lightdm + gnome
Can at least select Xorg or standard Wayland Gnome, but it is the same issue: the Gnome session fails to open.

Debian 10 Buster
Lastly, I tried installing Buster again using the exact same ISO used when I originally set things up back in 2019.

Superblock last mount time is in the future
by less than a day, probably due to the hardware clock being incorrectly set.
It loads the first part of gdm (black screen + cursor) and then exits to systemd(?) output:
[ OK ] ...
and stops at [ OK ] Started GNOME Display Manager.

Interestingly, while first trying to boot the Debian 10 installer, once in the petitboot shell, I discovered "^[[25~" constantly being input as though a key were being held down. I tried different keyboards and it seems to occur regardless. This is interesting, since the X11 logs had been complaining about keyboard input issues on the previous Debian 11 installs above.

Noticed in IPL startup output
Failed loading ucode lid 0x203d1

Failed BOOTKERNEL verification

I don't actually remember if this used to appear before I reflashed the PNOR, as I'd always disabled VGA for a discrete GPU.

tl;dr
I suspect that this issue derives from having flashed the PNOR earlier to correct a failed petitboot password change. None of these things were occurring until after I did that. I don't know nearly as much as some other Talos and Blackbird users, but I wonder if this could be from mismatched BMC and PNOR firmware versioning.
I recall my board arriving with an insert about how this is from a batch with "1.01" but am uncertain if that meant a hardware revision or the installed firmware revision. The only available pre-built PNOR image is 1.00. I know that Raptor advise against flashing BMC firmware during these supply shortages but I'm kind of backed into a corner here, and the system really isn't usable as it currently stands.
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on July 24, 2021, 02:35:51 pm
7/24

Updated the BMC and PNOR to v2.00 firmware, following the correct (https://wiki.raptorcs.com/wiki/File:Blackbird-openbmc-v2.00-bundle.tar) information (https://forums.raptorcs.com/index.php/topic,68.msg715.html#msg715).

It seems to have taken, but now USB boot media can no longer be booted. It just hangs with a SIGTERM event. I also see that the BMC firmware now shows as 0.00.0000 for some reason.

To be quite honest, I am getting pretty tired of fighting with this thing since June. I'm going to try a few more things but am prepared to begrudgingly shelve this hardware.
Title: Re: AMD rendering issues with Debian Bullseye
Post by: ClassicHasClass on July 24, 2021, 11:27:01 pm
What happens if you try to mount the media in the Petitboot shell? Do you see files?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: MPC7500 on July 25, 2021, 12:44:38 pm
I can well understand your displeasure. My first contact with Linux was with my Blackbird. At the beginning it was really hard to understand everything. But with time it gets better.

I didn't update to firmware 2.0 because it has too many disadvantages for me.
But IIRC, you have to decompress the image-bmc twice.

Also a good howto has been written by ClassicHasClass: https://www.talospace.com/2020/02/messing-with-new-20-bmc.html

I'm pretty sure the GPU issues have nothing to do with the BMC firmware.

Could you post your xorg config file, or have you already posted it?
And how is your computer connected to the display?
Do you use the AST GPU till Petitboot and then you switch to the AMD GPU or do you only use the AMD GPU?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on August 01, 2021, 03:18:31 pm
The install media has data, I've provisioned it on different flash drives just to be sure it wasn't the drive. Options are found for Rescue mode, Automated Install, Default Install etc.

With the VGA disable jumper removed from the system, when I select an install option to boot, I get:
Code:
SIGTERM received, booting...
_
And it hangs on that output. I see the USB flash drive activity LED react for a few seconds, so I believe it is booting but I just have no way of seeing the output. I switched the HDMI cable over to the dedicated GPU just in case, but nothing. In the past I only ever used the integrated HDMI during OS install or when any boot setting needed to be changed.

When I reach the new web control panel over BMC, it states Open Power/PNOR firmware version as open-power-blackbird-v2.3-rc2-84-g7cbe7a8d, and below that BMC as 2.7.0-dev-581-g18878e4f6. I wonder if I accidentally grabbed the development branch firmware. Pretty sure it was just whatever was linked on https://wiki.raptorcs.com/wiki/Blackbird/Firmware . Shouldn't they both be either 2.7 or 2.3 without mixing releases?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: ClassicHasClass on August 04, 2021, 10:46:18 pm
Do you see anything on the serial port, either connecting directly to it, or from the web BMC interface?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on December 02, 2021, 02:41:28 pm
After taking another shot at this, I have found that it boots okay to a Void Linux live USB environment. I haven't tried with a dedicated GPU since I'm not sure if Void has AMD GPU firmware packages.

Debian does not have live environments for any POWER architectures, so I'm still SOL with Debian. I thought maybe if I left it on that prompt for long enough it would eventually start their whiptail installer. I'll see if it hangs on anything from serial.
Title: Re: AMD rendering issues with Debian Bullseye
Post by: MPC7500 on December 02, 2021, 05:05:50 pm
Void has the firmware package. IMO, it's by far the best distribution for ppc64le.
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on January 14, 2022, 10:02:59 pm
After coordinating with support a bit, it turns out that Debian does not enable the ast kernel module out-of-the-box and it needs to be added manually. With that cleared up, I can begin to investigate the original issue further. Unfortunately, upgrading to the 2.00 firmware set has not changed anything for gdm crashing at startup.
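A quick way to see whether ast is currently loaded is sketched below. The helper name module_listed is made up for illustration; it only reads a /proc/modules-style listing, so a driver built into the kernel would not show up, and it says nothing about how the module was added in this particular case.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a module name appears in the first column
# of a /proc/modules-style listing.
module_listed() {
    listing=$1
    module=$2
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Example: module_listed /proc/modules ast && echo "ast is loaded"
```

If the module turns out not to be loaded, `modprobe ast` as root is the usual way to try loading it by hand.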
Title: Re: AMD rendering issues with Debian Bullseye
Post by: ClassicHasClass on January 17, 2022, 08:18:11 pm
I'll say in all honesty I've had other issues with gdm and I just don't use it anymore on Fedora (strict text boot, manually start Xorg). Are you Xorg or Wayland?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: Borley on January 19, 2022, 03:23:35 pm

I just tested Fedora 35 workstation today.


I powered down, placed the VGA disable jumper and tried the GPU.


I find it very convenient that Fedora includes the amdgpu firmware OOB. After testing this I can infer that the issue is not the GPU, video cable or cable type, board/firmware or conflict with other parts/peripherals. In fact, Debian has been a pretty poor showing on PPC lately.


The OS specific wiki page (https://wiki.raptorcs.com/wiki/Operating_System_Specific_Workarounds/Debian) basically exists because of me. Sorry, RCS support. Can I send you some pizza?
Title: Re: AMD rendering issues with Debian Bullseye
Post by: ClassicHasClass on January 20, 2022, 11:14:24 pm
Disappointing there's that little interest in fixing what seems to be largely a packaging problem.