Raptor Computing Systems Community Forums (BETA)

Third Party Hardware => GPU Compute / Accelerators => Topic started by: mcarden on November 07, 2021, 10:16:19 pm

Title: Blackbird Radeon RX580
Post by: mcarden on November 07, 2021, 10:16:19 pm
A couple of years ago I assembled a Blackbird into a case with a Radeon RX580 GPU, installed a version of Fedora Linux and I don't recall having to do anything special to make it work.

This week I dropped the Radeon back into the Blackbird, which now runs Fedora 35 Workstation, and it doesn't produce video.

uname -r
5.14.16-301.fc35.ppc64le


The card is detected, along with the ASPEED one:

lspci | grep VGA
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)


And the kernel is loading modules for it:

lsmod | grep -i amdgpu
amdgpu               8257536  1
drm_ttm_helper        262144  3 drm_vram_helper,ast,amdgpu
ttm                   327680  3 drm_vram_helper,amdgpu,drm_ttm_helper
mfd_core              327680  1 amdgpu
gpu_sched             327680  1 amdgpu
i2c_algo_bit          262144  2 ast,amdgpu
drm_kms_helper        524288  5 drm_vram_helper,ast,amdgpu
drm                   851968  10 gpu_sched,drm_kms_helper,drm_vram_helper,ast,amdgpu,drm_ttm_helper,ttm
i2c_core              327680  8 drm_kms_helper,i2c_algo_bit,at24,ast,amdgpu,i2c_opal,regmap_i2c,drm


The only clue I have found so far comes from dmesg, which says in part:

[  121.619204] amdgpu 0000:01:00.0: [drm] Cannot find any crtc or sizes
[  133.894384] amdgpu 0000:01:00.0: refused to change power state from D0 to D3hot


I don't recall seeing this in the past (though I may have and just forgotten), but could someone point me toward what I'm missing to get this going?

Thanks,
MC

Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 08, 2021, 12:54:21 am
Hi, unfortunately this is the same problem I have been pointing out since the Linux kernel moved to 5.14.x. From that version onwards, at least for now, AMD GPUs no longer come up, and you need a kernel no newer than 5.13.19 to make it work. Someone showed me a post where the problem was understood, but I don't know how to carry out the procedure described there to avoid it...
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 08, 2021, 12:38:16 pm
What are your kernel command line options? According to https://bugzilla.kernel.org/show_bug.cgi?id=200695, some people have had trouble with amdgpu.dc=1 (setting it to 0 seemed to fix it), though that problem seems to be somewhat old. There are a lot of OpenPOWER systems out there with the comparable WX7100, which should be the same generation, so I would expect this to be a more widespread problem (but I haven't upgraded my workstation to F35 yet either; I'm planning to do the BMC-only Blackbird this week).
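The options the running kernel was booted with can be printed with `cat /proc/cmdline`. The helper below is a minimal sketch for pulling out a single parameter such as `amdgpu.dc`; `cmdline_param` is a hypothetical name, and it assumes plain space-separated `name=value` words (quoted values containing spaces are not handled).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a kernel parameter (for example
# amdgpu.dc or amdgpu.aspm) from a kernel command line. With no second
# argument it reads the running kernel's /proc/cmdline. Assumes simple
# space-separated name=value words; quoted values with spaces are not handled.
cmdline_param() {
  local name=$1
  local cmdline=${2:-$(cat /proc/cmdline)}
  local -a words
  local word
  read -ra words <<< "$cmdline"
  for word in "${words[@]}"; do
    # Compare the name literally, then strip everything up to the first '='.
    if [[ $word == "$name="* ]]; then
      printf '%s\n' "${word#*=}"
      return 0
    fi
  done
  return 1   # the parameter does not appear on this command line
}
```

For example, `cmdline_param amdgpu.dc || echo "amdgpu.dc not set"`. As a cross-check, the loaded amdgpu driver should also report the values it is actually using under `/sys/module/amdgpu/parameters/` (for instance `dc` and `aspm`).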
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 08, 2021, 02:21:15 pm
Hi Classic, I know for sure that the problem occurs with kernels from 5.14 onwards, including the new 5.15, which I tested just a few days ago and which still has the same bug. Can you tell me where to go or how to proceed to check the setting you mentioned? Thanks
Title: Re: Blackbird Radeon RX580
Post by: mcarden on November 08, 2021, 02:56:51 pm
@ClassicHasClass, I had tried adding 'amdgpu.dc=0' to the kernel command line on boot but it didn't help.

--
MC
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 08, 2021, 03:22:32 pm
@mcarden, that's distressing because that card should be "known working." I wonder if @tle has already updated. My F34 T2 has a WX7100 and it works fine, but I haven't updated it to F35. Do you get at least Petitboot on screen? Can you force fbdev for Xorg?

MauryG5, I think you have a different issue, because if memory serves you're trying to use a Navi-based card. These are Polaris and should already be working just fine.
Title: Re: Blackbird Radeon RX580
Post by: sharkcz on November 08, 2021, 03:32:10 pm
For the record, you might hit https://gitlab.freedesktop.org/drm/amd/-/issues/1736 on a Polaris card starting with 5.15

If you know which version worked and which does not, then you can bisect. In this case the iterations would be short; mine was one iteration a day for the bug above... If you want a "minimized" kernel config derived from the Fedora one, which reduces compile times significantly, let me know. If you need help with bisecting, I can provide that too :-)
Title: Re: Blackbird Radeon RX580
Post by: mcarden on November 08, 2021, 03:59:18 pm
@ClassicHasClass, no output during Petitboot, but the ASPEED's HDMI does show it. The card's backlight LEDs for its logo *do* light during Petitboot but then go out on boot. I seem to recall they used to stay on when the card was working. F35 is Wayland, without the Xorg option that earlier Fedoras had.

@sharkcz, I'm really, really hoping not to have to go down any sort of kernel-compiling rabbit hole.

--
MC
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 08, 2021, 04:45:50 pm
Yes Classic, I know that I have the Navi. I mention it because I have noticed that every time there is a kernel problem affecting AMD GPUs, it hits different types of cards indiscriminately, such as Navi, Nano and others. So I believe the problem is the same, also because we happen to be having the same problems at the same time. Maybe you have not read the link sent by our friend MPC: it is about Vega, so a different model but the same problem. They say they have solved it there, but unfortunately I did not understand how they did it. If you can work out what they did, that would be great. Here is the link:

https://gitlab.freedesktop.org/drm/amd/-/issues/1723
Title: Re: Blackbird Radeon RX580
Post by: mcarden on November 08, 2021, 05:46:00 pm
Progress, of a sort.

Adding 'amdgpu.aspm=0' to the kernel parameters at boot as mentioned at https://gitlab.freedesktop.org/drm/amd/-/issues/1723  results in the card producing video during boot (the Fedora logo at the bottom of the screen and a spinner) but as soon as boot reaches the login screen, video disappears and is only available via the onboard HDMI.

I got all excited seeing video there for a few moments...

--
MC
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 08, 2021, 06:32:29 pm
If you disable gdm (the easiest way would be something like `systemctl set-default multi-user.target`, or adding `systemd.unit=multi-user.target` to the kernel command line), do you at least get a text boot? You could try experimenting with additional options from there. You should still be able to install Xorg in F35 even if it didn't come with it.
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 08, 2021, 06:35:56 pm
@MauryG5, regarding the Vega issue you linked (https://gitlab.freedesktop.org/drm/amd/-/issues/1723): it looks like it's the same solution, though I still think the underlying issue is different. You can change this in Petitboot or GRUB, but either way you want to add `amdgpu.aspm=0` to your kernel options.
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 09, 2021, 12:43:31 am
I mainly use Ubuntu 20.04.3 with Xorg and the problem is the same, so I don't think the display server is the issue. They changed some damn parameters in the new kernels and now we have problems, and what's worse, they still haven't fixed it! Classic, sorry, by kernel options do you mean when configuring the parameters? Is there an option that you say must be set to 0? Let me get this right...
Title: Re: Blackbird Radeon RX580
Post by: MPC7500 on November 09, 2021, 05:31:22 am
@mcarden: I would try to blacklist the AST GPU.
Title: Re: Blackbird Radeon RX580
Post by: sharkcz on November 09, 2021, 08:17:52 am
@mcarden, about not wanting to go down a kernel-compiling rabbit hole: it's not that bad, I was worried myself :-) I would suggest starting with the drm-next-5.14 branch from the AMD tree; it should take about 10 iterations.
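The "about 10 iterations" estimate follows from how bisection works: each test halves the range of suspect commits, so roughly a thousand candidates take about ten build-and-boot cycles. The sketch below shows only that search loop in plain bash, assuming the revisions are already ordered oldest to newest with the first known good and the last known bad; on a real kernel tree `git bisect` does this bookkeeping, and `bisect_first_bad` and the test function are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the binary search behind bisecting. Arguments: the name of a
# test command, then revisions ordered oldest to newest. Assumes the first
# revision is known good and the last known bad. The test command (a
# hypothetical stand-in for "build, boot, check for video") must succeed
# for a good revision and fail for a bad one.
# Prints "<first bad revision> <number of tests run>".
bisect_first_bad() {
  local is_good=$1
  shift
  local -a revs=("$@")
  local good=0 bad=$(( ${#revs[@]} - 1 )) steps=0 mid
  # Invariant: revs[good] is good and revs[bad] is bad.
  while (( bad - good > 1 )); do
    mid=$(( (good + bad) / 2 ))
    steps=$(( steps + 1 ))
    if "$is_good" "${revs[mid]}"; then
      good=$mid
    else
      bad=$mid
    fi
  done
  printf '%s %d\n' "${revs[bad]}" "$steps"
}
```

With 1,025 toy revisions named r0 to r1024 and a test that fails from r700 onward, this prints `r700 10`: ten tests to narrow about a thousand candidates down to one.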
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 09, 2021, 12:24:40 pm
Hi Sharkcz, so you too have noticed that changes made from 5.14 onwards have affected AMD GPUs. Can we hope the kernel team will fix this problem in good time? I wrote about the problem right at the start of 5.14 and tried to make everyone aware of it, but unfortunately few followed me at the time...
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 09, 2021, 01:13:01 pm
@MauryG5, yes: you want to change the kernel command line that Petitboot uses (which, unless you're doing something really strange, should be the same as GRUB's). You don't need to rebuild the kernel for that option.
Title: Re: Blackbird Radeon RX580
Post by: mcarden on November 09, 2021, 07:32:47 pm
@MPC7500, we have a winner!

Code: [Select]
sudo grubby --update-kernel=ALL --args="modprobe.blacklist=ast"
...achieved the desired result.

Thanks!
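After rebooting with `modprobe.blacklist=ast`, `lsmod | grep -w ast` should print nothing. A minimal sketch for confirming that the parameter actually reached the running kernel's command line, assuming the comma-separated `modprobe.blacklist=mod1,mod2` form; `module_blacklisted` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a module name appears in any
# modprobe.blacklist=mod1,mod2,... entry on a kernel command line.
# With no second argument it reads the running kernel's /proc/cmdline.
module_blacklisted() {
  local module=$1
  local cmdline=${2:-$(cat /proc/cmdline)}
  local -a words names
  local word name
  read -ra words <<< "$cmdline"
  for word in "${words[@]}"; do
    case $word in
      modprobe.blacklist=*)
        # Split the comma-separated module list after the '='.
        IFS=, read -ra names <<< "${word#modprobe.blacklist=}"
        for name in "${names[@]}"; do
          [ "$name" = "$module" ] && return 0
        done
        ;;
    esac
  done
  return 1   # not blacklisted on this command line
}
```

If the blacklist ever needs undoing (for example, to get the ASPEED output back when the Radeon misbehaves), grubby can drop the argument again with `sudo grubby --update-kernel=ALL --remove-args="modprobe.blacklist=ast"`.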
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 10, 2021, 01:14:25 am
OK Classic, I just need to understand the exact procedure so I can do it too. I remember well that when we first got these Navi cards working, for example, we changed a setting from Petitboot: after it loads, go to the shell and type these commands:

Code: [Select]
# nvram -p ibm,skiboot --update-config fast-reset=0
# nvram -p ibm,skiboot --print-config
"ibm,skiboot" Partition
Now, this has always been set on my motherboard. What else needs to be done? How exactly do you proceed from Petitboot, and which command should you enter? Thanks
Title: Re: Blackbird Radeon RX580
Post by: sharkcz on November 10, 2021, 03:09:46 am
GRUB uses the GRUB_CMDLINE_LINUX variable defined in /etc/default/grub to set the default kernel parameters for the host Linux.
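A minimal sketch of editing that variable by adding a parameter rather than replacing the ones already there. It assumes the common single-line `GRUB_CMDLINE_LINUX="..."` form; `grub_add_arg` is a hypothetical helper, and front ends such as Grub Customizer or grubby manage this in their own way.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add one kernel parameter to the GRUB_CMDLINE_LINUX="..."
# line of a GRUB defaults file, keeping the parameters already present.
# Assumes the single-line, double-quoted form; only the first such line is
# edited. GRUB_CMDLINE_LINUX_DEFAULT is left alone.
grub_add_arg() {
  local arg=$1 file=${2:-/etc/default/grub}
  local tmp
  tmp=$(mktemp) || return 1
  awk -v arg="$arg" '
    /^GRUB_CMDLINE_LINUX="/ && !done {
      done = 1
      open_q = index($0, "\"")
      rest = substr($0, open_q + 1)
      close_q = index(rest, "\"")
      if (close_q > 0) {
        val = substr(rest, 1, close_q - 1)
        n = split(val, words, " ")
        found = 0
        for (i = 1; i <= n; i++) if (words[i] == arg) found = 1
        # Append only if missing, so running this twice is harmless.
        if (!found) val = (n == 0 ? arg : val " " arg)
        $0 = "GRUB_CMDLINE_LINUX=\"" val "\"" substr(rest, close_q + 1)
      }
    }
    { print }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the original file keeps its owner and permissions.
  cat "$tmp" > "$file" || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"
}
```

Typical use from a root shell: back up the file (`cp /etc/default/grub /etc/default/grub.bak`), run `grub_add_arg amdgpu.aspm=0`, then regenerate the GRUB configuration (`update-grub` on Debian and Ubuntu) before rebooting.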
Title: Re: Blackbird Radeon RX580
Post by: sharkcz on November 10, 2021, 03:15:35 am
@MauryG5, 5.14 was OK for my WX4100; the breakage came with 5.15-rc1.

The best chance of getting amdgpu driver issues fixed is to report them in the AMD bug tracker (https://gitlab.freedesktop.org/drm/amd/-/issues), including which particular commit caused the issue (identified through bisection).
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 10, 2021, 03:43:21 am
Sorry Shark, so I have to go to the file at the path you gave me and change the value to 0 as Classic says; did I understand correctly?
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 10, 2021, 11:13:58 am
@mcarden, blacklisting ast could still make things tricky in the unusual, but by no means remote, possibility that something goes wrong with the graphics card. It still sounds like a bug that this should be necessary (it wasn't before). I'll investigate this when I do my upgrade.
Title: Re: Blackbird Radeon RX580
Post by: MPC7500 on November 10, 2021, 12:58:54 pm
@MauryG5: Follow this guide and replace the line with your desired line or change the GRUB file manually:
https://docs.fedoraproject.org/en-US/fedora/rawhide/system-administrators-guide/kernel-module-driver-configuration/Working_with_the_GRUB_2_Boot_Loader/#sec-Making_Persistent_Changes_to_a_GRUB_2_Menu_Using_the_grubby_Tool

@mcarden: Great. I have to blacklist AST on Void too; otherwise it doesn't work, even with an xorg.conf file. I would guess it only works by accident on Fedora.
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 10, 2021, 04:13:45 pm
Sorry, do you mean I have to turn that line into `grubby --set-default /boot/vmlinuz-5.15.fc35.ppc64le`? Would that be the change needed to get it to start with the AMD GPU?
Title: Re: Blackbird Radeon RX580
Post by: MPC7500 on November 12, 2021, 06:54:47 pm
@MauryG5, you posted the link already: https://gitlab.freedesktop.org/drm/amd/-/issues/1723

You have to add amdgpu.aspm=0 to the kernel command line.
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 13, 2021, 07:59:53 am
OK, but what I didn't understand is what this kernel command line is and where to find it, sorry...
Title: Re: Blackbird Radeon RX580
Post by: ClassicHasClass on November 13, 2021, 10:14:38 am
https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 14, 2021, 12:43:43 pm
Good evening, guys. Unfortunately the first attempt didn't succeed. I followed the procedure our friend Classic posted and inserted the line our friend MPC told me, but unfortunately things got worse, because the GPU goes dark at a certain point... In the picture you can see what I did: I selected kernel 5.15.2 and inserted that line. Maybe I have to change something else, I don't know; unfortunately it isn't working yet... I am trying all this on Debian 11, where the same software for changing GRUB, Grub Customizer, is available anyway...

Title: Re: Blackbird Radeon RX580
Post by: MPC7500 on November 14, 2021, 03:48:45 pm
I can't imagine that ClassicHasClass told you what you see in the screenshot. And I wrote that you should ADD the line, not replace it.
And after adding the line, you have to update GRUB.
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 14, 2021, 04:47:14 pm
OK, but sorry, how do I tell that application to add rather than replace? I've never used it, so I don't know how it works; I just did a test and don't know how to actually add this line... In the photo there isn't anything that shows whether the line is being added or is replacing something... Honestly, I don't quite understand how to use this application. Obviously I'm not saying Classic told me to replace it; I'm just saying I didn't understand how to add it, or how you can tell that I'm actually replacing it instead...
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 15, 2021, 03:16:25 pm
In any case, looking more closely at the application I used, I saw that the field for entering parameters was empty, and I entered the line you gave me, MPC, so I don't think I replaced anything after all. Again, I don't know how that application works; I need to understand better how to use it and how to add that line correctly...
Title: Re: Blackbird Radeon RX580
Post by: MPC7500 on November 15, 2021, 03:53:55 pm
If that is the case, then everything is okay. Then you only have to save/confirm your settings. I don't know how the application works, but Google will surely turn up some results. And as I wrote previously, don't forget to update GRUB:
Code: [Select]
sudo update-grub
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 17, 2021, 04:45:42 pm
Yes MPC, I saved the settings; the only thing I didn't do was update GRUB, and only because I thought the application also took care of updating it. So I'll try updating GRUB and restarting, hoping it works... I'll let you know as soon as I've tried.
Title: Re: Blackbird Radeon RX580
Post by: MauryG5 on November 19, 2021, 02:59:21 pm
Success! Thanks again to our friend MPC for his great advice; my mistake was simply the missing GRUB update command. I skipped it only because I believed the Grub Customizer application itself both saved the settings and updated GRUB, but I was wrong. Now I will compile 5.15 for Ubuntu as well. Thanks again, MPC, and also our friend ClassicHasClass for the valuable advice.