Author Topic: Issue with blackbird booting with GPU  (Read 5312 times)

r34per

Issue with blackbird booting with GPU
« on: September 07, 2022, 09:28:10 pm »
I've been trying to figure out an issue that has popped up on the Blackbird I just got. I have an AMD Radeon RX 460 installed, and I copied the firmware to Petitboot per the wiki instructions. I have had no issues except one in particular.

I dual boot Void Linux and Fedora 36 from two separate partitions on the same SSD (Void is my default). If I am booted into Fedora and reboot, then select Fedora again once I am back at the boot menu, the system resets on its own after a little while. When it comes back up I have no picture at all, although I can hear the usual beep the system makes once Petitboot has booted. I plugged in a serial cable, and here is the output I get right before the Blackbird reboots on its own:

Code:
SIGTERM received, booting...
[   30.567440] kexec_core: Starting new kernel
[  732.338553086,3] PHB#0000[0:0]: PHB Freeze/Fence detected !
[  732.338627435,3] PHB#0000[0:0]:             PCI FIR=2000000000000000
[  732.338668357,3] PHB#0000[0:0]:         PCI FIR WOF=2000000000000000
[  732.338708204,3] PHB#0000[0:0]:            NEST FIR=0000800000000000
[  732.338744857,3] PHB#0000[0:0]:        NEST FIR WOF=0000800000000000
[  732.338803303,3] PHB#0000[0:0]:            ERR RPT0=0000000000000001
[  732.338843416,3] PHB#0000[0:0]:            ERR RPT1=0000000000000000
[  732.338886921,3] PHB#0000[0:0]:             AIB ERR=0000200000000000
[  732.339495958,3] PHB#0000[0:0]:                  brdgCtl = 00000002
[  732.339564766,3] PHB#0000[0:0]:             deviceStatus = 00200020
[  732.339597369,3] PHB#0000[0:0]:               slotStatus = 00402000
[  732.339634254,3] PHB#0000[0:0]:               linkStatus = e8810008
[  732.339671612,3] PHB#0000[0:0]:             devCmdStatus = 00100107
[  732.339713240,3] PHB#0000[0:0]:             devSecStatus = 00000000
[  732.339746966,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[  732.339791957,3] PHB#0000[0:0]:          corrErrorStatus = 00000000
[  732.339824504,3] PHB#0000[0:0]:        uncorrErrorStatus = 00000000
[  732.339876364,3] PHB#0000[0:0]:                   devctl = 00000020
[  732.339922642,3] PHB#0000[0:0]:                  devStat = 00000020
[  732.339965476,3] PHB#0000[0:0]:                  tlpHdr1 = 00000000
[  732.340001543,3] PHB#0000[0:0]:                  tlpHdr2 = 00000000
[  732.340043233,3] PHB#0000[0:0]:                  tlpHdr3 = 00000000         
[  732.340080492,3] PHB#0000[0:0]:                  tlpHdr4 = 00000000         
[  732.340114836,3] PHB#0000[0:0]:                 sourceId = 00000000         
[  732.340146126,3] PHB#0000[0:0]:                     nFir = 0000800000000000 
[  732.340184116,3] PHB#0000[0:0]:                 nFirMask = 0030001c00000000 
[  732.340224442,3] PHB#0000[0:0]:                  nFirWOF = 0000800000000000 
[  732.340265905,3] PHB#0000[0:0]:                 phbPlssr = 0000001800000000 
[  732.340308697,3] PHB#0000[0:0]:                   phbCsr = 0000001800000000 
[  732.340349077,3] PHB#0000[0:0]:                   lemFir = 0000000100100100 
[  732.340384576,3] PHB#0000[0:0]:             lemErrorMask = 0000000000000000 
[  732.340421385,3] PHB#0000[0:0]:                   lemWOF = 0000000100000000 
[  732.340460422,3] PHB#0000[0:0]:           phbErrorStatus = 00000c8000000000 
[  732.340513791,3] PHB#0000[0:0]:      phbFirstErrorStatus = 0000008000000000 
[  732.340555510,3] PHB#0000[0:0]:             phbErrorLog0 = 2148000098000240 
[  732.340600566,3] PHB#0000[0:0]:             phbErrorLog1 = a008400000000000 
[  732.340640851,3] PHB#0000[0:0]:        phbTxeErrorStatus = 0000000000000000 
[  732.340677533,3] PHB#0000[0:0]:   phbTxeFirstErrorStatus = 0000000000000000 
[  732.340717880,3] PHB#0000[0:0]:          phbTxeErrorLog0 = 0000000000000000 
[  732.340759339,3] PHB#0000[0:0]:          phbTxeErrorLog1 = 0000000000000000 
[  732.340800798,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 0000000020000000 
[  732.340842211,3] PHB#0000[0:0]: phbRxeArbFrstErrorStatus = 0000000020000000 
[  732.340878983,3] PHB#0000[0:0]:       phbRxeArbErrorLog0 = 8080000000000000 
[  732.340917996,3] PHB#0000[0:0]:       phbRxeArbErrorLog1 = 0000000000000000 
[  732.340955935,3] PHB#0000[0:0]:     phbRxeMrgErrorStatus = 0000000000000001 
[  732.340996027,3] PHB#0000[0:0]: phbRxeMrgFrstErrorStatus = 0000000000000001 
[  732.341039826,3] PHB#0000[0:0]:       phbRxeMrgErrorLog0 = 0000000000000000 
[  732.341082502,3] PHB#0000[0:0]:       phbRxeMrgErrorLog1 = 0000000000000000 
[  732.341124221,3] PHB#0000[0:0]:     phbRxeTceErrorStatus = 0000000000000000 
[  732.341163276,3] PHB#0000[0:0]: phbRxeTceFrstErrorStatus = 0000000000000000 
[  732.341197728,3] PHB#0000[0:0]:       phbRxeTceErrorLog0 = 0000000000000000 
[  732.341234469,3] PHB#0000[0:0]:       phbRxeTceErrorLog1 = 0000000000000000 
[  732.341278243,3] PHB#0000[0:0]:        phbPblErrorStatus = 0000000000000000 
[  732.341319709,3] PHB#0000[0:0]:   phbPblFirstErrorStatus = 0000000000000000 
[  732.341361147,3] PHB#0000[0:0]:          phbPblErrorLog0 = 0000000000000000 
[  732.341400227,3] PHB#0000[0:0]:          phbPblErrorLog1 = 0000000000000000 
[  732.341439224,3] PHB#0000[0:0]:      phbPcieDlpErrorLog1 = 0000000000000000 
[  732.341476032,3] PHB#0000[0:0]:      phbPcieDlpErrorLog2 = 0000000000000000 
[  732.341518708,3] PHB#0000[0:0]:    phbPcieDlpErrorStatus = 0000000000000000 
[  732.341575426,3] PHB#0000[0:0]:       phbRegbErrorStatus = 0010000000000000 
[  732.341616940,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0010000000000000 
[  732.341658504,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 8800005800000000 
[  732.341693140,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000007011000 
[  732.341741902,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a01000000 0
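
A dump like this is easier to scan once the all-zero registers are filtered out. Below is a minimal Python sketch, assuming the line layout seen in the log above (timestamp, `PHB#...:` prefix, then `name = hexvalue` or `name=hexvalue`). The function name `nonzero_registers` and the regular expression are illustrative only; they are not part of skiboot or any existing tool, and the sketch does not interpret what the register bits mean.

```python
import re

# Assumed line layout, taken from the log above, for example:
#   [  732.338627435,3] PHB#0000[0:0]:             PCI FIR=2000000000000000
#   [  732.339495958,3] PHB#0000[0:0]:                  brdgCtl = 00000002
LINE_RE = re.compile(
    r"\]\s*PHB#[0-9a-fA-F]+\[\d+:\d+\]:\s*"
    r"(?P<name>[^=]+?)\s*=\s*(?P<value>[0-9a-fA-F]+)\b"
)


def nonzero_registers(log_text):
    """Return (name, value) pairs for registers whose first hex value is non-zero."""
    found = []
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        # Lines without "name = value" (such as the Freeze/Fence banner) are skipped.
        if match and int(match.group("value"), 16) != 0:
            found.append((match.group("name"), match.group("value")))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python phb_dump.py < serial_capture.txt
    for name, value in nonzero_registers(sys.stdin.read()):
        print(f"{name:>26} = {value}")
```

Fed the serial capture on standard input, it prints only the registers that are actually set, which makes it quicker to compare two dumps.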

After the system reboots on its own, it does boot into the OS: I was able to SSH into Fedora, and I could watch the boot process over the serial cable. If I shut the system down, either from the OS via the serial console or by holding the power button, and then power it back on, I get an image from my video card and can boot into the OS just fine. Rebooting is where the issues start. Strangely, if I have a serial cable plugged into the system before powering it on, the issue doesn't happen at all; I can reboot all day long.

This is not an issue with my void linux partition. I can reboot void linux and select void linux from the boot menu and it boots normally.


If I remove the jumper to enable the onboard HDMI, the same thing happens. I also noticed the image on the onboard HDMI was really colorful, but it went back to the correct colors right as I went to grab my phone to take a picture. The weird colors haven't returned since, so maybe it was just my cable.

With my RX 460 removed and just running off the onboard HDMI, I don't get the problem. I then tried another video card I have, an AMD R5 240, and it also does not produce the issue. Erasing BOOTKERNFW seems to resolve the problem, so could it just be the GPU firmware? I used the firmware from Void Linux, which has a 5.18 kernel.



« Last Edit: September 07, 2022, 10:11:57 pm by r34per »

ClassicHasClass

Re: Issue with blackbird booting with GPU
« Reply #1 on: September 07, 2022, 10:45:12 pm »
Do you have fast reboot still on?

Something like nvram -p ibm,skiboot --update-config fast-reset=0 as root should disable it.

r34per

Re: Issue with blackbird booting with GPU
« Reply #2 on: September 08, 2022, 08:59:20 pm »
ahh that seems to have fixed it, thanks!

ClassicHasClass

Re: Issue with blackbird booting with GPU
« Reply #3 on: September 08, 2022, 10:18:39 pm »
They really should just ship that way. Fast reboot isn't much faster than regular rebooting, even when it works (which is about half the time).

MPC7500

Re: Issue with blackbird booting with GPU
« Reply #4 on: September 09, 2022, 06:54:45 am »
Fast reboot isn't much faster? It saves me 90 seconds if active.

r34per

Re: Issue with blackbird booting with GPU
« Reply #5 on: September 11, 2022, 01:24:49 pm »
Oh yes, when it didn't error out I could reboot and be back to the desktop in under a minute

ClassicHasClass

Re: Issue with blackbird booting with GPU
« Reply #6 on: September 11, 2022, 03:30:51 pm »
Quote from: MPC7500
Fast reboot isn't much faster? It saves me 90 seconds if active.

Maybe it's the way my T2 is configured, but while I agree it's quicker, I don't find it *much* quicker. Plus, when it muffs it, you end up doing a full reboot anyway.

MPC7500

Re: Issue with blackbird booting with GPU
« Reply #7 on: September 11, 2022, 03:49:24 pm »
How is your Talos II configured to boot faster? Or do you use !BMC?

Corvidae

Re: Issue with blackbird booting with GPU
« Reply #8 on: September 11, 2022, 04:10:18 pm »
FWIW, I have a WX 7100 (similar to RX 580) with no firmware in BOOTKERNFW (I just use the HDMI output for booting) and I can fast reboot multiple times with no GPU issues. I also thought I read things in the past about some AMD GPUs not liking being hot reset - this was mostly in the context of VFIO GPU passthrough, but I would imagine it would be similar here.

power9mm

Re: Issue with blackbird booting with GPU
« Reply #9 on: September 13, 2022, 05:38:34 am »
I think the WX 7100 might have better support than the gamer cards. I don't know for sure, though.

r34per

Re: Issue with blackbird booting with GPU
« Reply #10 on: September 13, 2022, 10:44:16 am »
Is there pretty good support for the WX series cards in Fedora? I've thought about getting a WX 2100 or WX 3100 for my Blackbird to have a somewhat lower-power card.

ClassicHasClass

Re: Issue with blackbird booting with GPU
« Reply #11 on: September 13, 2022, 11:06:06 am »
Quote from: MPC7500
How is your Talos II configured to boot faster? Or do you use !BMC?

Sorry if that was unclear: I meant with fast reset on. It's just that it fails so often that it doesn't seem worth the modest improvement, at least on my machine.

Quote from: r34per
Is there pretty good support for the WX series cards in Fedora? I've thought about getting a WX 2100 or WX 3100 for my Blackbird to have a somewhat lower-power card.

I use a Raptor BTO WX7100 in my Fedora 36 Talos II, and it's fine. The fast reboot problems I mention are with other devices. I do have the firmware loaded in BOOTKERNFW so I can access Petitboot.