Raptor Computing Systems Hardware > Blackbird

Issue with blackbird booting with GPU


r34per:
I've been trying to figure out an issue that has popped up on the Blackbird I just got. I have an AMD Radeon RX 460 installed, and I copied the firmware to petitboot per the wiki instructions. I have not had any issues except for one in particular.

I dual-boot Void Linux and Fedora 36 from two separate partitions on the same SSD. If I am booted into Fedora and reboot, then select Fedora at the boot menu (Void is my default), the system resets on its own after a little while. I have no image at that point, but I can hear the usual beep the system makes once petitboot has booted. When it comes back up, I have no picture at all. I plugged in a serial cable, and here is the output I get right before the Blackbird reboots on its own:


--- Code: ---
SIGTERM received, booting...
[   30.567440] kexec_core: Starting new kernel
[  732.338553086,3] PHB#0000[0:0]: PHB Freeze/Fence detected !
[  732.338627435,3] PHB#0000[0:0]:             PCI FIR=2000000000000000
[  732.338668357,3] PHB#0000[0:0]:         PCI FIR WOF=2000000000000000
[  732.338708204,3] PHB#0000[0:0]:            NEST FIR=0000800000000000
[  732.338744857,3] PHB#0000[0:0]:        NEST FIR WOF=0000800000000000
[  732.338803303,3] PHB#0000[0:0]:            ERR RPT0=0000000000000001
[  732.338843416,3] PHB#0000[0:0]:            ERR RPT1=0000000000000000
[  732.338886921,3] PHB#0000[0:0]:             AIB ERR=0000200000000000
[  732.339495958,3] PHB#0000[0:0]:                  brdgCtl = 00000002
[  732.339564766,3] PHB#0000[0:0]:             deviceStatus = 00200020
[  732.339597369,3] PHB#0000[0:0]:               slotStatus = 00402000
[  732.339634254,3] PHB#0000[0:0]:               linkStatus = e8810008
[  732.339671612,3] PHB#0000[0:0]:             devCmdStatus = 00100107
[  732.339713240,3] PHB#0000[0:0]:             devSecStatus = 00000000
[  732.339746966,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[  732.339791957,3] PHB#0000[0:0]:          corrErrorStatus = 00000000
[  732.339824504,3] PHB#0000[0:0]:        uncorrErrorStatus = 00000000
[  732.339876364,3] PHB#0000[0:0]:                   devctl = 00000020
[  732.339922642,3] PHB#0000[0:0]:                  devStat = 00000020
[  732.339965476,3] PHB#0000[0:0]:                  tlpHdr1 = 00000000
[  732.340001543,3] PHB#0000[0:0]:                  tlpHdr2 = 00000000
[  732.340043233,3] PHB#0000[0:0]:                  tlpHdr3 = 00000000         
[  732.340080492,3] PHB#0000[0:0]:                  tlpHdr4 = 00000000         
[  732.340114836,3] PHB#0000[0:0]:                 sourceId = 00000000         
[  732.340146126,3] PHB#0000[0:0]:                     nFir = 0000800000000000 
[  732.340184116,3] PHB#0000[0:0]:                 nFirMask = 0030001c00000000 
[  732.340224442,3] PHB#0000[0:0]:                  nFirWOF = 0000800000000000 
[  732.340265905,3] PHB#0000[0:0]:                 phbPlssr = 0000001800000000 
[  732.340308697,3] PHB#0000[0:0]:                   phbCsr = 0000001800000000 
[  732.340349077,3] PHB#0000[0:0]:                   lemFir = 0000000100100100 
[  732.340384576,3] PHB#0000[0:0]:             lemErrorMask = 0000000000000000 
[  732.340421385,3] PHB#0000[0:0]:                   lemWOF = 0000000100000000 
[  732.340460422,3] PHB#0000[0:0]:           phbErrorStatus = 00000c8000000000 
[  732.340513791,3] PHB#0000[0:0]:      phbFirstErrorStatus = 0000008000000000 
[  732.340555510,3] PHB#0000[0:0]:             phbErrorLog0 = 2148000098000240 
[  732.340600566,3] PHB#0000[0:0]:             phbErrorLog1 = a008400000000000 
[  732.340640851,3] PHB#0000[0:0]:        phbTxeErrorStatus = 0000000000000000 
[  732.340677533,3] PHB#0000[0:0]:   phbTxeFirstErrorStatus = 0000000000000000 
[  732.340717880,3] PHB#0000[0:0]:          phbTxeErrorLog0 = 0000000000000000 
[  732.340759339,3] PHB#0000[0:0]:          phbTxeErrorLog1 = 0000000000000000 
[  732.340800798,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 0000000020000000 
[  732.340842211,3] PHB#0000[0:0]: phbRxeArbFrstErrorStatus = 0000000020000000 
[  732.340878983,3] PHB#0000[0:0]:       phbRxeArbErrorLog0 = 8080000000000000 
[  732.340917996,3] PHB#0000[0:0]:       phbRxeArbErrorLog1 = 0000000000000000 
[  732.340955935,3] PHB#0000[0:0]:     phbRxeMrgErrorStatus = 0000000000000001 
[  732.340996027,3] PHB#0000[0:0]: phbRxeMrgFrstErrorStatus = 0000000000000001 
[  732.341039826,3] PHB#0000[0:0]:       phbRxeMrgErrorLog0 = 0000000000000000 
[  732.341082502,3] PHB#0000[0:0]:       phbRxeMrgErrorLog1 = 0000000000000000 
[  732.341124221,3] PHB#0000[0:0]:     phbRxeTceErrorStatus = 0000000000000000 
[  732.341163276,3] PHB#0000[0:0]: phbRxeTceFrstErrorStatus = 0000000000000000 
[  732.341197728,3] PHB#0000[0:0]:       phbRxeTceErrorLog0 = 0000000000000000 
[  732.341234469,3] PHB#0000[0:0]:       phbRxeTceErrorLog1 = 0000000000000000 
[  732.341278243,3] PHB#0000[0:0]:        phbPblErrorStatus = 0000000000000000 
[  732.341319709,3] PHB#0000[0:0]:   phbPblFirstErrorStatus = 0000000000000000 
[  732.341361147,3] PHB#0000[0:0]:          phbPblErrorLog0 = 0000000000000000 
[  732.341400227,3] PHB#0000[0:0]:          phbPblErrorLog1 = 0000000000000000 
[  732.341439224,3] PHB#0000[0:0]:      phbPcieDlpErrorLog1 = 0000000000000000 
[  732.341476032,3] PHB#0000[0:0]:      phbPcieDlpErrorLog2 = 0000000000000000 
[  732.341518708,3] PHB#0000[0:0]:    phbPcieDlpErrorStatus = 0000000000000000 
[  732.341575426,3] PHB#0000[0:0]:       phbRegbErrorStatus = 0010000000000000 
[  732.341616940,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0010000000000000 
[  732.341658504,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 8800005800000000 
[  732.341693140,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000007011000 
[  732.341741902,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a01000000 0

--- End code ---
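
Most of the registers in a dump like this are zero, so the interesting ones are easy to miss. As a rough sketch (not part of the original post), a filter along these lines keeps only the PHB lines with a nonzero value. The function name `nonzero_regs` is made up for this example, and the pattern assumes the skiboot line layout shown above.

```shell
# Keep only PHB register lines whose value is not all zeros.
# nonzero_regs is a hypothetical helper name; it reads a log on stdin.
nonzero_regs() {
    # First grep: PHB lines that carry a "name = value" pair.
    # Second grep: drop the lines whose value is only zeros.
    grep -E 'PHB#[0-9a-f]+\[[0-9]+:[0-9]+\]:.*=' | grep -Ev '= *0+ *$'
}

# Two lines from the dump above as sample input; only the
# phbErrorStatus line should be printed.
nonzero_regs <<'EOF'
[  732.339746966,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[  732.340460422,3] PHB#0000[0:0]:           phbErrorStatus = 00000c8000000000 
EOF
```

With a saved serial capture, the same function could be fed from a file, for example `nonzero_regs < phb.log`, where `phb.log` is a placeholder name.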

After the system reboots on its own, it boots into the OS; I was able to SSH into Fedora, and I could watch the boot process over the serial cable. If I shut the system down, either from the OS via the serial console or by holding the power button, and then power it back on, I get an image from my video card and boot into the OS just fine. Rebooting is where the issues start. Strangely, if I have a serial cable plugged into the system before powering it on, the issue doesn't happen at all; I can reboot all day long.

This is not an issue with my Void Linux partition. I can reboot Void Linux, select Void Linux from the boot menu, and it boots normally.


If I remove the jumper to enable the onboard HDMI, the same thing happens. I also noticed that the image on the onboard HDMI was really colorful, but it went back to the correct colors right as I went to grab my phone to take a picture. The weird colors haven't returned since, so maybe it was just my cable.

With my RX 460 removed and just running off the onboard HDMI, I don't get the problem. I then tried another video card I have, an AMD R5 240, and it does not produce the issue either. Erasing BOOTKERNFW seems to resolve the problem, so could it just be the GPU firmware? I used the firmware from Void Linux, which has a 5.18 kernel.



ClassicHasClass:
Do you have fast reboot still on?

Something like nvram -p ibm,skiboot --update-config fast-reset=0 as root should disable it.
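
A minimal sketch of how that command might be wrapped, for anyone who wants to see what will run before touching NVRAM. The `NVRAM_CMD` and `DRY_RUN` variables and the `set_fast_reset` function are invented for this example; only the `nvram -p ibm,skiboot --update-config fast-reset=0` invocation comes from the post above.

```shell
# Sketch: disable skiboot fast reboot via the ibm,skiboot NVRAM partition.
# NVRAM_CMD and DRY_RUN are hypothetical helper variables for this example.
NVRAM_CMD="${NVRAM_CMD:-nvram}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

set_fast_reset() {
    # $1: value to store, 0 to disable fast reboot
    if [ "$DRY_RUN" = "1" ]; then
        echo "$NVRAM_CMD -p ibm,skiboot --update-config fast-reset=$1"
    else
        # Needs root on the Blackbird host.
        "$NVRAM_CMD" -p ibm,skiboot --update-config "fast-reset=$1"
    fi
}

set_fast_reset 0
```

Run with `DRY_RUN=0` as root to apply the change. Afterwards, `nvram -p ibm,skiboot --print-config` should list the stored `fast-reset=0` entry, assuming the `nvram` tool from powerpc-utils is installed.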

r34per:
ahh that seems to have fixed it, thanks!

ClassicHasClass:
They really should just ship that way. Fast reboot isn't much faster than regular rebooting, even when it works (which is about half the time).

MPC7500:
Fast reboot isn't much faster? It saves me 90 seconds if active.
