My Blackbird no longer boots into the Petitboot menu (I end up with a black screen), even though I can follow the boot log both on the screen and on the SSH console, which I open with:
ssh -p 2200 root@my.ip.address
The last console output is this error:
[ 90.334100762,3] PHB#0005[0:5]: eeh_freeze_clear on fenced PHB
XE autoconfiguration failed
The only drive installed is a SATA HDD with a bootable Ubuntu Server 19.10 (no GPU).
I have not changed or updated any firmware.
The last thing I did was test an NVIDIA PCIe GPU.
After I removed the GPU from the PCIe slot, I could no longer boot into Petitboot.
What is going wrong?
Here is the relevant part of the log (the full console output is attached as a file):
--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
3.06480|secure|SecureROM valid - enabling functionality
8.29259|Booting from SBE side 0 on master proc=00050000
8.46568|ISTEP 6. 5 - host_init_fsi
8.93808|ISTEP 6. 6 - host_set_ipl_parms
9.49036|ISTEP 6. 7 - host_discover_targets
10.09037|HWAS|PRESENT> DIMM[03]=8080000000000000
10.09038|HWAS|PRESENT> Proc[05]=8000000000000000
10.09040|HWAS|PRESENT> Core[07]=5565000000000000
10.49937|ISTEP 6. 8 - host_update_master_tpm
10.52369|SECURE|Security Access Bit> 0x0000000000000000
10.52370|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
...
57.11902|ISTEP 21. 2 - host_verify_hdat
57.20205|ISTEP 21. 3 - host_start_payload
[ 58.179010391,5] OPAL skiboot-c81f9d6 starting...
[ 58.179013552,7] initial console log level: memory 7, driver 5
[ 58.179015737,6] CPU: P9 generation processor (max 4 threads/core)
...
[ 65.266762790,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot2 (16x)
Petitboot (v1.10.3-pdd2d545)
──────────────────────────────────────────────────────────────────────────────
System information
System configuration
System status log
Language
Rescan devices
Retrieve config from URL
Plugins (0)
*Exit to shell
──────────────────────────────────────────────────────────────────────────────
Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
Info: Waiting for device discovery[ 85.086133287,3] PHB#0005[0:5]: PHB Freeze/Fence detected !
[ 85.086197573,3] PHB#0005[0:5]: PCI FIR=2000000000000000
[ 85.086249297,3] PHB#0005[0:5]: PCI FIR WOF=2000000000000000
[ 85.086289203,3] PHB#0005[0:5]: NEST FIR=0000800000000000
[ 85.086354836,3] PHB#0005[0:5]: NEST FIR WOF=0000800000000000
[ 85.086394899,3] PHB#0005[0:5]: ERR RPT0=0000000000000001
[ 85.086489826,3] PHB#0005[0:5]: ERR RPT1=0000000000000000
[ 85.086534460,3] PHB#0005[0:5]: AIB ERR=0000200000000000
[ 85.086941635,3] PHB#0005[0:5]: brdgCtl = 00000002
[ 85.087002150,3] PHB#0005[0:5]: deviceStatus = 00200020
[ 85.087036852,3] PHB#0005[0:5]: slotStatus = 00402000
[ 85.087081358,3] PHB#0005[0:5]: linkStatus = a8120008
[ 85.087137004,3] PHB#0005[0:5]: devCmdStatus = 00100107
[ 85.087181127,3] PHB#0005[0:5]: devSecStatus = 00000000
[ 85.087239088,3] PHB#0005[0:5]: rootErrorStatus = 00000000
[ 85.087285589,3] PHB#0005[0:5]: corrErrorStatus = 00000000
[ 85.087325009,3] PHB#0005[0:5]: uncorrErrorStatus = 00000000
[ 85.087370016,3] PHB#0005[0:5]: devctl = 00000020
[ 85.087419580,3] PHB#0005[0:5]: devStat = 00000020
[ 85.087466277,3] PHB#0005[0:5]: tlpHdr1 = 00000000
...
[ 85.088610802,3] PHB#0005[0:5]: phbRxeArbErrorLog1 = 0000000000000000
[Disk: sda2 / ef49aa17-bb70-4fea-a8fc-29e235f7ab9f]
Ubuntu, with Linux 5.3.0-26-generic (recovery mode)
Ubuntu, with Linux 5.3.0-26-generic
Ubuntu, with Linux 5.3.0-29-generic (recovery mode)
Ubuntu, with Linux 5.3.0-29-generic
Ubuntu
[ 85.088655011,3] PHB#0005[0:5]: phbRxeMrgErrorStatus = 0000000000000001
...
[ 85.089573601,3] PHB#0005[0:5]: PEST[0ff] = 3740002a01000000 0000000000000000
[enP4p1s0f2] Probing from base tftp://192.168.178.1/pxelinux.cfg/[ 90.311273403,3] PHB#0005[0:5]: PHB Freeze/Fence detected !
[ 90.311357669,3] PHB#0005[0:5]: PCI FIR=2000000000000000
...
[ 90.315282185,3] PHB#0005[0:5]: phbRegbErrorLog1 = 0001020000000000
[ 90.315338900,3] PHB#0005[0:5]: PEST[000] = 8000000000000000 8000000000000000
[ 90.315413179,3] PHB#0005[0:5]: PEST[001] = 8000000000000000 8000000000000000
[ 90.315491213,3] PHB#0005[0:5]: PEST[002] = 8000000000000000 8000000000000000
...
[ 90.333937493,3] PHB#0005[0:5]: PEST[0fe] = 8000000000000000 8000000000000000
[ 90.334011680,3] PHB#0005[0:5]: PEST[0ff] = b740002a01000000 8000000000000000
[ 90.334100762,3] PHB#0005[0:5]: eeh_freeze_clear on fenced PHB
XE autoconfiguration failed
PS: I logged in to OpenBMC via SSH and found these odd error messages, which may be related:
root@blackbird:~# journalctl | grep fail
May 10 19:37:25 blackbird kernel: g_mass_storage 1e6a0000.usb-vhub:p1: failed to start g_mass_storage: -22
May 10 19:37:27 blackbird systemd-udevd[789]: Process 'mtd_probe /dev/mtd2ro' failed with exit code 1.
May 10 19:37:27 blackbird systemd-udevd[790]: Process 'mtd_probe /dev/mtd3ro' failed with exit code 1.
May 10 19:37:27 blackbird systemd-udevd[837]: Process 'mtd_probe /dev/mtd4ro' failed with exit code 1.
May 10 19:37:27 blackbird systemd-udevd[792]: Process 'mtd_probe /dev/mtd5ro' failed with exit code 1.
May 10 19:37:28 blackbird systemd-udevd[788]: Process 'mtd_probe /dev/mtd0ro' failed with exit code 1.
May 10 19:37:28 blackbird systemd-udevd[791]: Process 'mtd_probe /dev/mtd1ro' failed with exit code 1.
May 10 19:37:28 blackbird systemd-udevd[836]: Process 'mtd_probe /dev/mtd6ro' failed with exit code 1.
May 10 19:37:29 blackbird kernel: A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check.
May 10 19:37:31 blackbird kernel: A link change request failed with some changes committed already. Interface sit0 may have been left with an inconsistent configuration, please check.
Feb 03 22:05:40 blackbird kernel[1052]: [ 3.810720] g_mass_storage 1e6a0000.usb-vhub:p1: failed to start g_mass_storage: -22
Feb 03 22:05:43 blackbird kernel[1052]: [ 22.397690] A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check.
Feb 03 22:05:43 blackbird kernel[1052]: [ 24.051627] A link change request failed with some changes committed already. Interface sit0 may have been left with an inconsistent configuration, please check.
Feb 07 21:45:05 blackbird systemd[1]: Starting Stop the ethernet link failover...
Feb 07 21:45:07 blackbird systemd[1]: Started Stop the ethernet link failover.
PS2: This question is a more precise follow-up to this earlier question:
https://forums.raptorcs.com/index.php?action=post;topic=49.0;last_msg=473