Topic: Petitboot lock after Fedora 36 update

DKnoto

  • Jr. Member
  • **
  • Posts: 82
  • Karma: +13/-0
    • View Profile
Petitboot lock after Fedora 36 update
« on: September 26, 2022, 12:07:45 pm »
Today, after the Fedora 36 update, I had an unpleasant surprise. The first phase of the system update went as usual, without any problems.
Then, as recommended, I restarted the machine and the firmware update phase began. After this process completed, the machine rebooted
again, and that is when the problems appeared:
  • Petitboot did not detect the disk, so there is no list of kernels to load;
  • the USB keyboard stopped working, on every USB port;
  • the "Power" button stopped working, so I couldn't turn off the machine;
  • at the bottom of the screen, Petitboot showed the message "Info: Waiting for device discovery".
Any ideas?

Minor correction: after the next restart, the "Power" button works again.

I made a video of the boot process: Petitboot after F36 firmware update

« Last Edit: September 26, 2022, 12:38:59 pm by DKnoto »
Desktop: Talos II T2P9S01 REV 1.01 | IBM Power 9/18c DD2.3, 02CY646 | AMD Radeon Pro WX7100 | 64GB RAM | SSD 1TB

ClassicHasClass

  • Sr. Member
  • ****
  • Posts: 467
  • Karma: +35/-0
  • Talospace Earth Orbit
    • View Profile
    • Floodgap
Re: Petitboot lock after Fedora 36 update
« Reply #1 on: September 26, 2022, 08:15:54 pm »
Firmware update??

DKnoto

Re: Petitboot lock after Fedora 36 update
« Reply #2 on: September 27, 2022, 12:32:36 am »
That is what this upgrade step is called; I don't know exactly what the Fedora installer upgrades at that point.

DKnoto

Re: Petitboot lock after Fedora 36 update
« Reply #3 on: September 27, 2022, 07:06:11 am »
I am one step further in analyzing the problem. I looked at /var/log/obmc-console.log and found the following:

Welcome to Petitboot
Info: Waiting for device discovery

[    6.722018] XFS: Assertion failed: !(fields & XFS_ILOG_DFORK) || (len == in_f->ilf_dsize), file: fs/xfs/xfs_log_recover.c, line: 3103
cpu 0x14: Vector: 700 (Program Check) at [c000000fda48b1c0]
    pc: c0080000011146bc: assfail+0x54/0x60 [xfs]
    lr: c008000001114694: assfail+0x2c/0x60 [xfs]
    sp: c000000fda48b450
   msr: 900000000282b033
  current = 0xc000000fda442100
  paca    = 0xc000000fff726e00   irqmask: 0x03   irq_happened: 0x01
    pid   = 655, comm = pb-discover
kernel BUG at fs/xfs/xfs_message.c:110!
Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
enter ? for help
[c000000fda48b4b0] c00800000113bf74 xlog_recover_inode_pass2+0xaac/0xac0 [xfs]
[c000000fda48b590] c00800000113d980 xlog_recover_items_pass2+0x68/0xd0 [xfs]
[c000000fda48b5e0] c00800000113dc34 xlog_recover_commit_trans+0x24c/0x2f0 [xfs]
[c000000fda48b680] c00800000113deb0 xlog_recovery_process_trans+0x1d8/0x210 [xfs]
[c000000fda48b6f0] c00800000113e1b8 xlog_recover_process_data+0xc0/0x1a0 [xfs]
[c000000fda48b760] c00800000113e608 xlog_do_recovery_pass+0x1a0/0x740 [xfs]
[c000000fda48b8f0] c00800000113f3e0 xlog_do_log_recovery+0xb8/0x1c0 [xfs]
[c000000fda48b930] c00800000113f518 xlog_do_recover+0x30/0x210 [xfs]
[c000000fda48b9b0] c00800000113f7c0 xlog_recover+0xc8/0x1b0 [xfs]
[c000000fda48ba30] c008000001126754 xfs_log_mount+0x34c/0x3d8 [xfs]
[c000000fda48bac0] c0080000011160a8 xfs_mountfs+0x530/0xa10 [xfs]
[c000000fda48bb70] c00800000111ca30 xfs_fc_fill_super+0x3c8/0x5f0 [xfs]
[c000000fda48bc10] c0000000001f6758 get_tree_bdev+0x248/0x2e0
[c000000fda48bcb0] c00800000111b788 xfs_fc_get_tree+0x20/0x40 [xfs]
[c000000fda48bcd0] c0000000001f5ac8 vfs_get_tree+0x48/0x160
[c000000fda48bd50] c00000000022973c do_mount+0x7fc/0xba0
[c000000fda48bdd0] c00000000022a010 sys_mount+0xc0/0x180
[c000000fda48be20] c00000000000b50c system_call+0x5c/0x68
--- Exception: c01 (System Call) at 00007fff8dfdb6c4
SP (7ffff8318960) is in userspace
14:mon>

So the problem is related to the SSD and the XFS file system on it. I took out the SSD and Petitboot came back to life :)
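
For anyone who wants to spot this kind of crash quickly, here is a minimal Python sketch that pulls the key facts out of a console log like the excerpt above: where the XFS assertion fired, which process was running, and which skiroot kernel was in use. The function name summarize_console_log and the layout of the returned dictionary are assumptions made for illustration; this is not part of Petitboot or OpenBMC, and the patterns are only derived from the lines shown in this post.

```python
import re

# Patterns derived from the log excerpt in this thread (illustrative only).
ASSERT_RE = re.compile(
    r"XFS: Assertion failed: (?P<expr>.*), file: (?P<file>[^,\s]+), line: (?P<line>\d+)"
)
COMM_RE = re.compile(r"comm = (?P<comm>\S+)")
VERSION_RE = re.compile(r"^Linux version (?P<version>\S+)")


def summarize_console_log(lines):
    """Collect XFS assertion sites, the crashing process and the kernel version.

    `lines` is any iterable of strings, for example an open log file.
    """
    summary = {"assertions": [], "comm": None, "kernel": None}
    for raw in lines:
        line = raw.strip()

        match = ASSERT_RE.search(line)
        if match:
            # Record (source file, line number) of each failed assertion.
            summary["assertions"].append((match.group("file"), int(match.group("line"))))
            continue

        match = COMM_RE.search(line)
        if match and summary["comm"] is None:
            # First "comm = ..." entry names the process that hit the bug.
            summary["comm"] = match.group("comm")
            continue

        match = VERSION_RE.search(line)
        if match and summary["kernel"] is None:
            summary["kernel"] = match.group("version")
    return summary
```

On a BMC it could be fed the file directly, e.g. by passing open("/var/log/obmc-console.log", errors="replace") as the lines argument. For the excerpt above, the summary would point at fs/xfs/xfs_log_recover.c, the pb-discover process and the 5.5.0-openpower1 kernel.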

BTW, does it make sense to replace Petitboot 1.12 with 1.13?

ClassicHasClass

Re: Petitboot lock after Fedora 36 update
« Reply #4 on: September 27, 2022, 10:54:33 am »
Probably couldn't hurt, but this looks like the filesystem is corrupt enough to freak out the Petitboot kernel, so I'm not sure that would fix it.

DKnoto

Re: Petitboot lock after Fedora 36 update
« Reply #5 on: September 27, 2022, 11:19:47 am »
Problem solved. I took out the SSD and checked the XFS partitions on my laptop. Nothing was wrong; no errors were reported. I put it back in the Talos and it works.
Magic ;)
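
The post does not say which tool was used for the check on the laptop. As one possible approach, here is a hedged Python sketch that runs xfs_repair in its no-modify mode (-n) against an unmounted partition on another machine. The helper names and the device path /dev/sdX1 are hypothetical placeholders; the command needs root and the xfsprogs package, and the exit-status interpretation follows the xfs_repair man page as best recalled (0 means no corruption reported in -n mode), so treat it as an assumption to verify.

```python
import subprocess


def build_check_command(device):
    """Return the argv for a read-only XFS check of `device`."""
    # -n is xfs_repair's no-modify mode: it reports problems without writing.
    return ["xfs_repair", "-n", device]


def check_xfs(device):
    """Run the read-only check and return True if no corruption was reported.

    The filesystem must not be mounted while this runs.
    """
    result = subprocess.run(
        build_check_command(device),
        capture_output=True,
        text=True,
    )
    # Show whatever the tool printed so the operator can read the details.
    print(result.stdout + result.stderr)
    return result.returncode == 0
```

Usage would be something like check_xfs("/dev/sdX1"), with the placeholder replaced by the real partition. Nothing is repaired in this mode; it only reports.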

DKnoto

Re: Petitboot lock after Fedora 36 update
« Reply #6 on: October 13, 2022, 10:00:15 am »
I've had two more XFS failures in the past two weeks: once after another kernel update and once after an unexpected power outage. This is definitely too much. Fixing the situation is tedious and time-consuming, because I have to remove the drive from the Talos II and put it in another computer. Petitboot is unable to skip the stage of reading the state of the file systems on the SSD, and inserting a rescue system on USB does nothing.

With the current state of Petitboot software, it is impractical to install the main file system on XFS.

ClassicHasClass

Re: Petitboot lock after Fedora 36 update
« Reply #7 on: December 11, 2022, 10:04:58 pm »
Well, this just happened to me after the Fedora 37 upgrade and I'm pissed. I don't have another machine set up to fix the filesystem and Petitboot won't respond to any keys before trying to mount filesystems. I'm not sure what I can do with it yet.

@sharkcz, what can we do?

sharkcz

  • Newbie
  • *
  • Posts: 25
  • Karma: +3/-0
    • View Profile
Re: Petitboot lock after Fedora 36 update
« Reply #8 on: December 12, 2022, 03:24:28 am »
A tough question I think ...

It looks to me that the skiroot 5.5-based kernel isn't able to cope with some of the states in which an XFS filesystem written by a much newer kernel can be left. Thus the solution could be to switch to upstream PNOR firmware builds, which use 5.10 kernels (still quite old; no development for p9, a little bit for p10 ...). That should be doable for Blackbird, and there is some old work in progress for Talos from me. I agree the safe way around is to avoid XFS (and/or btrfs) for the host OS rootfs or /boot and rely on ext4. I think the misbehaviour isn't ppc64le-specific, but it's uncommon to use a 5.5 kernel to read a filesystem written by a much newer kernel ...
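
To make the version gap concrete, here is a small Python sketch that compares kernel release strings numerically, so that 5.5 is correctly treated as older than 5.10 (a plain string comparison gets this wrong). The helper names parse_release and is_older are made up for illustration, and the only release strings used are the ones mentioned in this thread.

```python
import re


def parse_release(release):
    """Turn a kernel release such as '5.5.0-openpower1' into (5, 5, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # A missing patch level (as in '5.10') is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_older(release_a, release_b):
    """Return True if release_a is an older kernel than release_b."""
    return parse_release(release_a) < parse_release(release_b)
```

With this, is_older("5.5.0-openpower1", "5.10") is True, whereas comparing the strings "5.5" and "5.10" directly would order them the other way round. The same check could be applied to the skiroot kernel and whatever kernel the host distribution last used to write the filesystem.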

As for how to recover from the failure without physical access: if there were a way to set a host NVRAM variable from the BMC, one could disable the xfs module for the skiroot kernel and boot from other media.

ClassicHasClass

Re: Petitboot lock after Fedora 36 update
« Reply #9 on: December 15, 2022, 12:21:46 am »
All right, I'm back up. I used the Blackbird to fix the XFS volume; there was a stuck log entry. Arguably Fedora may not have unmounted it cleanly, but Petitboot shouldn't just crash as a result. There really needs to be a way to bypass mounting completely (more details: https://www.talospace.com/2022/12/when-petitboot-barfs-everythings-vomit.html ).

Thanks to @sharkcz for letting me bounce ideas off him by e-mail.