Topics - bernie

1
User Zone / Help with diagnosing sudden computer shutdown
« on: January 30, 2025, 07:52:24 pm »
So my Blackbird system just shut down suddenly, and I'd appreciate any help diagnosing the issue. I found three high-priority events logged by the BMC within two minutes of the shutdown. The messages are:
Code:
xyz.openbmc_project.Sensor.Device.Error.ReadFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/00:00:00:06/sbefifo1-dev0/occ-hwmon.1
CALLOUT_ERRNO=108
_PID=4791
Code:
org.open_power.Proc.FSI.Error.MasterDetectionFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw
CALLOUT_ERRNO=0
_PID=26801
Code:
org.open_power.Proc.FSI.Error.MasterDetectionFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw
CALLOUT_ERRNO=0
_PID=26873
The BMC event log contains many earlier "MasterDetectionFailure" messages like these that never seemed to cause problems before. I don't see any other "ReadFailure" messages, though.
After the shutdown, the "Server power operations" section of the BMC interface still showed the system as running. A warm reboot failed, but a subsequent power-on brought the system back up.
The system journal has no messages from the last 5 minutes before the shutdown, and the messages just before that gap were not errors and appear unrelated to the issue.
The system is connected to a UPS, and the other computers connected to the same UPS had no problems.
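The three BMC entries above share one layout: a message ID line followed by KEY=VALUE lines. Since the log also holds many similar MasterDetectionFailure entries, a short script can tally entries per message ID and translate the CALLOUT_ERRNO values into text. This is a minimal sketch, assuming the entries have been copied out of the BMC event log as plain text in exactly that layout; the function names parse_bmc_entries and summarize are hypothetical, and os.strerror uses the errno table of the machine running the script, so run it on Linux to match the BMC's numbering.

```python
import os
from collections import Counter


def parse_bmc_entries(text):
    """Split pasted BMC event text into one dict per entry.

    Assumes each entry starts with a message ID line (no '=') followed by
    KEY=VALUE lines, as in the entries quoted above.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" not in line:
            # A line without '=' starts a new entry (the message ID).
            entries.append({"MESSAGE": line})
        elif entries:
            key, value = line.split("=", 1)
            entries[-1][key] = value
    return entries


def summarize(entries):
    """Count entries per message ID and decode each distinct non-zero errno."""
    counts = Counter(e["MESSAGE"] for e in entries)
    errnos = {}
    for e in entries:
        code = int(e.get("CALLOUT_ERRNO", "0"))
        if code:
            # Uses the host's errno table; run on Linux to match the BMC.
            errnos[code] = os.strerror(code)
    return counts, errnos


SAMPLE = """\
xyz.openbmc_project.Sensor.Device.Error.ReadFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/00:00:00:06/sbefifo1-dev0/occ-hwmon.1
CALLOUT_ERRNO=108
_PID=4791
org.open_power.Proc.FSI.Error.MasterDetectionFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw
CALLOUT_ERRNO=0
_PID=26801
org.open_power.Proc.FSI.Error.MasterDetectionFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw
CALLOUT_ERRNO=0
_PID=26873
"""

if __name__ == "__main__":
    counts, errnos = summarize(parse_bmc_entries(SAMPLE))
    for message, n in counts.most_common():
        print(n, message)
    for code, text in errnos.items():
        print(f"errno {code}: {text}")
```

On Linux, errno 108 is ESHUTDOWN ("Cannot send after transport endpoint shutdown"). Reading that as a sign that the FSI/SBE path to the OCC was already down when the sensor read was attempted is an inference, not something the log states directly.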

2
User Zone / Gentoo installation issue
« on: September 08, 2024, 03:15:28 am »
I have an NVIDIA GT710 that I'd like to get working with my Blackbird. After getting past the boot issue by disabling the nouveau module in Petitboot, I still can't get the card to work in the OS. I tried Fedora, Debian and Trisquel. Then, suspecting that the 4KB page size issue was affecting me (those distributions' kernels use 64KB pages), I decided to try Gentoo. The distribution (pre-built) kernel also uses a 64KB page size and failed with the same errors as the others. So I compiled a kernel with the options I thought best, but the system won't boot with that kernel. The console output shows:
Code:
Run /init as init process
init[1]: illegal instruction (4) at 3fff894c8fe0 nip 3fff894c8fe0 lr 3fff894bcbdc code 1 in ld64.so.2[3fff89488000+4e000]
init[1]: code: 7ca32a14 7ca92850 78bfd183 41820084 73ea0001 7c070166 7d2a4b78 38c00010
init[1]: code: 39600020 39800030 381fffff 7fe8fb78 <f0000050> 41820020 2c200000 7c004f98
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
CPU: 10 PID: 1 Comm: init Not tainted 6.6.47-gentoo #6
Hardware name: C1P9S01 REV 1.02 POWER9 (raw) 0x4e1203 opal:skiboot-ecb1dc7 PowerNV
Call Trace:
[c0000000021a7a50] [c000000000d6ee58] dump_stack_lvl+0x6c/0x9c (unreliable)
[c0000000021a7a80] [c0000000000da2e4] panic+0x170/0x3ec
[c0000000021a7b20] [c0000000000e3680] do_exit+0xa70/0xa80
[c0000000021a7bf0] [c0000000000e38e4] do_group_exit+0x44/0xc0
[c0000000021a7c30] [c0000000000f7d30] get_signal+0xc50/0xc80
[c0000000021a7d20] [c00000000001c7f0] do_notify_resume+0xf0/0x420
[c0000000021a7dd0] [c000000000028ad8] interrupt_exit_user_prepare_main+0x158/0x1f0
[c0000000021a7e20] [c000000000028d7c] interrupt_exit_user_prepare+0x4c/0x70
[c0000000021a7e50] [c00000000000d444] interrupt_return_srr_user+0x8/0x12c
--- interrupt: f40 at 0x3fff894c8fe0
NIP:  00003fff894c8fe0 LR: 00003fff894bcbdc CTR: 0000000000000000
REGS: c0000000021a7e80 TRAP: 0f40   Not tainted  (6.6.47-gentoo)
MSR:  900000000200f033 <SF,HV,VEC,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 24000420  XER: 00000000
CFAR: c00000000000d55c IRQMASK: 0
GPR00: 0000000000000005 00003fffd6440800 00003fff894eff00 00003fffd6440820
GPR04: 0000000000000000 00000000000001a0 0000000000000010 0000000000000000
GPR08: 0000000000000006 00003fffd6440820 00003fffd6440820 0000000000000020
GPR12: 0000000000000030 0000000000000000 00003fff894e7f10 00003fffd6440fa0
GPR16: 000000000000fff1 0000000000000000 00003fff89488350 00003fff89488000
GPR20: 0000000000000001 0000000000000001 00003fffd6440fa0 0000000080001000
GPR24: 000000007fff9000 0000000000010000 00003fffd6440ac0 00003fffd64410c0
GPR28: 00000000ffffffff 00003fffd6440a20 00003fffd64410f0 0000000000000006
NIP [00003fff894c8fe0] 0x3fff894c8fe0
LR [00003fff894bcbdc] 0x3fff894bcbdc
--- interrupt: f40
Reboot[  148.135496306,5] OPAL: Reboot request...
It appears that something is wrong with ld64.so.2, but I have no idea what. How should I proceed from here? I thought about posting in the Gentoo forums, but decided to try here first.
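One way to dig into the register dump above is to decode the MSR value and the faulting instruction word by hand. The sketch below assumes the MSR bit positions from the Linux powerpc headers (for example, VEC is 0x2000000 and VSX is 0x800000) and only covers the flags that appear in these logs; decode_msr and primary_opcode are hypothetical helper names. Decoding 900000000200f033 reproduces the kernel's <SF,HV,VEC,EE,PR,FP,ME,IR,DR,RI,LE> list, which has no VSX, and the bracketed instruction f0000050 has primary opcode 60, the opcode space used by VSX instructions.

```python
# MSR bits as defined in the Linux powerpc headers (only the subset seen in these logs).
MSR_FLAGS = [
    ("SF", 1 << 63),
    ("HV", 1 << 60),
    ("VEC", 1 << 25),
    ("VSX", 1 << 23),
    ("EE", 1 << 15),
    ("PR", 1 << 14),
    ("FP", 1 << 13),
    ("ME", 1 << 12),
    ("IR", 1 << 5),
    ("DR", 1 << 4),
    ("RI", 1 << 1),
    ("LE", 1 << 0),
]


def decode_msr(msr):
    """Return the names of the set MSR bits, in the order these logs print them."""
    return [name for name, mask in MSR_FLAGS if msr & mask]


def primary_opcode(insn):
    """Primary opcode = the top 6 bits of a 32-bit Power instruction word."""
    return (insn >> 26) & 0x3F


if __name__ == "__main__":
    msr = 0x900000000200F033  # MSR from the Gentoo init panic above
    flags = decode_msr(msr)
    print("<" + ",".join(flags) + ">")  # <SF,HV,VEC,EE,PR,FP,ME,IR,DR,RI,LE>
    print("VSX enabled:", "VSX" in flags)  # VSX enabled: False
    insn = 0xF0000050  # <f0000050> from the init[1] code line
    print("primary opcode:", primary_opcode(insn))  # primary opcode: 60
```

TRAP: 0f40 is, as far as I know, the VSX Unavailable interrupt vector, so one hypothesis worth checking is that the custom kernel was built without VSX support (CONFIG_VSX) while ld64.so.2 uses VSX instructions. Nothing in the log confirms this; it is only what the decoded values are consistent with.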

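Because the distribution kernels all used 64KB pages, it may also be worth confirming what the hand-built Gentoo kernel was actually configured with. The sketch below reads a kernel .config and reports the page-size choice, assuming the powerpc Kconfig symbols CONFIG_PPC_4K_PAGES and CONFIG_PPC_64K_PAGES; it also reports CONFIG_ALTIVEC and CONFIG_VSX, on the assumption that ppc64le userspace is built to use POWER8-level vector instructions. The helper names and the sample fragment are made up for illustration, not taken from the poster's config.

```python
import mmap

# Kconfig symbols to report (an assumed, non-exhaustive list).
WATCHED = ("CONFIG_PPC_4K_PAGES", "CONFIG_PPC_64K_PAGES",
           "CONFIG_ALTIVEC", "CONFIG_VSX")


def parse_kconfig(text):
    """Return {symbol: value} for a kernel .config; '# X is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            symbol, value = line.split("=", 1)
            options[symbol] = value
    return options


def page_size_kib(options):
    """Infer the configured page size in KiB, or None if neither symbol is set."""
    if options.get("CONFIG_PPC_64K_PAGES") == "y":
        return 64
    if options.get("CONFIG_PPC_4K_PAGES") == "y":
        return 4
    return None


def report(options):
    """Print the watched symbols, treating missing ones as unset."""
    for symbol in WATCHED:
        print(f"{symbol}={options.get(symbol, 'n')}")
    print("configured page size (KiB):", page_size_kib(options))


if __name__ == "__main__":
    # Hypothetical .config fragment for illustration only.
    sample = """\
CONFIG_PPC_4K_PAGES=y
# CONFIG_PPC_64K_PAGES is not set
CONFIG_ALTIVEC=y
# CONFIG_VSX is not set
"""
    report(parse_kconfig(sample))
    # Page size of the kernel this script itself is running under, in bytes.
    print("running page size (bytes):", mmap.PAGESIZE)
```

On the machine itself, the input can be the .config in the kernel source tree (usually /usr/src/linux/.config on Gentoo), or /proc/config.gz if the running kernel was built with CONFIG_IKCONFIG_PROC (decompress it first, for example with gzip.open).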
3
User Zone / Unable to boot with NVIDIA GT710 installed
« on: August 20, 2024, 08:12:07 pm »
I have a Blackbird system in which I'd like to install an NVIDIA GT710. However, with the card installed, boot reaches the Petitboot menu but then fails while Petitboot searches for boot options. The following is logged on the console:
Code:
[enP4p1s0f0] Configuring with DHCP
[    7.117579] Oops: Exception in kernel mode, sig: 5 [#1]
[    7.117744] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    7.117896] Modules linked in: sd_mod sg xhci_pci xhci_hcd ast ibmpowernv rtc_opal nouveau(+) at24 regmap_i2c drm_shmem_helper usbcore tg3 usb_common drm_exec gpu_sched drm_ttm_helper ahci ttm libahci drm_display_helper backlight mtdblock mtd_blkdevs ofpart powernv_flash mtd
[    7.118470] CPU: 0 PID: 199 Comm: kworker/0:1 Not tainted 6.6.16-openpower1 #4
[    7.118682] Hardware name: C1P9S01 REV 1.02 POWER9 0x4e1203 opal:skiboot-ecb1dc7 PowerNV
[    7.118903] Workqueue: events work_for_cpu_fn
[    7.119114] NIP:  c0000000002beddc LR: c0000000002bedd8 CTR: c0000000005da76c
[    7.119337] REGS: c000000009c0f7c0 TRAP: 0700   Not tainted  (6.6.16-openpower1)
[    7.119579] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000288  XER: 00000000
[    7.119820] CFAR: c0000000000d1dd4 IRQMASK: 0
[    7.119820] GPR00: c0000000002bedd8 c000000009c0fa60 c000000000819c00 000000000000006d
[    7.119820] GPR04: 00000000ffff7fff c000000009c0f890 0000000000000001 c0000000023e85d8
[    7.119820] GPR08: 0000000ffbb30000 0000000000000027 0000000000000027 6536666461313030
[    7.119820] GPR12: 0000000048000288 c000000fff7ff480 c0000000000aab34 c00000000ad59840
[    7.119820] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000330410d
[    7.119820] GPR20: 000000007fffffff c000000006a56db0 c00000000910fa48 61c8864680b583eb
[    7.119820] GPR24: c00000000910fa08 c000000006a56580 c0000000030f26b0 c000000016ff2348
[    7.119820] GPR28: c000000016ffc130 c00000001adf7340 0000000000000000 c00000001adf6e40
[    7.122259] NIP [c0000000002beddc] __list_del_entry_valid_or_report+0xec/0x134
[    7.122558] LR [c0000000002bedd8] __list_del_entry_valid_or_report+0xe8/0x134
[    7.122862] Call Trace:
[    7.123156] [c000000009c0fa60] [c0000000002bedd8] __list_del_entry_valid_or_report+0xe8/0x134 (unreliable)
[    7.123492] [c000000009c0fac0] [c008000001118280] list_del+0x74/0x84 [nouveau]
[    7.123887] [c000000009c0faf0] [c00800000111852c] nvkm_mm_free+0x94/0x144 [nouveau]
[    7.124283] [c000000009c0fb40] [c008000001115acc] nvkm_gpuobj_del+0x44/0x84 [nouveau]
[    7.124697] [c000000009c0fb70] [c00800000111a724] nvkm_ramht_del+0x30/0x58 [nouveau]
[    7.125101] [c000000009c0fba0] [c00800000119280c] nvkm_disp_dtor+0x30/0x1bc [nouveau]
[    7.125542] [c000000009c0fc20] [c0080000011142bc] nvkm_engine_dtor+0x38/0x54 [nouveau]
[    7.125964] [c000000009c0fc40] [c00800000111b13c] nvkm_subdev_del+0xdc/0x154 [nouveau]
[    7.126386] [c000000009c0fcc0] [c00800000118cf40] nvkm_device_del+0x144/0x174 [nouveau]
[    7.126843] [c000000009c0fd20] [c0080000011e47a0] nouveau_drm_probe+0x15c/0x22c [nouveau]
[    7.127313] [c000000009c0fdb0] [c0000000002eb7d0] local_pci_probe+0x3c/0x80
[    7.127702] [c000000009c0fe20] [c00000000009f3b8] work_for_cpu_fn+0x30/0x40
[    7.128099] [c000000009c0fe50] [c0000000000a2d90] process_scheduled_works+0x1d0/0x28c
  [Disk: sdb1 / 4e48a184-7db9-4084-ab58-f08d164df005]
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-117-generic (recovery mode)
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-117-generic
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-118-generic (recovery mode)
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-118-generic
    (*) Trisquel GNU/Linux
[    7.128512] [c000000009c0ff20] [c0000000000a32c8] worker_thread+0x244/0x288
[    7.128923] [c000000009c0ff90] [c0000000000aac24] kthread+0xf8/0x100
[    7.129329] [c000000009c0ffe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[    7.129753] Code: 4be12fe9 60000000 0fe00000 4bffff74 e9460000 7c2a1840 41820020 3c62fff4 7d455378 386382e7 4be12fc1 60000000 <0fe00000> 4bffff4c e8a90008 7c255040
[    7.130643] ---[ end trace 0000000000000000 ]---
[    7.899186]  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15 sda16 sda17 sda18 >
[    7.903465] sd 2:0:0:0:                                                 
[    7.962273] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    7.963812] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
 [sdb3] Processing new Disk device
[    8.097452] EXT4-fs (dm-2): orphan cleanup on readonly fs
 Booting in 10 sec: [sdb1] Trisquel GNU/Linux
[    8.433254]
            9
[    9.433724] Kernel panic - not syncing: Fatal exception
[   10.480719] Rebooting in 30 seconds..
[  113.217585323,5] OPAL: Reboot request...

I've tried with and without the "disable onboard video" jumper installed. The system boots fine with a Radeon HD5450 installed, and the GT710 works fine in another system. What could be causing this problem?
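In the oops above, the instruction that fired is the word in angle brackets on the Code: line, <0fe00000>. The sketch below pulls that word out and classifies it; faulting_word and describe are hypothetical names, and the decoder only handles the two primary opcodes that appear in these logs. 0x0fe00000 decodes to twi 31,0,0, an unconditional trap. As I understand it, that is the instruction powerpc kernels emit for BUG() and WARN(), which would mean __list_del_entry_valid_or_report deliberately stopped the kernel after detecting a corrupted list entry while nouveau was tearing down its device inside nouveau_drm_probe (apparently after the probe failed), rather than the GPU triggering a random fault.

```python
import re


def faulting_word(code_line):
    """Return the instruction word shown in <...> on an oops 'Code:' line, or None."""
    match = re.search(r"<([0-9a-fA-F]{8})>", code_line)
    return int(match.group(1), 16) if match else None


def describe(insn):
    """Classify a 32-bit Power instruction word (only the opcodes seen in these logs)."""
    opcode = (insn >> 26) & 0x3F  # primary opcode: top 6 bits
    if opcode == 3:
        # twi TO,RA,SI: 5-bit TO field, 5-bit RA field, signed 16-bit immediate.
        to = (insn >> 21) & 0x1F
        ra = (insn >> 16) & 0x1F
        si = insn & 0xFFFF
        if si & 0x8000:
            si -= 0x10000
        # TO=31 sets every condition bit, so the trap is always taken.
        kind = "unconditional trap" if to == 31 else "conditional trap"
        return f"twi {to},{ra},{si}: {kind}"
    if opcode == 60:
        return "VSX instruction (primary opcode 60)"
    return f"primary opcode {opcode}"


if __name__ == "__main__":
    # The Code: line from the nouveau oops above.
    line = ("Code: 4be12fe9 60000000 0fe00000 4bffff74 e9460000 7c2a1840 "
            "41820020 3c62fff4 7d455378 386382e7 4be12fc1 60000000 "
            "<0fe00000> 4bffff4c e8a90008 7c255040")
    print(describe(faulting_word(line)))  # twi 31,0,0: unconditional trap
```

The same helpers can be pointed at other Code: lines, such as the one from the Gentoo init crash, to see which kind of instruction trapped.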
