Topic: Unable to boot with NVIDIA GT710 installed (Read 1907 times)

bernie

Unable to boot with NVIDIA GT710 installed
« on: August 20, 2024, 08:12:07 pm »
I have a Blackbird system in which I'd like to install an NVIDIA GT710. However, after installation, boot reaches the Petitboot menu but then fails while it scans for boot options. The following is logged on the console:
Code:
[enP4p1s0f0] Configuring with DHCP[    7.117579] Oops: Exception in kernel mode, sig: 5 [#1]
[    7.117744] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    7.117896] Modules linked in: sd_mod sg xhci_pci xhci_hcd ast ibmpowernv rtc_opal nouveau(+) at24 regmap_i2c drm_shmem_helper usbcore tg3 usb_common drm_exec gpu_sched drm_ttm_helper ahci ttm libahci drm_display_helper backlight mtdblock mtd_blkdevs ofpart powernv_flash mtd
[    7.118470] CPU: 0 PID: 199 Comm: kworker/0:1 Not tainted 6.6.16-openpower1 #4
[    7.118682] Hardware name: C1P9S01 REV 1.02 POWER9 0x4e1203 opal:skiboot-ecb1dc7 PowerNV
[    7.118903] Workqueue: events work_for_cpu_fn
[    7.119114] NIP:  c0000000002beddc LR: c0000000002bedd8 CTR: c0000000005da76c
[    7.119337] REGS: c000000009c0f7c0 TRAP: 0700   Not tainted  (6.6.16-openpower1)
[    7.119579] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28000288  XER: 00000000
[    7.119820] CFAR: c0000000000d1dd4 IRQMASK: 0
[    7.119820] GPR00: c0000000002bedd8 c000000009c0fa60 c000000000819c00 000000000000006d
[    7.119820] GPR04: 00000000ffff7fff c000000009c0f890 0000000000000001 c0000000023e85d8
[    7.119820] GPR08: 0000000ffbb30000 0000000000000027 0000000000000027 6536666461313030
[    7.119820] GPR12: 0000000048000288 c000000fff7ff480 c0000000000aab34 c00000000ad59840
[    7.119820] GPR16: 0000000000000000 0000000000000000 0000000000000000 c00000000330410d
[    7.119820] GPR20: 000000007fffffff c000000006a56db0 c00000000910fa48 61c8864680b583eb
[    7.119820] GPR24: c00000000910fa08 c000000006a56580 c0000000030f26b0 c000000016ff2348
[    7.119820] GPR28: c000000016ffc130 c00000001adf7340 0000000000000000 c00000001adf6e40
[    7.122259] NIP [c0000000002beddc] __list_del_entry_valid_or_report+0xec/0x134
[    7.122558] LR [c0000000002bedd8] __list_del_entry_valid_or_report+0xe8/0x134
[    7.122862] Call Trace:
[    7.123156] [c000000009c0fa60] [c0000000002bedd8] __list_del_entry_valid_or_report+0xe8/0x134 (unreliable)
[    7.123492] [c000000009c0fac0] [c008000001118280] list_del+0x74/0x84 [nouveau]
[    7.123887] [c000000009c0faf0] [c00800000111852c] nvkm_mm_free+0x94/0x144 [nouveau]
[    7.124283] [c000000009c0fb40] [c008000001115acc] nvkm_gpuobj_del+0x44/0x84 [nouveau]
[    7.124697] [c000000009c0fb70] [c00800000111a724] nvkm_ramht_del+0x30/0x58 [nouveau]
[    7.125101] [c000000009c0fba0] [c00800000119280c] nvkm_disp_dtor+0x30/0x1bc [nouveau]
[    7.125542] [c000000009c0fc20] [c0080000011142bc] nvkm_engine_dtor+0x38/0x54 [nouveau]
[    7.125964] [c000000009c0fc40] [c00800000111b13c] nvkm_subdev_del+0xdc/0x154 [nouveau]
[    7.126386] [c000000009c0fcc0] [c00800000118cf40] nvkm_device_del+0x144/0x174 [nouveau]
[    7.126843] [c000000009c0fd20] [c0080000011e47a0] nouveau_drm_probe+0x15c/0x22c [nouveau]
[    7.127313] [c000000009c0fdb0] [c0000000002eb7d0] local_pci_probe+0x3c/0x80
[    7.127702] [c000000009c0fe20] [c00000000009f3b8] work_for_cpu_fn+0x30/0x40
[    7.128099] [c000000009c0fe50] [c0000000000a2d90] process_scheduled_works+0x1d0/0x28c
  [Disk: sdb1 / 4e48a184-7db9-4084-ab58-f08d164df005]
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-117-generic (recovery mode)
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-117-generic
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-118-generic (recovery mode)
    Trisquel GNU/Linux, with Linux-Libre 5.15.0-118-generic
    (*) Trisquel GNU/Linux
[    7.128512] [c000000009c0ff20] [c0000000000a32c8] worker_thread+0x244/0x288
[    7.128923] [c000000009c0ff90] [c0000000000aac24] kthread+0xf8/0x100
[    7.129329] [c000000009c0ffe0] [c00000000000dd58] start_kernel_thread+0x14/0x18
[    7.129753] Code: 4be12fe9 60000000 0fe00000 4bffff74 e9460000 7c2a1840 41820020 3c62fff4 7d455378 386382e7 4be12fc1 60000000 <0fe00000> 4bffff4c e8a90008 7c255040
[    7.130643] ---[ end trace 0000000000000000 ]---
[    7.899186]  sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 sda8 sda9 sda10 sda11 sda12 sda13 sda14 sda15 sda16 sda17 sda18 >
[    7.903465] sd 2:0:0:0:                                                 
[    7.962273] device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
[    7.963812] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@redhat.com
 [sdb3] Processing new Disk device[    8.097452] EXT4-fs (dm-2): orphan cleanup on readonly fs
 Booting in 10 sec: [sdb1] Trisquel GNU/Linux[    8.433254]
            9                               [    9.433724] Kernel panic - not syncing: Fatal exception
[   10.480719] Rebooting in 30 seconds..
[  113.217585323,5] OPAL: Reboot request...

I've tried with and without the "disable onboard video" jumper installed. The system boots fine with a Radeon HD5450 installed, and the GT710 works fine in another system. What could be causing this problem?

Hasturtium

Re: Unable to boot with NVIDIA GT710 installed
« Reply #1 on: August 21, 2024, 12:35:07 pm »
It looks like nouveau is crashing (the call trace shows a list-corruption check failing inside nouveau_drm_probe), and nouveau has a history of not working with kernel page sizes other than 4KB. The developers have shown no indication that they intend to fix that. Since you're running with a 64KB page size (PAGE_SIZE=64K in the oops header), that is likely the cause. See if you can blacklist nouveau and let the boot sequence proceed.
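For anyone following along: the oops header already reports PAGE_SIZE=64K for the Petitboot kernel, but an installed distro kernel may be built with a different page size. Below is a minimal Python sketch for checking the page size of whatever kernel it runs under. The 4 KiB threshold just encodes the nouveau limitation described above, and the helper names are made up for illustration.

```python
import resource
from typing import Optional

# Page size (KiB) nouveau is expected to behave with, per the discussion above.
NOUVEAU_SAFE_PAGE_KIB = 4


def page_size_kib() -> int:
    """Return the page size of the kernel this interpreter runs under, in KiB."""
    return resource.getpagesize() // 1024


def nouveau_page_size_warning(page_kib: int) -> Optional[str]:
    """Hypothetical helper: return a warning if page_kib is not 4 KiB, else None."""
    if page_kib == NOUVEAU_SAFE_PAGE_KIB:
        return None
    return (f"kernel uses {page_kib} KiB pages; nouveau has a history of "
            f"problems with page sizes other than {NOUVEAU_SAFE_PAGE_KIB} KiB")


if __name__ == "__main__":
    size = page_size_kib()
    print(f"page size: {size} KiB")
    print(nouveau_page_size_warning(size) or "4 KiB pages: page-size issue not expected")
```

Note that this reports the page size of the kernel the script runs under, so run it in the OS you actually want to use the card in.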

bernie

Re: Unable to boot with NVIDIA GT710 installed
« Reply #2 on: August 21, 2024, 06:41:04 pm »
Thanks for your suggestion. I know how to blacklist a module in an operating system, but not in the boot firmware. Can you provide any references or tips? I have compiled and installed the OpenPOWER firmware (I was unable to build the OpenBMC firmware). Do I need to change its configuration and rebuild?

Note: the system also boots with an NVIDIA GT1030. I was wondering if the issue was firmware, but the GT1030 requires firmware for the OS to use it, whereas the GT710 does not. Or, at least, it doesn't require proprietary firmware, as it works just fine in Trisquel and Parabola.

bernie

Re: Unable to boot with NVIDIA GT710 installed
« Reply #3 on: August 25, 2024, 06:02:42 pm »
OK, I figured it out. Documenting it here in case someone (particularly myself) needs it in the future.
To blacklist the kernel module at boot time via NVRAM:
1. Exit from the Petitboot menu to the Petitboot shell.
2. Execute nvram -p ibm,skiboot --update-config bootargs="modprobe.blacklist=nouveau"
3. Check the config by executing nvram -p ibm,skiboot --print-config
4. Reboot.
I don't know whether this will persist after a firmware update. Is there a way to make it permanent?
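To confirm which modules a given kernel was told to blacklist, look for modprobe.blacklist= on its command line in /proc/cmdline (in the Petitboot shell, `cat /proc/cmdline` shows it directly). Here is a small Python sketch of the same check for use in an OS where Python is available. It splits naively on whitespace, does not handle quoted values, and the function name is hypothetical.

```python
from typing import Set


def blacklisted_modules(cmdline: str) -> Set[str]:
    """Collect module names from every modprobe.blacklist= parameter in a command line."""
    prefix = "modprobe.blacklist="
    modules: Set[str] = set()
    # Naive whitespace split: quoted parameter values are not handled.
    for token in cmdline.split():
        if token.startswith(prefix):
            # The value is a comma-separated list of module names.
            modules.update(name for name in token[len(prefix):].split(",") if name)
    return modules


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(sorted(blacklisted_modules(f.read())))
```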

This resolved the issue with Petitboot, but when the OS boots, it can't initialise the card. The card works fine in my other desktop machine (amd64), but doesn't work in any of Trisquel, Fedora, or Debian on my Blackbird system. In all of these distros, the Xorg log shows:
Code:
[   139.646] (EE) [drm] Failed to open DRM device for pci:0000:01:00.0: -19
[   139.647] (EE) open /dev/dri/card0: No such file or directory

Any clue as to what the solution might be? I don't believe it's firmware: I specifically bought this card because it doesn't require proprietary firmware, there's no mention of missing firmware in the Xorg log, and nonfree firmware is installed in Fedora and Debian.
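Two quick checks, sketched in Python. The -19 in the first Xorg line is a negated errno value, and on Linux errno 19 is ENODEV ("No such device"), which is consistent with /dev/dri/card0 being absent. The second helper reads the standard sysfs `driver` symlink to see whether any kernel driver is bound to the card at 0000:01:00.0 (the address from the log). The helper names and the `sysfs_root` parameter are illustrative, not from any existing tool.

```python
import errno
import os
from typing import Optional


def decode_negative_errno(code: int) -> str:
    """Turn a negated errno value such as -19 into its symbolic name and message."""
    num = -code
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to a PCI device, or None if none is bound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    # The 'driver' entry only exists while a driver is bound to the device.
    if not os.path.islink(link):
        return None
    # The link points at .../bus/pci/drivers/<name>; the last component is the name.
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    print(decode_negative_errno(-19))
    print(bound_driver("0000:01:00.0") or "no driver bound to 0000:01:00.0")
```

If no driver is bound, nouveau never attached to the card under that kernel, and the kernel log (dmesg) is the place to look for why, for example a probe failure like the one in the first post.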

dr.chinme

Re: Unable to boot with NVIDIA GT710 installed
« Reply #4 on: December 02, 2024, 09:14:23 pm »
Hi Bernie,

Did you ever get the GT710 working in the OS on your Blackbird? I've installed an NVIDIA Quadro K620 in mine, and it works in Wayland on Debian Testing (which started using 4K page sizes with 6.10; maybe they integrated Daniel Pocock's patch?).

Unfortunately, after updating to kernel 6.11.9, the errors below appear after a reboot, and the OS won't load until I force a shutdown from the BMC and power it back on.

Code:
nouveau 0000:01:00.0: disp: chid 0 stat 00001000 reason 1 [PUSHBUFFER_ERR] mthd 0000 data 00000400 code 00000002
nouveau 0000:01:00.0: DRM: core caps notifier timeout
nouveau 0000:01:00.0: disp: chid 1 stat 00001000 reason 1 [PUSHBUFFER_ERR] mthd 0000 data 00000400 code 00000002
nouveau 0000:01:00.0: DRM: core notifier timeout
nouveau 0000:01:00.0: DRM: base-0: timeout

From your other post, it sounds like compiling a kernel for POWER9 with 4K page sizes and VSX support should work, but I'm hoping to confirm this before I try it.
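Before building, one way to double-check a kernel config is to scan it for the relevant Kconfig symbols. Below is a minimal Python sketch. It assumes a powerpc .config in which 4K pages are selected by CONFIG_PPC_4K_PAGES, VSX is CONFIG_VSX, and nouveau is CONFIG_DRM_NOUVEAU. The helper names are hypothetical, and the script only checks the config text. It cannot tell you whether the resulting kernel actually fixes the problem.

```python
import sys
from typing import Dict, List


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse 'CONFIG_FOO=value' lines; '# CONFIG_FOO is not set' lines are skipped."""
    options: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
    return options


def check_4k_vsx_nouveau(options: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means all wanted options are present."""
    problems: List[str] = []
    if options.get("CONFIG_PPC_4K_PAGES") != "y":
        problems.append("CONFIG_PPC_4K_PAGES=y not found (4K pages not selected)")
    if options.get("CONFIG_VSX") != "y":
        problems.append("CONFIG_VSX=y not found")
    if options.get("CONFIG_DRM_NOUVEAU") not in ("y", "m"):
        problems.append("CONFIG_DRM_NOUVEAU is neither built in nor a module")
    return problems


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        issues = check_4k_vsx_nouveau(parse_kconfig(f.read()))
    print("\n".join(issues) if issues else "4K pages, VSX and nouveau all enabled")
```

Saved under any name (here, hypothetically, check_config.py), it would be run as `python3 check_config.py .config`. Debian-family distros also install each packaged kernel's config as /boot/config-<version>, which can be checked the same way.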