Firmware 2.10 for Talos-II and Blackbird available

carlosgonz:
Have you jumpered the ASPEED GPU?

MPC7500:
Has the card worked with the old firmware?

tle:

--- Quote from: r34per on February 29, 2024, 04:40:17 pm ---I may have spoken too soon :( I get no video when I boot into the OS, and the boot console prints this:

--- Code: ---SIGTERM received, booting...
[   99.149386402,3] PHB#0000[0:0]:                  brdgCtl = 00000002
[   99.149481878,3] PHB#0000[0:0]:             deviceStatus = 00000020
[   99.149523997,3] PHB#0000[0:0]:               slotStatus = 00402000
[   99.149618190,3] PHB#0000[0:0]:               linkStatus = a0840008
[   99.149660035,3] PHB#0000[0:0]:             devCmdStatus = 00100107
[   99.149727651,3] PHB#0000[0:0]:             devSecStatus = 00002000
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.149829970,3] PHB#0000[0:0]:          corrErrorStatus = 00000000
[   99.149869722,3] PHB#0000[0:0]:        uncorrErrorStatus = 00000000
[   99.149918442,3] PHB#0000[0:0]:                   devctl = 00000020
[   99.149955380,3] PHB#0000[0:0]:                  devStat = 00000000
[   99.149996897,3] PHB#0000[0:0]:                  tlpHdr1 = 00000000
[   99.150043352,3] PHB#0000[0:0]:                  tlpHdr2 = 00000000
[   99.150096694,3] PHB#0000[0:0]:                  tlpHdr3 = 00000000
[   99.150143000,3] PHB#0000[0:0]:                  tlpHdr4 = 00000000
[   99.150189643,3] PHB#0000[0:0]:                 sourceId = 00000000
[   99.150231444,3] PHB#0000[0:0]:                     nFir = 0000000000000000
[   99.150275820,3] PHB#0000[0:0]:                 nFirMask = 0030001c00000000
[   99.150319837,3] PHB#0000[0:0]:                  nFirWOF = 0000000000000000
[   99.150378022,3] PHB#0000[0:0]:                 phbPlssr = 0000001c00000000
[   99.150433559,3] PHB#0000[0:0]:                   phbCsr = 0000001c00000000
[   99.150489148,3] PHB#0000[0:0]:                   lemFir = 0000000100280000
[   99.150533384,3] PHB#0000[0:0]:             lemErrorMask = 0000000000000000
[   99.150577353,3] PHB#0000[0:0]:                   lemWOF = 0000000100000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.150672497,3] PHB#0000[0:0]:      phbFirstErrorStatus = 0000008000000000
[   99.150728026,3] PHB#0000[0:0]:             phbErrorLog0 = 2148000098000240
[   99.150774762,3] PHB#0000[0:0]:             phbErrorLog1 = a008400000000000
[   99.150823696,3] PHB#0000[0:0]:        phbTxeErrorStatus = 0000000000000000
[   99.150872357,3] PHB#0000[0:0]:   phbTxeFirstErrorStatus = 0000000000000000
[   99.150916641,3] PHB#0000[0:0]:          phbTxeErrorLog0 = 0000000000000000
[   99.150965287,3] PHB#0000[0:0]:          phbTxeErrorLog1 = 0000000000000000
[   99.151018775,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 4000200000000000
[   99.151074489,3] PHB#0000[0:0]: phbRxeArbFrstErrorStatus = 0000200000000000
[   99.151127737,3] PHB#0000[0:0]:       phbRxeArbErrorLog0 = 02409fde30000000
[   99.151171863,3] PHB#0000[0:0]:       phbRxeArbErrorLog1 = 0000000000000000
[   99.151215896,3] PHB#0000[0:0]:     phbRxeMrgErrorStatus = 0000000000000000
[   99.151260084,3] PHB#0000[0:0]: phbRxeMrgFrstErrorStatus = 0000000000000000
[   99.151315450,3] PHB#0000[0:0]:       phbRxeMrgErrorLog0 = 0000000000000000
[   99.151369016,3] PHB#0000[0:0]:       phbRxeMrgErrorLog1 = 0000000000000000
[   99.151424438,3] PHB#0000[0:0]:     phbRxeTceErrorStatus = 0000000000000000
[   99.151471170,3] PHB#0000[0:0]: phbRxeTceFrstErrorStatus = 0000000000000000
[   99.151517918,3] PHB#0000[0:0]:       phbRxeTceErrorLog0 = 0000000000000000
[   99.151561833,3] PHB#0000[0:0]:       phbRxeTceErrorLog1 = 0000000000000000
[   99.151614682,3] PHB#0000[0:0]:        phbPblErrorStatus = 0000000001000000
[   99.151663274,3] PHB#0000[0:0]:   phbPblFirstErrorStatus = 0000000001000000
[   99.151716727,3] PHB#0000[0:0]:          phbPblErrorLog0 = 0000000000000000
[   99.151762796,3] PHB#0000[0:0]:          phbPblErrorLog1 = 0000000000000000
[   99.151813691,3] PHB#0000[0:0]:      phbPcieDlpErrorLog1 = 0000000000000000
[   99.151858094,3] PHB#0000[0:0]:      phbPcieDlpErrorLog2 = 0000000000000000
[   99.151904253,3] PHB#0000[0:0]:    phbPcieDlpErrorStatus = 00be000000000000
[   99.151959774,3] PHB#0000[0:0]:       phbRegbErrorStatus = 0000004000000000
[   99.152015372,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0000004000000000
[   99.152068905,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 8800006c00000000
[   99.152115691,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000007011000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
[   99.152218234,3] PHB#0000[0:0]:                PEST[001] = 8000000000000000 8000000000000000
[   99.152285858,3] PHB#0000[0:0]:                PEST[002] = 8000000000000000 8000000000000000
[   99.152350714,3] PHB#0000[0:0]:                PEST[003] = 8000000000000000 8000000000000000
[   99.152414534,3] PHB#0000[0:0]:                PEST[004] = 8000000000000000 8000000000000000
[   99.152474834,3] PHB#0000[0:0]:                PEST[005] = 8000000000000000 8000000000000000
[   99.152528675,3] PHB#0000[0:0]:                PEST[006] = 8000000000000000 8000000000000000
[   99.152589889,3] PHB#0000[0:0]:                PEST[007] = 8000000000000000 8000000000000000
[   99.152657446,3] PHB#0000[0:0]:                PEST[008] = 8000000000000000 8000000000000000
[   99.152720282,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a03000000 0000000000000000
[    3.560406] EEH: Recovering PHB#0-PE#0
[    3.560433] EEH: PE location: UOPWR.D100029-Node0-SLOT1 PCIE 4.0 X16, PHB location: N/A
[    3.560473] EEH: Frozen PHB#0-PE#0 detected
[    3.560486] EEH: Call Trace:
[    3.560526] EEH: [00000000c094f14c] __eeh_send_failure_event+0x7c/0x160
[    3.560585] EEH: [00000000c2fbde4c] eeh_dev_check_failure+0x2c4/0x6a0
[    3.560634] EEH: [00000000eb293b00] amdgpu_device_rreg.part.0+0x160/0x1f0 [amdgpu]
[    3.560924] EEH: [0000000009854edf] psp_wait_for+0xac/0x130 [amdgpu]
[    3.561223] EEH: [0000000006086f20] psp_v11_0_mode1_reset+0xbc/0x130 [amdgpu]
[    3.561554] EEH: [00000000927ca5cd] psp_gpu_reset+0x88/0xd0 [amdgpu]
[    3.561868] EEH: [000000000d948d66] amdgpu_device_mode1_reset+0x148/0x180 [amdgpu]
[    3.562116] EEH: [00000000d607b75f] nv_asic_reset+0xbc/0x290 [amdgpu]
[    3.562414] EEH: [00000000893f34f2] amdgpu_device_init+0x172c/0x2300 [amdgpu]
[    3.562693] EEH: [00000000d8547fbc] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.562966] EEH: [000000008c9f0b1b] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.563210] EEH: [0000000067c06d95] local_pci_probe+0x68/0x110
[    3.563250] EEH: [000000004224f0ca] work_for_cpu_fn+0x38/0x60
[    3.563290] EEH: [00000000c5105116] process_one_work+0x2a4/0x570
[    3.563332] EEH: [00000000f81a86b6] worker_thread+0x280/0x5b0
[    3.563372] EEH: [00000000bf39fc31] kthread+0x120/0x130
[    3.563409] EEH: [0000000036d034ff] ret_from_kernel_thread+0x5c/0x64
[    3.852813] kernel BUG at drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:593!
[    3.852840] Oops: Exception in kernel mode, sig: 5 [#1]
[    3.852856] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    3.852884] Modules linked in: uas usb_storage sd_mod amdgpu(+) gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper xhci_pci xhci_pci_renesas syscopyarea sysfillrect ahci sysimgblt fb_sys_fops libahci xhci_hcd libata drm vmx_crypto gf128mul usbcore scsi_mod drm_panel_orientation_quirks usb_common scsi_common agpgart dm_mirror dm_region_hash dm_log dm_mod btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_vpmsum
[    3.853130] CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1
[    3.853162] Workqueue: events work_for_cpu_fn
[    3.853201] NIP:  c008000002cbb648 LR: c008000002c3cb50 CTR: c008000002cbb5f8
[    3.853241] REGS: c000000002527500 TRAP: 0700   Not tainted  (6.0.13_1)
[    3.853288] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002248  XER: 20040000
[    3.853339] CFAR: c008000002cbb6dc IRQMASK: 0
[    3.853339] GPR00: c008000002c3cb50 c0000000025277a0 c0080000033b1000 00feffffff900000
[    3.853339] GPR04: 00feffffff900000 c000000002527858 c000000002527860 c0000000024f0000
[    3.853339] GPR08: 0000000000000001 00fe000000000000 0040000000000002 c00800000318aef0
[    3.853339] GPR12: c008000002cbb5f8 c0000003ff7ef600 c000000016e86070 c000000016e86078
[    3.853339] GPR16: c000000016e86068 c000000016e98338 c000000016e86088 c000000016e86090
[    3.853339] GPR20: c000000016e86080 c008000003430dcc 0000000000000100 c000000016e97250
[    3.853339] GPR24: 0000000000000001 c0080000033c5dd0 c000000016e80000 c000000016e85208
[    3.853339] GPR28: c000000016e80000 ffffffffffffffff c000000002527860 c000000002527860
[    3.853711] NIP [c008000002cbb648] gmc_v10_0_get_vm_pde+0x50/0x120 [amdgpu]
[    3.854018] LR [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854326] Call Trace:
[    3.854348] [c0000000025277a0] [c0000000025277e0] 0xc0000000025277e0 (unreliable)
[    3.854389] [c0000000025277e0] [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854699] [c000000002527830] [c008000002c3cc08] amdgpu_gmc_pd_addr+0x50/0xa8 [amdgpu]
[    3.855008] [c000000002527870] [c008000002cb7b30] gfxhub_v2_0_gart_enable+0x48/0x11f0 [amdgpu]
[    3.855325] [c0000000025278d0] [c008000002cbce30] gmc_v10_0_hw_init+0x88/0x270 [amdgpu]
[    3.855651] [c000000002527960] [c008000002be4a9c] amdgpu_device_init+0x1ee4/0x2300 [amdgpu]
[    3.855968] [c000000002527ac0] [c008000002be6758] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.856240] [c000000002527b40] [c008000002bdae68] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.856532] [c000000002527be0] [c0000000008d6078] local_pci_probe+0x68/0x110
[    3.856583] [c000000002527c60] [c00000000017f5b8] work_for_cpu_fn+0x38/0x60
[    3.856634] [c000000002527c90] [c000000000184ee4] process_one_work+0x2a4/0x570
[    3.856684] [c000000002527d30] [c000000000185a30] worker_thread+0x280/0x5b0
[    3.856725] [c000000002527dc0] [c000000000191a70] kthread+0x120/0x130
[    3.856765] [c000000002527e10] [c00000000000cecc] ret_from_kernel_thread+0x5c/0x64
[    3.856807] Instruction dump:
[    3.856829] 7c7c1b78 fbe1fff8 7c9d2378 f821ffc1 e8850000 794a07c6 7cdf3378 614a0002
[    3.856876] 7d095039 41820074 788982a0 79298002 <0b090000> 893c0d44 2c090000 41820014
[    3.856935] ---[ end trace 0000000000000000 ]---
--- End code ---

Fast reboot is disabled, and these were the firmware files I used:

--- Code: ---navi10_asd.bin     navi14_gpu_info.bin  navi14_me_wks.bin   navi14_smc.bin
navi10_ta.bin      navi14_me.bin        navi14_pfp.bin      navi14_sos.bin
navi10_vcn.bin     navi14_mec2.bin      navi14_pfp_wks.bin  navi14_ta.bin
navi14_asd.bin     navi14_mec2_wks.bin  navi14_rlc.bin      navi14_vcn.bin
navi14_ce.bin      navi14_mec.bin       navi14_sdma1.bin
navi14_ce_wks.bin  navi14_mec_wks.bin   navi14_sdma.bin
--- End code ---

I tried all manner of combinations of the Navi firmware files, and the ones that did give me video in Petitboot threw the same error.

--- End quote ---
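
The PHB dump above is long and most of its registers read zero, so filtering it down to the nonzero ones makes it easier to compare against another boot. A minimal Python sketch, assuming the console output was saved as text and that every dump line follows the `PHB#0000[0:0]: name = value` layout seen in the quote. The function name `nonzero_phb_registers` is made up for illustration. Note that nonzero does not mean faulty (link status and mask registers are normally nonzero); this only trims noise and decodes nothing.

```python
import re

# Matches dump lines such as
# "[   99.149386402,3] PHB#0000[0:0]:   brdgCtl = 00000002"
# The layout is assumed from the log quoted in this thread.
PHB_LINE = re.compile(
    r"PHB#[0-9a-fA-F]+\[\d+:\d+\]:\s*(?P<name>[\w\[\]]+)\s*=\s*(?P<vals>[0-9a-fA-F ]+)$"
)


def nonzero_phb_registers(log_text):
    """Return (name, [hex words]) for every register with a nonzero word."""
    hits = []
    for line in log_text.splitlines():
        match = PHB_LINE.search(line.rstrip())
        if match is None:
            continue  # not a PHB register line (EEH, oops, etc.)
        words = match.group("vals").split()
        if any(int(word, 16) != 0 for word in words):
            hits.append((match.group("name"), words))
    return hits


# Three lines copied from the quoted dump.
sample = """\
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
"""

for name, words in nonzero_phb_registers(sample):
    print(name, " ".join(words))
```

Running the same filter over a dump from a working boot and diffing the two outputs would show which registers actually changed.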


Which Linux kernel version does firmware 2.10 use? Some cards are known to be buggy with older kernels; please refer to https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/PCIe_Devices for compatibility.
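
For reference, the oops header in the quoted log names the kernel that hit the BUG ("Not tainted 6.0.13_1"). The thread does not say whether that is the firmware's own kernel or the installed OS kernel, so treat it only as what the log reports. A minimal Python sketch for pulling that string out of a saved console log; the helper names `kernel_release` and `release_tuple` are made up for illustration, and the line layout is assumed from the log above.

```python
import re

# Oops header lines look like:
# "CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1"
OOPS_CPU_LINE = re.compile(
    r"CPU: \d+ PID: \d+ Comm: .*?(?:Not tainted|Tainted: [A-Z ]+) "
    r"(?P<release>\d+\.\d+\S*)"
)


def kernel_release(log_text):
    """Return the release string from the first oops header, or None."""
    match = OOPS_CPU_LINE.search(log_text)
    return match.group("release") if match else None


def release_tuple(release):
    """Turn '6.0.13_1' into (6, 0, 13) so versions can be compared."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


# Line copied from the quoted log.
sample = "[    3.853130] CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1"

release = kernel_release(sample)
print(release, release_tuple(release))
```

The tuple form makes it easy to check a log against whatever minimum kernel version the compatibility list gives for a particular card.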

tle:
Folks, any idea where I could find the 2.10 changes in the git repo?

I was looking at https://git.raptorcs.com/git/ but was unable to find any related changes in blackbird-*.

atomicdog:
Their GitLab repo has more recent changes, so I'm guessing that's where the 2.10 version is.
