Author Topic: Firmware 2.10 for Talos-II and Blackbird available

carlosgonz

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
    • View Profile
Re: Firmware 2.10 for Talos-II and Blackbird available
« Reply #15 on: February 29, 2024, 08:32:01 pm »
Have you jumpered the ASPEED GPU?
Blackbird  Rv1.02

MPC7500

  • Hero Member
  • *****
  • Posts: 596
  • Karma: +42/-1
    • View Profile
    • Twitter
Re: Firmware 2.10 for Talos-II and Blackbird available
« Reply #16 on: March 01, 2024, 05:42:55 am »
Has the card worked with the old firmware?

tle

  • Sr. Member
  • ****
  • Posts: 467
  • Karma: +53/-0
    • View Profile
    • Trung's Personal Website
Re: Firmware 2.10 for Talos-II and Blackbird available
« Reply #17 on: March 17, 2024, 05:30:54 pm »
I may have spoken too soon :( I get no video when I boot into the OS, and the boot console spits this out:
Code:
SIGTERM received, booting...
[   99.149386402,3] PHB#0000[0:0]:                  brdgCtl = 00000002
[   99.149481878,3] PHB#0000[0:0]:             deviceStatus = 00000020
[   99.149523997,3] PHB#0000[0:0]:               slotStatus = 00402000
[   99.149618190,3] PHB#0000[0:0]:               linkStatus = a0840008
[   99.149660035,3] PHB#0000[0:0]:             devCmdStatus = 00100107
[   99.149727651,3] PHB#0000[0:0]:             devSecStatus = 00002000
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.149829970,3] PHB#0000[0:0]:          corrErrorStatus = 00000000
[   99.149869722,3] PHB#0000[0:0]:        uncorrErrorStatus = 00000000
[   99.149918442,3] PHB#0000[0:0]:                   devctl = 00000020
[   99.149955380,3] PHB#0000[0:0]:                  devStat = 00000000
[   99.149996897,3] PHB#0000[0:0]:                  tlpHdr1 = 00000000
[   99.150043352,3] PHB#0000[0:0]:                  tlpHdr2 = 00000000
[   99.150096694,3] PHB#0000[0:0]:                  tlpHdr3 = 00000000
[   99.150143000,3] PHB#0000[0:0]:                  tlpHdr4 = 00000000
[   99.150189643,3] PHB#0000[0:0]:                 sourceId = 00000000
[   99.150231444,3] PHB#0000[0:0]:                     nFir = 0000000000000000
[   99.150275820,3] PHB#0000[0:0]:                 nFirMask = 0030001c00000000
[   99.150319837,3] PHB#0000[0:0]:                  nFirWOF = 0000000000000000
[   99.150378022,3] PHB#0000[0:0]:                 phbPlssr = 0000001c00000000
[   99.150433559,3] PHB#0000[0:0]:                   phbCsr = 0000001c00000000
[   99.150489148,3] PHB#0000[0:0]:                   lemFir = 0000000100280000
[   99.150533384,3] PHB#0000[0:0]:             lemErrorMask = 0000000000000000
[   99.150577353,3] PHB#0000[0:0]:                   lemWOF = 0000000100000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.150672497,3] PHB#0000[0:0]:      phbFirstErrorStatus = 0000008000000000
[   99.150728026,3] PHB#0000[0:0]:             phbErrorLog0 = 2148000098000240
[   99.150774762,3] PHB#0000[0:0]:             phbErrorLog1 = a008400000000000
[   99.150823696,3] PHB#0000[0:0]:        phbTxeErrorStatus = 0000000000000000
[   99.150872357,3] PHB#0000[0:0]:   phbTxeFirstErrorStatus = 0000000000000000
[   99.150916641,3] PHB#0000[0:0]:          phbTxeErrorLog0 = 0000000000000000
[   99.150965287,3] PHB#0000[0:0]:          phbTxeErrorLog1 = 0000000000000000
[   99.151018775,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 4000200000000000
[   99.151074489,3] PHB#0000[0:0]: phbRxeArbFrstErrorStatus = 0000200000000000
[   99.151127737,3] PHB#0000[0:0]:       phbRxeArbErrorLog0 = 02409fde30000000
[   99.151171863,3] PHB#0000[0:0]:       phbRxeArbErrorLog1 = 0000000000000000
[   99.151215896,3] PHB#0000[0:0]:     phbRxeMrgErrorStatus = 0000000000000000
[   99.151260084,3] PHB#0000[0:0]: phbRxeMrgFrstErrorStatus = 0000000000000000
[   99.151315450,3] PHB#0000[0:0]:       phbRxeMrgErrorLog0 = 0000000000000000
[   99.151369016,3] PHB#0000[0:0]:       phbRxeMrgErrorLog1 = 0000000000000000
[   99.151424438,3] PHB#0000[0:0]:     phbRxeTceErrorStatus = 0000000000000000
[   99.151471170,3] PHB#0000[0:0]: phbRxeTceFrstErrorStatus = 0000000000000000
[   99.151517918,3] PHB#0000[0:0]:       phbRxeTceErrorLog0 = 0000000000000000
[   99.151561833,3] PHB#0000[0:0]:       phbRxeTceErrorLog1 = 0000000000000000
[   99.151614682,3] PHB#0000[0:0]:        phbPblErrorStatus = 0000000001000000
[   99.151663274,3] PHB#0000[0:0]:   phbPblFirstErrorStatus = 0000000001000000
[   99.151716727,3] PHB#0000[0:0]:          phbPblErrorLog0 = 0000000000000000
[   99.151762796,3] PHB#0000[0:0]:          phbPblErrorLog1 = 0000000000000000
[   99.151813691,3] PHB#0000[0:0]:      phbPcieDlpErrorLog1 = 0000000000000000
[   99.151858094,3] PHB#0000[0:0]:      phbPcieDlpErrorLog2 = 0000000000000000
[   99.151904253,3] PHB#0000[0:0]:    phbPcieDlpErrorStatus = 00be000000000000
[   99.151959774,3] PHB#0000[0:0]:       phbRegbErrorStatus = 0000004000000000
[   99.152015372,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0000004000000000
[   99.152068905,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 8800006c00000000
[   99.152115691,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000007011000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
[   99.152218234,3] PHB#0000[0:0]:                PEST[001] = 8000000000000000 8000000000000000
[   99.152285858,3] PHB#0000[0:0]:                PEST[002] = 8000000000000000 8000000000000000
[   99.152350714,3] PHB#0000[0:0]:                PEST[003] = 8000000000000000 8000000000000000
[   99.152414534,3] PHB#0000[0:0]:                PEST[004] = 8000000000000000 8000000000000000
[   99.152474834,3] PHB#0000[0:0]:                PEST[005] = 8000000000000000 8000000000000000
[   99.152528675,3] PHB#0000[0:0]:                PEST[006] = 8000000000000000 8000000000000000
[   99.152589889,3] PHB#0000[0:0]:                PEST[007] = 8000000000000000 8000000000000000
[   99.152657446,3] PHB#0000[0:0]:                PEST[008] = 8000000000000000 8000000000000000
[   99.152720282,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a03000000 0000000000000000
[    3.560406] EEH: Recovering PHB#0-PE#0
[    3.560433] EEH: PE location: UOPWR.D100029-Node0-SLOT1 PCIE 4.0 X16, PHB location: N/A
[    3.560473] EEH: Frozen PHB#0-PE#0 detected
[    3.560486] EEH: Call Trace:
[    3.560526] EEH: [00000000c094f14c] __eeh_send_failure_event+0x7c/0x160
[    3.560585] EEH: [00000000c2fbde4c] eeh_dev_check_failure+0x2c4/0x6a0
[    3.560634] EEH: [00000000eb293b00] amdgpu_device_rreg.part.0+0x160/0x1f0 [amdgpu]
[    3.560924] EEH: [0000000009854edf] psp_wait_for+0xac/0x130 [amdgpu]
[    3.561223] EEH: [0000000006086f20] psp_v11_0_mode1_reset+0xbc/0x130 [amdgpu]
[    3.561554] EEH: [00000000927ca5cd] psp_gpu_reset+0x88/0xd0 [amdgpu]
[    3.561868] EEH: [000000000d948d66] amdgpu_device_mode1_reset+0x148/0x180 [amdgpu]
[    3.562116] EEH: [00000000d607b75f] nv_asic_reset+0xbc/0x290 [amdgpu]
[    3.562414] EEH: [00000000893f34f2] amdgpu_device_init+0x172c/0x2300 [amdgpu]
[    3.562693] EEH: [00000000d8547fbc] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.562966] EEH: [000000008c9f0b1b] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.563210] EEH: [0000000067c06d95] local_pci_probe+0x68/0x110
[    3.563250] EEH: [000000004224f0ca] work_for_cpu_fn+0x38/0x60
[    3.563290] EEH: [00000000c5105116] process_one_work+0x2a4/0x570
[    3.563332] EEH: [00000000f81a86b6] worker_thread+0x280/0x5b0
[    3.563372] EEH: [00000000bf39fc31] kthread+0x120/0x130
[    3.563409] EEH: [0000000036d034ff] ret_from_kernel_thread+0x5c/0x64
[    3.852813] kernel BUG at drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:593!
[    3.852840] Oops: Exception in kernel mode, sig: 5 [#1]
[    3.852856] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    3.852884] Modules linked in: uas usb_storage sd_mod amdgpu(+) gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper xhci_pci xhci_pci_renesas syscopyarea sysfillrect ahci sysimgblt fb_sys_fops libahci xhci_hcd libata drm vmx_crypto gf128mul usbcore scsi_mod drm_panel_orientation_quirks usb_common scsi_common agpgart dm_mirror dm_region_hash dm_log dm_mod btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_vpmsum
[    3.853130] CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1
[    3.853162] Workqueue: events work_for_cpu_fn
[    3.853201] NIP:  c008000002cbb648 LR: c008000002c3cb50 CTR: c008000002cbb5f8
[    3.853241] REGS: c000000002527500 TRAP: 0700   Not tainted  (6.0.13_1)
[    3.853288] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002248  XER: 20040000
[    3.853339] CFAR: c008000002cbb6dc IRQMASK: 0
[    3.853339] GPR00: c008000002c3cb50 c0000000025277a0 c0080000033b1000 00feffffff900000
[    3.853339] GPR04: 00feffffff900000 c000000002527858 c000000002527860 c0000000024f0000
[    3.853339] GPR08: 0000000000000001 00fe000000000000 0040000000000002 c00800000318aef0
[    3.853339] GPR12: c008000002cbb5f8 c0000003ff7ef600 c000000016e86070 c000000016e86078
[    3.853339] GPR16: c000000016e86068 c000000016e98338 c000000016e86088 c000000016e86090
[    3.853339] GPR20: c000000016e86080 c008000003430dcc 0000000000000100 c000000016e97250
[    3.853339] GPR24: 0000000000000001 c0080000033c5dd0 c000000016e80000 c000000016e85208
[    3.853339] GPR28: c000000016e80000 ffffffffffffffff c000000002527860 c000000002527860
[    3.853711] NIP [c008000002cbb648] gmc_v10_0_get_vm_pde+0x50/0x120 [amdgpu]
[    3.854018] LR [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854326] Call Trace:
[    3.854348] [c0000000025277a0] [c0000000025277e0] 0xc0000000025277e0 (unreliable)
[    3.854389] [c0000000025277e0] [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854699] [c000000002527830] [c008000002c3cc08] amdgpu_gmc_pd_addr+0x50/0xa8 [amdgpu]
[    3.855008] [c000000002527870] [c008000002cb7b30] gfxhub_v2_0_gart_enable+0x48/0x11f0 [amdgpu]
[    3.855325] [c0000000025278d0] [c008000002cbce30] gmc_v10_0_hw_init+0x88/0x270 [amdgpu]
[    3.855651] [c000000002527960] [c008000002be4a9c] amdgpu_device_init+0x1ee4/0x2300 [amdgpu]
[    3.855968] [c000000002527ac0] [c008000002be6758] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.856240] [c000000002527b40] [c008000002bdae68] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.856532] [c000000002527be0] [c0000000008d6078] local_pci_probe+0x68/0x110
[    3.856583] [c000000002527c60] [c00000000017f5b8] work_for_cpu_fn+0x38/0x60
[    3.856634] [c000000002527c90] [c000000000184ee4] process_one_work+0x2a4/0x570
[    3.856684] [c000000002527d30] [c000000000185a30] worker_thread+0x280/0x5b0
[    3.856725] [c000000002527dc0] [c000000000191a70] kthread+0x120/0x130
[    3.856765] [c000000002527e10] [c00000000000cecc] ret_from_kernel_thread+0x5c/0x64
[    3.856807] Instruction dump:
[    3.856829] 7c7c1b78 fbe1fff8 7c9d2378 f821ffc1 e8850000 794a07c6 7cdf3378 614a0002
[    3.856876] 7d095039 41820074 788982a0 79298002 <0b090000> 893c0d44 2c090000 41820014
[    3.856935] ---[ end trace 0000000000000000 ]---
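Most registers in the PHB dump above are zero, so it can help to filter for the ones that are not. The following is a minimal sketch, assuming the dump has been saved as plain text in the format shown above; the function name `nonzero_error_registers` and the choice to match register names containing "Error" are illustrative assumptions, not part of skiboot or any Raptor tooling. It does not decode what the bits mean.

```python
import re

# Matches skiboot PHB diagnostic lines such as:
# [   99.150621318,3] PHB#0000[0:0]:   phbErrorStatus = 0000088000000000
PHB_LINE = re.compile(
    r"PHB#[0-9a-fA-F]{4}\[\d+:\d+\]:\s*"
    r"(?P<name>[\w\[\]]+)\s*=\s*(?P<values>[0-9a-fA-F ]+)$"
)


def nonzero_error_registers(log_text):
    """Return (name, [hex strings]) for registers whose name contains
    'Error' and which hold at least one non-zero value."""
    hits = []
    for line in log_text.splitlines():
        match = PHB_LINE.search(line.strip())
        if match is None:
            continue  # not a PHB register line (e.g. kernel messages)
        name = match.group("name")
        values = match.group("values").split()
        if "Error" in name and any(int(v, 16) for v in values):
            hits.append((name, values))
    return hits


# A few lines copied from the dump above, used as sample input.
sample = """\
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.151018775,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 4000200000000000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
"""

hits = nonzero_error_registers(sample)
for name, values in hits:
    print(name, " ".join(values))
```

Running it over the full dump instead of `sample` gives a short list of registers to quote when asking for help.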

Fast reboot is disabled, and these were the firmware files I used:
Code:
navi10_asd.bin     navi14_gpu_info.bin  navi14_me_wks.bin   navi14_smc.bin
navi10_ta.bin      navi14_me.bin        navi14_pfp.bin      navi14_sos.bin
navi10_vcn.bin     navi14_mec2.bin      navi14_pfp_wks.bin  navi14_ta.bin
navi14_asd.bin     navi14_mec2_wks.bin  navi14_rlc.bin      navi14_vcn.bin
navi14_ce.bin      navi14_mec.bin       navi14_sdma1.bin
navi14_ce_wks.bin  navi14_mec_wks.bin   navi14_sdma.bin

I tried all manner of combinations of the Navi firmware, and the ones that did give me video in Petitboot would throw the same error.
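The listing above contains both navi10_* and navi14_* files. As a rough sketch for keeping track of which combination is being tested, the file names can be grouped by their ASIC prefix. The helper name `group_by_asic` is made up for illustration, and the thread does not establish whether mixing the two families is related to the error.

```python
from collections import defaultdict


def group_by_asic(filenames):
    """Group firmware file names by the prefix before the first underscore."""
    groups = defaultdict(list)
    for name in filenames:
        asic, _, rest = name.partition("_")
        groups[asic].append(rest)
    return {asic: sorted(parts) for asic, parts in groups.items()}


# The directory listing from the post, pasted verbatim.
listing = """
navi10_asd.bin     navi14_gpu_info.bin  navi14_me_wks.bin   navi14_smc.bin
navi10_ta.bin      navi14_me.bin        navi14_pfp.bin      navi14_sos.bin
navi10_vcn.bin     navi14_mec2.bin      navi14_pfp_wks.bin  navi14_ta.bin
navi14_asd.bin     navi14_mec2_wks.bin  navi14_rlc.bin      navi14_vcn.bin
navi14_ce.bin      navi14_mec.bin       navi14_sdma1.bin
navi14_ce_wks.bin  navi14_mec_wks.bin   navi14_sdma.bin
"""

files = listing.split()
groups = group_by_asic(files)
for asic, parts in sorted(groups.items()):
    print(f"{asic}: {len(parts)} files: {', '.join(parts)}")
```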


What is the Linux kernel version in firmware 2.10? Some cards are known to be buggy with older kernels; please refer to https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/PCIe_Devices for compatibility.
Faithful Linux enthusiast

My Raptor Blackbird

tle

  • Sr. Member
  • ****
  • Posts: 467
  • Karma: +53/-0
    • View Profile
    • Trung's Personal Website
Re: Firmware 2.10 for Talos-II and Blackbird available
« Reply #18 on: March 21, 2024, 09:16:29 pm »
Folks, any idea where I could find the 2.10 changes in the git repo?

I was looking at https://git.raptorcs.com/git/ but was unable to find any related changes in blackbird-*.
Faithful Linux enthusiast

My Raptor Blackbird

atomicdog

  • Newbie
  • *
  • Posts: 43
  • Karma: +4/-0
    • View Profile
Re: Firmware 2.10 for Talos-II and Blackbird available
« Reply #19 on: March 21, 2024, 10:17:00 pm »
Their GitLab repo has more recent changes, so I'm guessing that's where the 2.10 version is.

Borley

  • Full Member
  • ***
  • Posts: 181
  • Karma: +17/-0
    • View Profile
Re: Firmware 2.10 for Talos-II and Blackbird available
« Reply #20 on: March 23, 2024, 09:52:03 pm »
What is the Linux kernel version in firmware 2.10? Some cards are known to be buggy with older kernels; please refer to https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/PCIe_Devices for compatibility.

6.6, I think. The new firmware is up and running on my evaluation system, but I'm not loading firmware files as r34per is trying to do.
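For what it's worth, the oops quoted earlier in the thread carries the version of the kernel that printed it, in the "Not tainted 6.0.13_1" part of the CPU line. Here is a small sketch for pulling that out of a saved log and comparing it against a minimum version. The function name is hypothetical, the regular expression only handles the "Not tainted" form, and the (6, 6, 0) threshold is just a placeholder taken from the guess above, not a documented requirement. Note that this is the kernel that was running when the oops happened, which is not necessarily the kernel shipped in the firmware.

```python
import re

# Only handles untainted kernels; a tainted kernel prints "Tainted: ..." instead.
VERSION_RE = re.compile(r"Not tainted\s+\(?(\d+)\.(\d+)(?:\.(\d+))?")


def kernel_version_from_oops(log_text):
    """Return (major, minor, patch) from an oops log, or None if not found."""
    match = VERSION_RE.search(log_text)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


# Line copied from the oops quoted earlier in the thread.
line = "[    3.853130] CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1"

version = kernel_version_from_oops(line)
minimum = (6, 6, 0)  # placeholder threshold, see the note above
print("kernel version:", version)
print("at least", minimum, ":", version is not None and version >= minimum)
```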