Recent Posts

Pages: 1 ... 6 7 [8] 9 10
71
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by r34per on February 29, 2024, 04:40:17 pm »
I may have spoken too soon :( I get no video when I boot into the OS, and the boot console spits this out:
Code: [Select]
SIGTERM received, booting...
[   99.149386402,3] PHB#0000[0:0]:                  brdgCtl = 00000002
[   99.149481878,3] PHB#0000[0:0]:             deviceStatus = 00000020
[   99.149523997,3] PHB#0000[0:0]:               slotStatus = 00402000
[   99.149618190,3] PHB#0000[0:0]:               linkStatus = a0840008
[   99.149660035,3] PHB#0000[0:0]:             devCmdStatus = 00100107
[   99.149727651,3] PHB#0000[0:0]:             devSecStatus = 00002000
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.149829970,3] PHB#0000[0:0]:          corrErrorStatus = 00000000
[   99.149869722,3] PHB#0000[0:0]:        uncorrErrorStatus = 00000000
[   99.149918442,3] PHB#0000[0:0]:                   devctl = 00000020
[   99.149955380,3] PHB#0000[0:0]:                  devStat = 00000000
[   99.149996897,3] PHB#0000[0:0]:                  tlpHdr1 = 00000000
[   99.150043352,3] PHB#0000[0:0]:                  tlpHdr2 = 00000000
[   99.150096694,3] PHB#0000[0:0]:                  tlpHdr3 = 00000000
[   99.150143000,3] PHB#0000[0:0]:                  tlpHdr4 = 00000000
[   99.150189643,3] PHB#0000[0:0]:                 sourceId = 00000000
[   99.150231444,3] PHB#0000[0:0]:                     nFir = 0000000000000000
[   99.150275820,3] PHB#0000[0:0]:                 nFirMask = 0030001c00000000
[   99.150319837,3] PHB#0000[0:0]:                  nFirWOF = 0000000000000000
[   99.150378022,3] PHB#0000[0:0]:                 phbPlssr = 0000001c00000000
[   99.150433559,3] PHB#0000[0:0]:                   phbCsr = 0000001c00000000
[   99.150489148,3] PHB#0000[0:0]:                   lemFir = 0000000100280000
[   99.150533384,3] PHB#0000[0:0]:             lemErrorMask = 0000000000000000
[   99.150577353,3] PHB#0000[0:0]:                   lemWOF = 0000000100000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.150672497,3] PHB#0000[0:0]:      phbFirstErrorStatus = 0000008000000000
[   99.150728026,3] PHB#0000[0:0]:             phbErrorLog0 = 2148000098000240
[   99.150774762,3] PHB#0000[0:0]:             phbErrorLog1 = a008400000000000
[   99.150823696,3] PHB#0000[0:0]:        phbTxeErrorStatus = 0000000000000000
[   99.150872357,3] PHB#0000[0:0]:   phbTxeFirstErrorStatus = 0000000000000000
[   99.150916641,3] PHB#0000[0:0]:          phbTxeErrorLog0 = 0000000000000000
[   99.150965287,3] PHB#0000[0:0]:          phbTxeErrorLog1 = 0000000000000000
[   99.151018775,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 4000200000000000
[   99.151074489,3] PHB#0000[0:0]: phbRxeArbFrstErrorStatus = 0000200000000000
[   99.151127737,3] PHB#0000[0:0]:       phbRxeArbErrorLog0 = 02409fde30000000
[   99.151171863,3] PHB#0000[0:0]:       phbRxeArbErrorLog1 = 0000000000000000
[   99.151215896,3] PHB#0000[0:0]:     phbRxeMrgErrorStatus = 0000000000000000
[   99.151260084,3] PHB#0000[0:0]: phbRxeMrgFrstErrorStatus = 0000000000000000
[   99.151315450,3] PHB#0000[0:0]:       phbRxeMrgErrorLog0 = 0000000000000000
[   99.151369016,3] PHB#0000[0:0]:       phbRxeMrgErrorLog1 = 0000000000000000
[   99.151424438,3] PHB#0000[0:0]:     phbRxeTceErrorStatus = 0000000000000000
[   99.151471170,3] PHB#0000[0:0]: phbRxeTceFrstErrorStatus = 0000000000000000
[   99.151517918,3] PHB#0000[0:0]:       phbRxeTceErrorLog0 = 0000000000000000
[   99.151561833,3] PHB#0000[0:0]:       phbRxeTceErrorLog1 = 0000000000000000
[   99.151614682,3] PHB#0000[0:0]:        phbPblErrorStatus = 0000000001000000
[   99.151663274,3] PHB#0000[0:0]:   phbPblFirstErrorStatus = 0000000001000000
[   99.151716727,3] PHB#0000[0:0]:          phbPblErrorLog0 = 0000000000000000
[   99.151762796,3] PHB#0000[0:0]:          phbPblErrorLog1 = 0000000000000000
[   99.151813691,3] PHB#0000[0:0]:      phbPcieDlpErrorLog1 = 0000000000000000
[   99.151858094,3] PHB#0000[0:0]:      phbPcieDlpErrorLog2 = 0000000000000000
[   99.151904253,3] PHB#0000[0:0]:    phbPcieDlpErrorStatus = 00be000000000000
[   99.151959774,3] PHB#0000[0:0]:       phbRegbErrorStatus = 0000004000000000
[   99.152015372,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0000004000000000
[   99.152068905,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 8800006c00000000
[   99.152115691,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000007011000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
[   99.152218234,3] PHB#0000[0:0]:                PEST[001] = 8000000000000000 8000000000000000
[   99.152285858,3] PHB#0000[0:0]:                PEST[002] = 8000000000000000 8000000000000000
[   99.152350714,3] PHB#0000[0:0]:                PEST[003] = 8000000000000000 8000000000000000
[   99.152414534,3] PHB#0000[0:0]:                PEST[004] = 8000000000000000 8000000000000000
[   99.152474834,3] PHB#0000[0:0]:                PEST[005] = 8000000000000000 8000000000000000
[   99.152528675,3] PHB#0000[0:0]:                PEST[006] = 8000000000000000 8000000000000000
[   99.152589889,3] PHB#0000[0:0]:                PEST[007] = 8000000000000000 8000000000000000
[   99.152657446,3] PHB#0000[0:0]:                PEST[008] = 8000000000000000 8000000000000000
[   99.152720282,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a03000000 0000000000000000
[    3.560406] EEH: Recovering PHB#0-PE#0
[    3.560433] EEH: PE location: UOPWR.D100029-Node0-SLOT1 PCIE 4.0 X16, PHB location: N/A
[    3.560473] EEH: Frozen PHB#0-PE#0 detected
[    3.560486] EEH: Call Trace:
[    3.560526] EEH: [00000000c094f14c] __eeh_send_failure_event+0x7c/0x160
[    3.560585] EEH: [00000000c2fbde4c] eeh_dev_check_failure+0x2c4/0x6a0
[    3.560634] EEH: [00000000eb293b00] amdgpu_device_rreg.part.0+0x160/0x1f0 [amdgpu]
[    3.560924] EEH: [0000000009854edf] psp_wait_for+0xac/0x130 [amdgpu]
[    3.561223] EEH: [0000000006086f20] psp_v11_0_mode1_reset+0xbc/0x130 [amdgpu]
[    3.561554] EEH: [00000000927ca5cd] psp_gpu_reset+0x88/0xd0 [amdgpu]
[    3.561868] EEH: [000000000d948d66] amdgpu_device_mode1_reset+0x148/0x180 [amdgpu]
[    3.562116] EEH: [00000000d607b75f] nv_asic_reset+0xbc/0x290 [amdgpu]
[    3.562414] EEH: [00000000893f34f2] amdgpu_device_init+0x172c/0x2300 [amdgpu]
[    3.562693] EEH: [00000000d8547fbc] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.562966] EEH: [000000008c9f0b1b] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.563210] EEH: [0000000067c06d95] local_pci_probe+0x68/0x110
[    3.563250] EEH: [000000004224f0ca] work_for_cpu_fn+0x38/0x60
[    3.563290] EEH: [00000000c5105116] process_one_work+0x2a4/0x570
[    3.563332] EEH: [00000000f81a86b6] worker_thread+0x280/0x5b0
[    3.563372] EEH: [00000000bf39fc31] kthread+0x120/0x130
[    3.563409] EEH: [0000000036d034ff] ret_from_kernel_thread+0x5c/0x64
[    3.852813] kernel BUG at drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:593!
[    3.852840] Oops: Exception in kernel mode, sig: 5 [#1]
[    3.852856] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    3.852884] Modules linked in: uas usb_storage sd_mod amdgpu(+) gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper xhci_pci xhci_pci_renesas syscopyarea sysfillrect ahci sysimgblt fb_sys_fops libahci xhci_hcd libata drm vmx_crypto gf128mul usbcore scsi_mod drm_panel_orientation_quirks usb_common scsi_common agpgart dm_mirror dm_region_hash dm_log dm_mod btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_vpmsum
[    3.853130] CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1
[    3.853162] Workqueue: events work_for_cpu_fn
[    3.853201] NIP:  c008000002cbb648 LR: c008000002c3cb50 CTR: c008000002cbb5f8
[    3.853241] REGS: c000000002527500 TRAP: 0700   Not tainted  (6.0.13_1)
[    3.853288] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002248  XER: 20040000
[    3.853339] CFAR: c008000002cbb6dc IRQMASK: 0
[    3.853339] GPR00: c008000002c3cb50 c0000000025277a0 c0080000033b1000 00feffffff900000
[    3.853339] GPR04: 00feffffff900000 c000000002527858 c000000002527860 c0000000024f0000
[    3.853339] GPR08: 0000000000000001 00fe000000000000 0040000000000002 c00800000318aef0
[    3.853339] GPR12: c008000002cbb5f8 c0000003ff7ef600 c000000016e86070 c000000016e86078
[    3.853339] GPR16: c000000016e86068 c000000016e98338 c000000016e86088 c000000016e86090
[    3.853339] GPR20: c000000016e86080 c008000003430dcc 0000000000000100 c000000016e97250
[    3.853339] GPR24: 0000000000000001 c0080000033c5dd0 c000000016e80000 c000000016e85208
[    3.853339] GPR28: c000000016e80000 ffffffffffffffff c000000002527860 c000000002527860
[    3.853711] NIP [c008000002cbb648] gmc_v10_0_get_vm_pde+0x50/0x120 [amdgpu]
[    3.854018] LR [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854326] Call Trace:
[    3.854348] [c0000000025277a0] [c0000000025277e0] 0xc0000000025277e0 (unreliable)
[    3.854389] [c0000000025277e0] [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854699] [c000000002527830] [c008000002c3cc08] amdgpu_gmc_pd_addr+0x50/0xa8 [amdgpu]
[    3.855008] [c000000002527870] [c008000002cb7b30] gfxhub_v2_0_gart_enable+0x48/0x11f0 [amdgpu]
[    3.855325] [c0000000025278d0] [c008000002cbce30] gmc_v10_0_hw_init+0x88/0x270 [amdgpu]
[    3.855651] [c000000002527960] [c008000002be4a9c] amdgpu_device_init+0x1ee4/0x2300 [amdgpu]
[    3.855968] [c000000002527ac0] [c008000002be6758] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.856240] [c000000002527b40] [c008000002bdae68] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.856532] [c000000002527be0] [c0000000008d6078] local_pci_probe+0x68/0x110
[    3.856583] [c000000002527c60] [c00000000017f5b8] work_for_cpu_fn+0x38/0x60
[    3.856634] [c000000002527c90] [c000000000184ee4] process_one_work+0x2a4/0x570
[    3.856684] [c000000002527d30] [c000000000185a30] worker_thread+0x280/0x5b0
[    3.856725] [c000000002527dc0] [c000000000191a70] kthread+0x120/0x130
[    3.856765] [c000000002527e10] [c00000000000cecc] ret_from_kernel_thread+0x5c/0x64
[    3.856807] Instruction dump:
[    3.856829] 7c7c1b78 fbe1fff8 7c9d2378 f821ffc1 e8850000 794a07c6 7cdf3378 614a0002
[    3.856876] 7d095039 41820074 788982a0 79298002 <0b090000> 893c0d44 2c090000 41820014
[    3.856935] ---[ end trace 0000000000000000 ]---

Fast reboot is disabled, and these were the firmware files I used:
Code: [Select]
navi10_asd.bin     navi14_gpu_info.bin  navi14_me_wks.bin   navi14_smc.bin
navi10_ta.bin      navi14_me.bin        navi14_pfp.bin      navi14_sos.bin
navi10_vcn.bin     navi14_mec2.bin      navi14_pfp_wks.bin  navi14_ta.bin
navi14_asd.bin     navi14_mec2_wks.bin  navi14_rlc.bin      navi14_vcn.bin
navi14_ce.bin      navi14_mec.bin       navi14_sdma1.bin
navi14_ce_wks.bin  navi14_mec_wks.bin   navi14_sdma.bin

I tried all manner of combinations of the Navi firmware files, and even the ones that gave me video in Petitboot threw the same error.
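
When reading a dump like the one above, it can help to pick out which of the PHB error-status registers are actually non-zero. The following is a minimal Python sketch, not part of skiboot or any official tool. The helper name `nonzero_error_registers` and the choice to match only register names containing "ErrorStatus" are assumptions made for illustration.

```python
import re

# Matches skiboot-style single-value lines such as:
# [   99.150621318,3] PHB#0000[0:0]:   phbErrorStatus = 0000088000000000
LINE_RE = re.compile(
    r"PHB#[0-9a-fA-F]+\[\d+:\d+\]:\s*(\S+)\s*=\s*([0-9a-fA-F]+)\s*$"
)


def nonzero_error_registers(log_text):
    """Return {register_name: int_value} for non-zero *ErrorStatus registers.

    Hypothetical helper: the "ErrorStatus" name filter is an illustrative
    choice, not something defined by the firmware.
    """
    found = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # not a single-value register line (e.g. PEST entries)
        name = match.group(1)
        value = int(match.group(2), 16)
        if "ErrorStatus" in name and value != 0:
            found[name] = value
    return found
```

Feeding it the saved console output gives a short list of registers to look up. It does not interpret the individual bits, which still requires the PHB documentation.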
72
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by ClassicHasClass on February 29, 2024, 02:36:03 pm »
Excellent!
73
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by r34per on February 28, 2024, 03:45:41 pm »
I ended up building it successfully. I was using -j16 as a parameter; when I just ran ./op-build without it, the firmware eventually built :D Now if only I hadn't forgotten to update the WOF tables...

Update: I rebuilt the firmware and flashed it to my Blackbird. I flashed the navi10 and navi14 firmware files to BOOTKERNFW, and I get output from my RX 5300 in Petitboot!!
74
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by atomicdog on February 28, 2024, 12:29:12 pm »
Are you trying to build a version with debug info?
Looks like the debug DWARF version isn't being set properly for some reason.
75
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by cy384 on February 28, 2024, 11:03:17 am »
When I built the firmware (prior to this release) I used an Ubuntu 18.04 VM.
76
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by r34per on February 28, 2024, 07:09:43 am »
I've been trying to compile the firmware to test it with my RX 5300, but I'm having trouble building it for my Blackbird. Has anyone had success building the firmware? I tried on Debian 12 ppc64le on my POWER8 server, and the build fails with a bunch of errors like this:

Code: [Select]
error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '261', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '3077', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '5125', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '7173', this reader only handles version 2, 3, 4 and 5 information
...
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '769', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '769', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '59', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '59', this reader only handles version 2, 3, 4 and 5 information
make[2]: Leaving directory '/root/op-build/output/build/occ-9ddc6ba57476e6483244425db270259735541967/src/occ_405'
make[1]: Leaving directory '/root/op-build/output/build/occ-9ddc6ba57476e6483244425db270259735541967/src'
make: *** [package/pkg-generic.mk:292: /root/op-build/output/build/occ-9ddc6ba57476e6483244425db270259735541967/.stamp_built] Error 2
make: Leaving directory '/root/op-build/buildroot'
root@buildbox-deb11:~/op-build# dwarf
-bash: dwarf: command not found

I get the same error on Ubuntu 22.04 on my x86_64 server.
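
The reported version numbers are easier to compare once they are tallied and shown in hex. Below is a minimal Python sketch (an illustration only, not part of op-build or binutils) that pulls the values out of a saved build log. `tally_dwarf_versions` is a hypothetical helper name, and the regex simply assumes the message wording shown in the excerpt above.

```python
import re
from collections import Counter

# Assumes the objdump message format seen in the log excerpt.
VERSION_RE = re.compile(r"found dwarf version '(\d+)'")


def tally_dwarf_versions(log_text):
    """Return sorted (version, 16-bit hex string, count) tuples from a build log."""
    counts = Counter(int(v) for v in VERSION_RE.findall(log_text))
    return [(version, f"0x{version:04x}", n) for version, n in sorted(counts.items())]
```

On the excerpt above this shows, for example, 261 as 0x0105 and 6657 as 0x1a01. What those byte patterns mean for the OCC build is not something this sketch can tell you.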
77
GPU Compute / Accelerators / Re: Radeon RDNA3 support?
« Last post by MauryG5 on February 27, 2024, 01:01:02 pm »
We need to understand exactly what is needed to make it work properly. There have been several AMD driver updates recently, so we need to find out whether very recent firmware is needed and which versions of the Mesa drivers are necessary. I would like to install a Radeon Pro 7700 (RDNA3), for example...
78
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« Last post by MauryG5 on February 25, 2024, 12:21:28 pm »
So this means that, in theory, it would now be possible to use a graphics card like the Navi10 from the very beginning of boot, and therefore also to display Petitboot... if I understand correctly.
79
User Zone / Re: Blackbird User Guide: Wrong TPM connector and 'missing' USB 2.0 port
« Last post by draconx on February 23, 2024, 05:22:55 pm »
Slightly off topic, but one idea I thought of was to make a little USB 2.0 hub expansion board connected to a PCIE bracket for the back of the computer, connected to the lone USB 2.0 port with a USB cable. It wouldn't do anything for USB bandwidth, but would give more USB ports for peripherals and such on the back. There are only 2 ports there now, so having more would bring it more in line with x86 boards.

You can very cheaply buy ready-made one-to-four-port USB2 hubs with normal motherboard headers, then you can just use any off the shelf USB bracket or case with USB ports.  I used this one and cut apart a cable with an angled USB-A connector to connect it to the Blackbird motherboard (I wanted to share photos but apparently uploads are busted on this forum).
80
Talos II / Re: OpenBMC password
« Last post by draconx on February 23, 2024, 04:15:13 pm »
I tried the instructions at graphcore.ai with no luck. But the instructions on the wiki for Resetting the BMC's Persistant Storage worked for me on two systems. After wiping the persistent storage, I was able to log into the BMC with the default password and get all the firmwares updated.

Wiping the entire persistent storage seems like total overkill if the only problem is that you forgot the BMC root password.

All you need to do is edit the shadow file on the overlay filesystem with a new password.

Follow that wiki page's instructions to add overlay-filesystem-in-ram to the boot command line via u-boot.  This enables a root console login with the default root password of 0penBmc.  Once logged in, mount the writable overlay partition somewhere, for example:

Code: [Select]
# mount -t jffs2 /dev/mtdblock5 /mnt
Then the simplest way is probably to just copy /mnt/cow/etc/shadow over /etc/shadow, run passwd, then copy the newly-updated /etc/shadow back to the writable partition, for example:

Code: [Select]
# cp /mnt/cow/etc/shadow /etc/shadow
# passwd
New password:
Retype new password:
passwd: Password updated successfully
# cp /etc/shadow /mnt/cow/etc/shadow

Or just run an editor on /mnt/cow/etc/shadow and manually change the root password hash to whatever you want.  Then reboot, and voila, shiny new BMC root password.
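
For the manual-edit route, the change amounts to replacing the second colon-separated field on the root line of the shadow file. A minimal Python sketch of that text edit follows. It is meant as an illustration to run against a copy of the file on another machine, since Python may not be available on the BMC. The helper name `set_root_hash` is hypothetical, and the new hash is assumed to have been generated separately with whatever tool you prefer.

```python
def set_root_hash(shadow_text, new_hash, user="root"):
    """Return shadow_text with the password-hash field for `user` replaced.

    Hypothetical helper for illustration; it only edits text and does not
    generate or validate the hash itself.
    """
    out = []
    replaced = False
    for line in shadow_text.splitlines():
        fields = line.split(":")
        # Field 0 is the user name, field 1 is the password hash.
        if len(fields) >= 2 and fields[0] == user:
            fields[1] = new_hash
            replaced = True
        out.append(":".join(fields))
    if not replaced:
        raise ValueError(f"no entry for user {user!r}")
    return "\n".join(out) + "\n"
```

All other fields and all other users' lines are left untouched, which is the same result as editing the hash by hand in an editor.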