Messages - r34per

1
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« on: February 29, 2024, 04:40:17 pm »
I may have spoken too soon :( I get no video when I boot into the OS, and the boot console spits this out:
Code: [Select]
SIGTERM received, booting...
[   99.149386402,3] PHB#0000[0:0]:                  brdgCtl = 00000002
[   99.149481878,3] PHB#0000[0:0]:             deviceStatus = 00000020
[   99.149523997,3] PHB#0000[0:0]:               slotStatus = 00402000
[   99.149618190,3] PHB#0000[0:0]:               linkStatus = a0840008
[   99.149660035,3] PHB#0000[0:0]:             devCmdStatus = 00100107
[   99.149727651,3] PHB#0000[0:0]:             devSecStatus = 00002000
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.149829970,3] PHB#0000[0:0]:          corrErrorStatus = 00000000
[   99.149869722,3] PHB#0000[0:0]:        uncorrErrorStatus = 00000000
[   99.149918442,3] PHB#0000[0:0]:                   devctl = 00000020
[   99.149955380,3] PHB#0000[0:0]:                  devStat = 00000000
[   99.149996897,3] PHB#0000[0:0]:                  tlpHdr1 = 00000000
[   99.150043352,3] PHB#0000[0:0]:                  tlpHdr2 = 00000000
[   99.150096694,3] PHB#0000[0:0]:                  tlpHdr3 = 00000000
[   99.150143000,3] PHB#0000[0:0]:                  tlpHdr4 = 00000000
[   99.150189643,3] PHB#0000[0:0]:                 sourceId = 00000000
[   99.150231444,3] PHB#0000[0:0]:                     nFir = 0000000000000000
[   99.150275820,3] PHB#0000[0:0]:                 nFirMask = 0030001c00000000
[   99.150319837,3] PHB#0000[0:0]:                  nFirWOF = 0000000000000000
[   99.150378022,3] PHB#0000[0:0]:                 phbPlssr = 0000001c00000000
[   99.150433559,3] PHB#0000[0:0]:                   phbCsr = 0000001c00000000
[   99.150489148,3] PHB#0000[0:0]:                   lemFir = 0000000100280000
[   99.150533384,3] PHB#0000[0:0]:             lemErrorMask = 0000000000000000
[   99.150577353,3] PHB#0000[0:0]:                   lemWOF = 0000000100000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.150672497,3] PHB#0000[0:0]:      phbFirstErrorStatus = 0000008000000000
[   99.150728026,3] PHB#0000[0:0]:             phbErrorLog0 = 2148000098000240
[   99.150774762,3] PHB#0000[0:0]:             phbErrorLog1 = a008400000000000
[   99.150823696,3] PHB#0000[0:0]:        phbTxeErrorStatus = 0000000000000000
[   99.150872357,3] PHB#0000[0:0]:   phbTxeFirstErrorStatus = 0000000000000000
[   99.150916641,3] PHB#0000[0:0]:          phbTxeErrorLog0 = 0000000000000000
[   99.150965287,3] PHB#0000[0:0]:          phbTxeErrorLog1 = 0000000000000000
[   99.151018775,3] PHB#0000[0:0]:     phbRxeArbErrorStatus = 4000200000000000
[   99.151074489,3] PHB#0000[0:0]: phbRxeArbFrstErrorStatus = 0000200000000000
[   99.151127737,3] PHB#0000[0:0]:       phbRxeArbErrorLog0 = 02409fde30000000
[   99.151171863,3] PHB#0000[0:0]:       phbRxeArbErrorLog1 = 0000000000000000
[   99.151215896,3] PHB#0000[0:0]:     phbRxeMrgErrorStatus = 0000000000000000
[   99.151260084,3] PHB#0000[0:0]: phbRxeMrgFrstErrorStatus = 0000000000000000
[   99.151315450,3] PHB#0000[0:0]:       phbRxeMrgErrorLog0 = 0000000000000000
[   99.151369016,3] PHB#0000[0:0]:       phbRxeMrgErrorLog1 = 0000000000000000
[   99.151424438,3] PHB#0000[0:0]:     phbRxeTceErrorStatus = 0000000000000000
[   99.151471170,3] PHB#0000[0:0]: phbRxeTceFrstErrorStatus = 0000000000000000
[   99.151517918,3] PHB#0000[0:0]:       phbRxeTceErrorLog0 = 0000000000000000
[   99.151561833,3] PHB#0000[0:0]:       phbRxeTceErrorLog1 = 0000000000000000
[   99.151614682,3] PHB#0000[0:0]:        phbPblErrorStatus = 0000000001000000
[   99.151663274,3] PHB#0000[0:0]:   phbPblFirstErrorStatus = 0000000001000000
[   99.151716727,3] PHB#0000[0:0]:          phbPblErrorLog0 = 0000000000000000
[   99.151762796,3] PHB#0000[0:0]:          phbPblErrorLog1 = 0000000000000000
[   99.151813691,3] PHB#0000[0:0]:      phbPcieDlpErrorLog1 = 0000000000000000
[   99.151858094,3] PHB#0000[0:0]:      phbPcieDlpErrorLog2 = 0000000000000000
[   99.151904253,3] PHB#0000[0:0]:    phbPcieDlpErrorStatus = 00be000000000000
[   99.151959774,3] PHB#0000[0:0]:       phbRegbErrorStatus = 0000004000000000
[   99.152015372,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0000004000000000
[   99.152068905,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 8800006c00000000
[   99.152115691,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000007011000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
[   99.152218234,3] PHB#0000[0:0]:                PEST[001] = 8000000000000000 8000000000000000
[   99.152285858,3] PHB#0000[0:0]:                PEST[002] = 8000000000000000 8000000000000000
[   99.152350714,3] PHB#0000[0:0]:                PEST[003] = 8000000000000000 8000000000000000
[   99.152414534,3] PHB#0000[0:0]:                PEST[004] = 8000000000000000 8000000000000000
[   99.152474834,3] PHB#0000[0:0]:                PEST[005] = 8000000000000000 8000000000000000
[   99.152528675,3] PHB#0000[0:0]:                PEST[006] = 8000000000000000 8000000000000000
[   99.152589889,3] PHB#0000[0:0]:                PEST[007] = 8000000000000000 8000000000000000
[   99.152657446,3] PHB#0000[0:0]:                PEST[008] = 8000000000000000 8000000000000000
[   99.152720282,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a03000000 0000000000000000
[    3.560406] EEH: Recovering PHB#0-PE#0
[    3.560433] EEH: PE location: UOPWR.D100029-Node0-SLOT1 PCIE 4.0 X16, PHB location: N/A
[    3.560473] EEH: Frozen PHB#0-PE#0 detected
[    3.560486] EEH: Call Trace:
[    3.560526] EEH: [00000000c094f14c] __eeh_send_failure_event+0x7c/0x160
[    3.560585] EEH: [00000000c2fbde4c] eeh_dev_check_failure+0x2c4/0x6a0
[    3.560634] EEH: [00000000eb293b00] amdgpu_device_rreg.part.0+0x160/0x1f0 [amdgpu]
[    3.560924] EEH: [0000000009854edf] psp_wait_for+0xac/0x130 [amdgpu]
[    3.561223] EEH: [0000000006086f20] psp_v11_0_mode1_reset+0xbc/0x130 [amdgpu]
[    3.561554] EEH: [00000000927ca5cd] psp_gpu_reset+0x88/0xd0 [amdgpu]
[    3.561868] EEH: [000000000d948d66] amdgpu_device_mode1_reset+0x148/0x180 [amdgpu]
[    3.562116] EEH: [00000000d607b75f] nv_asic_reset+0xbc/0x290 [amdgpu]
[    3.562414] EEH: [00000000893f34f2] amdgpu_device_init+0x172c/0x2300 [amdgpu]
[    3.562693] EEH: [00000000d8547fbc] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.562966] EEH: [000000008c9f0b1b] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.563210] EEH: [0000000067c06d95] local_pci_probe+0x68/0x110
[    3.563250] EEH: [000000004224f0ca] work_for_cpu_fn+0x38/0x60
[    3.563290] EEH: [00000000c5105116] process_one_work+0x2a4/0x570
[    3.563332] EEH: [00000000f81a86b6] worker_thread+0x280/0x5b0
[    3.563372] EEH: [00000000bf39fc31] kthread+0x120/0x130
[    3.563409] EEH: [0000000036d034ff] ret_from_kernel_thread+0x5c/0x64
[    3.852813] kernel BUG at drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c:593!
[    3.852840] Oops: Exception in kernel mode, sig: 5 [#1]
[    3.852856] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    3.852884] Modules linked in: uas usb_storage sd_mod amdgpu(+) gpu_sched drm_buddy i2c_algo_bit drm_display_helper cec rc_core drm_ttm_helper ttm drm_kms_helper xhci_pci xhci_pci_renesas syscopyarea sysfillrect ahci sysimgblt fb_sys_fops libahci xhci_hcd libata drm vmx_crypto gf128mul usbcore scsi_mod drm_panel_orientation_quirks usb_common scsi_common agpgart dm_mirror dm_region_hash dm_log dm_mod btrfs blake2b_generic xor raid6_pq libcrc32c crc32c_generic crc32c_vpmsum
[    3.853130] CPU: 0 PID: 23 Comm: kworker/0:0 Not tainted 6.0.13_1 #1
[    3.853162] Workqueue: events work_for_cpu_fn
[    3.853201] NIP:  c008000002cbb648 LR: c008000002c3cb50 CTR: c008000002cbb5f8
[    3.853241] REGS: c000000002527500 TRAP: 0700   Not tainted  (6.0.13_1)
[    3.853288] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002248  XER: 20040000
[    3.853339] CFAR: c008000002cbb6dc IRQMASK: 0
[    3.853339] GPR00: c008000002c3cb50 c0000000025277a0 c0080000033b1000 00feffffff900000
[    3.853339] GPR04: 00feffffff900000 c000000002527858 c000000002527860 c0000000024f0000
[    3.853339] GPR08: 0000000000000001 00fe000000000000 0040000000000002 c00800000318aef0
[    3.853339] GPR12: c008000002cbb5f8 c0000003ff7ef600 c000000016e86070 c000000016e86078
[    3.853339] GPR16: c000000016e86068 c000000016e98338 c000000016e86088 c000000016e86090
[    3.853339] GPR20: c000000016e86080 c008000003430dcc 0000000000000100 c000000016e97250
[    3.853339] GPR24: 0000000000000001 c0080000033c5dd0 c000000016e80000 c000000016e85208
[    3.853339] GPR28: c000000016e80000 ffffffffffffffff c000000002527860 c000000002527860
[    3.853711] NIP [c008000002cbb648] gmc_v10_0_get_vm_pde+0x50/0x120 [amdgpu]
[    3.854018] LR [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854326] Call Trace:
[    3.854348] [c0000000025277a0] [c0000000025277e0] 0xc0000000025277e0 (unreliable)
[    3.854389] [c0000000025277e0] [c008000002c3cb50] amdgpu_gmc_get_pde_for_bo+0xa8/0x110 [amdgpu]
[    3.854699] [c000000002527830] [c008000002c3cc08] amdgpu_gmc_pd_addr+0x50/0xa8 [amdgpu]
[    3.855008] [c000000002527870] [c008000002cb7b30] gfxhub_v2_0_gart_enable+0x48/0x11f0 [amdgpu]
[    3.855325] [c0000000025278d0] [c008000002cbce30] gmc_v10_0_hw_init+0x88/0x270 [amdgpu]
[    3.855651] [c000000002527960] [c008000002be4a9c] amdgpu_device_init+0x1ee4/0x2300 [amdgpu]
[    3.855968] [c000000002527ac0] [c008000002be6758] amdgpu_driver_load_kms+0x30/0x1e0 [amdgpu]
[    3.856240] [c000000002527b40] [c008000002bdae68] amdgpu_pci_probe+0x1f0/0x540 [amdgpu]
[    3.856532] [c000000002527be0] [c0000000008d6078] local_pci_probe+0x68/0x110
[    3.856583] [c000000002527c60] [c00000000017f5b8] work_for_cpu_fn+0x38/0x60
[    3.856634] [c000000002527c90] [c000000000184ee4] process_one_work+0x2a4/0x570
[    3.856684] [c000000002527d30] [c000000000185a30] worker_thread+0x280/0x5b0
[    3.856725] [c000000002527dc0] [c000000000191a70] kthread+0x120/0x130
[    3.856765] [c000000002527e10] [c00000000000cecc] ret_from_kernel_thread+0x5c/0x64
[    3.856807] Instruction dump:
[    3.856829] 7c7c1b78 fbe1fff8 7c9d2378 f821ffc1 e8850000 794a07c6 7cdf3378 614a0002
[    3.856876] 7d095039 41820074 788982a0 79298002 <0b090000> 893c0d44 2c090000 41820014
[    3.856935] ---[ end trace 0000000000000000 ]---

Fast reboot is disabled, and these were the firmware files I used:
Code: [Select]
navi10_asd.bin     navi14_gpu_info.bin  navi14_me_wks.bin   navi14_smc.bin
navi10_ta.bin      navi14_me.bin        navi14_pfp.bin      navi14_sos.bin
navi10_vcn.bin     navi14_mec2.bin      navi14_pfp_wks.bin  navi14_ta.bin
navi14_asd.bin     navi14_mec2_wks.bin  navi14_rlc.bin      navi14_vcn.bin
navi14_ce.bin      navi14_mec.bin       navi14_sdma1.bin
navi14_ce_wks.bin  navi14_mec_wks.bin   navi14_sdma.bin

I tried all manner of combinations of the navi firmware, and the ones that did give me video in petitboot would throw the same error.
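
For anyone sifting through a PHB dump like the one above, here is a minimal Python sketch (not part of the original post) that lists only the registers whose value is non-zero. The regular expression is an assumption based purely on the line layout shown in this log; the function name `nonzero_registers` is made up for illustration, and PEST entries are deliberately skipped because they carry two values per line.

```python
import re

# Assumed layout, taken from the log above:
# [   99.150621318,3] PHB#0000[0:0]:   phbErrorStatus = 0000088000000000
PHB_LINE = re.compile(
    r"PHB#[0-9a-fA-F]+\[[^\]]*\]:\s*(?P<name>\w+)\s*=\s*(?P<value>[0-9a-fA-F]+)\s*$"
)


def nonzero_registers(log_text):
    """Return {register_name: value} for single-value PHB registers that are non-zero."""
    found = {}
    for line in log_text.splitlines():
        match = PHB_LINE.search(line)
        if match is None:
            continue  # PEST[...] entries, EEH lines and kernel oops lines do not match
        value = int(match.group("value"), 16)
        if value != 0:
            found[match.group("name")] = value
    return found


# Three lines copied from the dump above.
sample = """\
[   99.149774208,3] PHB#0000[0:0]:          rootErrorStatus = 00000000
[   99.150621318,3] PHB#0000[0:0]:           phbErrorStatus = 0000088000000000
[   99.152162310,3] PHB#0000[0:0]:                PEST[000] = a440002a00000000 8000000000000000
"""

for name, value in nonzero_registers(sample).items():
    print(f"{name} = {value:016x}")
```

This only narrows down which registers to look at; decoding the individual bits still needs the PHB documentation.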

2
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« on: February 28, 2024, 03:45:41 pm »
I ended up building it successfully. I was using -j16 as a parameter; when I just ran ./op-build without it, the firmware was eventually able to build :D Now if only I hadn't forgotten to update the WOF tables...

Update: Rebuilt the firmware and flashed it to my blackbird. I flashed the navi10 and navi14 firmware files to BOOTKERNFW and I get an output from my rx5300 in petitboot!!

3
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« on: February 28, 2024, 07:09:43 am »
I've been trying to compile the firmware to test it out with my rx5300, but I'm having some trouble building the firmware for my blackbird. Has anyone had success in building the firmware? I tried on Debian 12 ppc64le on my power8 server, and the build fails with a bunch of errors like this:

Code: [Select]
error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '261', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '3077', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '5125', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '7173', this reader only handles version 2, 3, 4 and 5 information
...
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '769', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '769', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '59', this reader only handles version 2, 3, 4 and 5 information
/root/op-build/output/host/bin/powerpc64le-buildroot-linux-gnu-objdump: DWARF error: found dwarf version '59', this reader only handles version 2, 3, 4 and 5 information
make[2]: Leaving directory '/root/op-build/output/build/occ-9ddc6ba57476e6483244425db270259735541967/src/occ_405'
make[1]: Leaving directory '/root/op-build/output/build/occ-9ddc6ba57476e6483244425db270259735541967/src'
make: *** [package/pkg-generic.mk:292: /root/op-build/output/build/occ-9ddc6ba57476e6483244425db270259735541967/.stamp_built] Error 2
make: Leaving directory '/root/op-build/buildroot'
root@buildbox-deb11:~/op-build# dwarf
-bash: dwarf: command not found

I get the same error on Ubuntu 22.04 on my x86_64 server.
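
With this many near-identical lines it can help to collapse the log first. The following is a rough Python sketch (an addition, not something from the original post) that counts how often each reported DWARF version number appears. The message pattern is assumed from the objdump lines quoted above, and `summarize_dwarf_errors` and `unsupported_versions` are hypothetical helper names.

```python
import re
from collections import Counter

# Pattern assumed from the quoted objdump messages.
DWARF_ERR = re.compile(r"found dwarf version '(\d+)'")

# The versions the reader says it can handle, per the error text.
SUPPORTED = {2, 3, 4, 5}


def summarize_dwarf_errors(log_text):
    """Count occurrences of each DWARF version number reported in the log."""
    return Counter(int(v) for v in DWARF_ERR.findall(log_text))


def unsupported_versions(counts):
    """Sorted list of reported version numbers the reader does not handle."""
    return sorted(v for v in counts if v not in SUPPORTED)


# Two messages copied from the build output above.
log = (
    "error: found dwarf version '6657', this reader only handles version 2, 3, 4 and 5 information\n"
    "objdump: DWARF error: found dwarf version '261', this reader only handles version 2, 3, 4 and 5 information\n"
)

counts = summarize_dwarf_errors(log)
for version in unsupported_versions(counts):
    # Hex output can make patterns in the bogus values easier to spot.
    print(f"{version} (0x{version:04x}) seen {counts[version]} time(s)")
```

Piping the full build log through something like this gives a short table instead of pages of repeats, which is easier to paste into a bug report.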

4
Firmware / Re: Firmware 2.10 for Talos-II and Blackbird available
« on: February 23, 2024, 08:19:21 am »
Is there enough space in BOOTKERNFW for the navi10 firmware? Last time I tried on the current FW, there wasn't enough room for all of the files.

5
GPU Compute / Accelerators / Re: Intel Arc Support in Kernel 6.8
« on: February 09, 2024, 11:26:58 am »
I'm cautiously optimistic; I've literally been looking for an excuse to pick up an Arc GPU, as it'd be perfect for my blackbird. One question: why is the HuC microcontroller not likely to be supported, and what limitations would there be on video encoders without it?

6
General OpenPOWER Discussion / Re: POWER11 on the horizon?
« on: October 20, 2023, 11:50:37 am »
It's been about 3-4 years between each POWER CPU generation going back to POWER3, so 2025 or late 2024 sounds about right for POWER11.
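
The arithmetic behind that estimate can be sketched in a few lines of Python. The release years below are approximate first-availability years that I am assuming for illustration; they are not stated in the post, so treat them as a rough guide only.

```python
# Approximate first-availability years (assumed, not taken from the post).
RELEASES = {
    "POWER3": 1998,
    "POWER4": 2001,
    "POWER5": 2004,
    "POWER6": 2007,
    "POWER7": 2010,
    "POWER8": 2014,
    "POWER9": 2017,
    "POWER10": 2021,
}


def gaps(years):
    """Year-to-year gaps between consecutive generations."""
    return [later - earlier for earlier, later in zip(years, years[1:])]


def projected_window(years):
    """Earliest and latest next-release year using the shortest and longest past gap."""
    g = gaps(years)
    return years[-1] + min(g), years[-1] + max(g)


years = list(RELEASES.values())
print("gaps:", gaps(years))
print("projected window for the next generation:", projected_window(years))
```

Under those assumed years every gap is either 3 or 4, which is where the 2024 to 2025 window comes from.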

7
Applications and Porting / Building Jellyfin for ppc64le?
« on: October 18, 2023, 01:06:45 pm »
Has anyone had success in building Jellyfin for ppc64le? It seems to use .NET 7, which is available for POWER. It also seems to need a specific platform per the doc here: https://jellyfin.org/docs/general/installation/source/

I tried to mess with it on AlmaLinux 8 but didn't have much luck. I did try renaming the build.centos.amd64 to build.centos.ppc64el and replacing any amd64 references with ppc64el, but it of course wasn't that easy and still just built for x86_64 from what I can tell. That's about the extent of my skillset on that though; I'm not sure what else would need to be done to get it buildable.

8
General CPU Discussion / Re: Safe operating temps of a 10 core power8 cpu?
« on: September 21, 2023, 10:15:31 am »
Gotcha, thanks! Yeah, I kinda got the impression that IBM never intended these to be run out of a home 5 ft from where you work.

9
General CPU Discussion / Safe operating temps of a 10 core power8 cpu?
« on: September 19, 2023, 03:18:10 pm »
I recently purchased a Power S812LC with dual 10-core POWER8 CPUs for my homelab, and good lord the fans are loud on that thing even just idling. I was able to lower the fan speed to tolerable levels with ipmitool, but I wanted to make sure the CPU temps were still within safe levels. With the fans set at about 8500 rpm or so, the CPUs are in the mid-to-high 60s under load and hover around 60C at idle. I wasn't able to find any definitive answer as to what the safe operating range for the POWER8 is; ChatGPT says it's 85-90C, but it has a habit of being confidently incorrect on things. Should those temps be fine, or do I need to bump the fans up a bit?

10
Blackbird / Re: Blackbird Cooling
« on: August 09, 2023, 06:32:50 am »
Gotcha, thanks! Also your post about recompiling the firmware with additional WOF tables was a huge help for me! I bought the same 16 core cpu as you and after following what you did the errors cleared out in openbmc. No way I would have figured that out myself ;D

11
Blackbird / Re: Blackbird Cooling
« on: August 04, 2023, 01:51:54 pm »
Sorry to necro this thread, but where did you get those heatsinks from @cy384? I plan on trying a 16 core cpu in my blackbird and those heatsinks look like they'd work pretty well with my fan setup.

12
I got this error in my Blackbird's BMC log when there was a brief (like one second) power outage (brownout maybe?) last week.

That could be what's happening to mine too when I think about it. Brief brown-outs aren't uncommon when it's windy or stormy, which it has been when it happened. I should probably invest in a UPS for it and see if it still gives me any trouble. I'll try MPC7500's suggestions too, thanks for the help!

13
I'll give that a try, thanks for the heads up!

14
User Zone / Re: Calling for gaming experiences
« on: April 28, 2023, 05:55:46 pm »
virtualjaguar compiled fine and runs great on void linux on my blackbird. http://www.icculus.org/virtualjaguar/


As a bit of an aside, perhaps the games compatibility section on the wiki could be broken up and organized a bit, maybe with a page for emulators and pages for game genres? Or sort games some other way, just spitballing. The more we add, the larger and more cumbersome it will get to navigate. I don't mind setting it up, though I'm not sure what the process is for that.

15
I checked the OpenBMC web interface and found it reporting critical health, with 200 high priority errors logged yesterday over the course of about 3 hours. As far as I can tell, it's the same error for all of them.

The error is
Code: [Select]
org.open_power.Proc.FSI.Error.MasterDetectionFailure

When I expand the entry, this is what it reads:
Code: [Select]
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8606

Like I said, they all appear to be the same error with that same message, although the PID= number is different. My blackbird appeared to have powered off at some point as well, though I don't know when (can I check that somewhere in OpenBMC?). I stepped away from my PC for the evening before the first was logged and forgot to shut it down for the night, and when I went to use it this morning it was not running.

I'm running Void Linux as the OS, and I couldn't find any logs that would shed any light on it. It seems that by default Void doesn't have a syslog daemon and I never bothered installing one, oops.

Is this a cause for concern, and should I put a ticket in with RCS about it? It happened once before a few weeks or so ago, but I chalked it up to a fluke; I cleared the logs and it seemed to be fine.
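
To confirm that a pile of entries really is the same error repeated, the expanded text can be compared with the per-occurrence PID stripped out. This is a small Python sketch of that idea, added for illustration; it assumes the entry text is a space-separated run of KEY=value pairs as in the example above, and the helper names and the second PID value are hypothetical.

```python
from collections import Counter


def parse_callout(entry):
    """Split 'KEY=value KEY=value ...' text into a dict."""
    fields = {}
    for token in entry.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an '='
            fields[key] = value
    return fields


def signature(entry, ignore=("_PID",)):
    """Hashable form of an entry with per-occurrence fields dropped."""
    fields = parse_callout(entry)
    return tuple(sorted((k, v) for k, v in fields.items() if k not in ignore))


entries = [
    # Copied from the log entry above.
    "CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8606",
    # Hypothetical second entry: same text, different PID.
    "CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8607",
]

distinct = Counter(signature(e) for e in entries)
print(f"{len(distinct)} distinct error signature(s) across {len(entries)} entries")
```

If every entry collapses to one signature, that supports the "same error 200 times" reading and makes for a much shorter ticket.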
