Author Topic: [Need guidance] Troubleshooting Radeon Vega 64 graphics card no output on boot  (Read 7613 times)

tle

  • Sr. Member
  • ****
  • Posts: 462
  • Karma: +53/-0
    • View Profile
    • Trung's Personal Website
UPDATE: The issue is still around in the 2.0 version of the firmware.

Hi all

I am trying to get a Gigabyte Radeon graphics card (GV-RXVEGA64GAMING OC-8GD) working at boot. I followed the instructions at https://wiki.raptorcs.com/wiki/Add_GPU_Firmware_To_BOOTKERNFW to bundle the firmware and then flash it to the PNOR:
Code: [Select]
$ sudo dnf install @development-tools
$ git clone https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
$ mkdir -p /tmp/firmware/amdgpu
$ cp linux-firmware/amdgpu/vega10_* /tmp/firmware/amdgpu/
$ cd /tmp/firmware
$ mksquashfs * /tmp/firmware.bin -all-root -keep-as-directory
$ scp /tmp/firmware.bin root@bmc-ip-address:/tmp/firmware.bin
$ ssh root@bmc-ip-address
$ pflash -P BOOTKERNFW -e -p /tmp/firmware.bin
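
A copy into a directory that does not exist, or a glob that matches nothing, is easy to miss in the staging step. As a minimal sketch, the helper below stages the files and fails loudly if nothing was copied. `stage_firmware` and its arguments are hypothetical names used for illustration; they are not part of the wiki instructions.

```shell
# Hypothetical helper: copy one GPU family's firmware files from a
# linux-firmware checkout into a staging directory, and fail if none matched.
stage_firmware() {
    src=$1                  # path to the linux-firmware checkout
    dest=$2                 # staging directory (becomes the squashfs root)
    pattern=${3:-vega10_}   # file-name prefix to copy

    mkdir -p "$dest/amdgpu" || return 1
    found=0
    for f in "$src"/amdgpu/"$pattern"*; do
        [ -e "$f" ] || continue          # the glob did not match anything
        cp "$f" "$dest/amdgpu/" || return 1
        found=$((found + 1))
    done
    if [ "$found" -eq 0 ]; then
        echo "no ${pattern}* files found under $src/amdgpu" >&2
        return 1
    fi
    echo "$found"                        # number of files staged
}
```

For example, `stage_firmware linux-firmware /tmp/firmware` could replace the `mkdir` and `cp` steps, followed by the same `mksquashfs` command as above.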


When the system boots, no video output is detected on my screen. I switched back to the built-in HDMI output of the AST2500, and the system seems to hang with the following output:


Code: [Select]
--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--

  3.08320|secure|SecureROM valid - enabling functionality
  8.62232|Booting from SBE side 0 on master proc=00050000
  8.66832|ISTEP  6. 5 - host_init_fsi
  9.20956|ISTEP  6. 6 - host_set_ipl_parms
  9.80358|ISTEP  6. 7 - host_discover_targets
 10.45804|HWAS|PRESENT> DIMM[03]=8080000000000000
 10.45805|HWAS|PRESENT> Proc[05]=8000000000000000
 10.45806|HWAS|PRESENT> Core[07]=5055500000000000
 10.76196|ISTEP  6. 8 - host_update_master_tpm
 10.83124|SECURE|Security Access Bit> 0x0000000000000000
 10.83125|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
 10.86796|ISTEP  6. 9 - host_gard
 11.33699|HWAS|Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0
 11.33711|HWAS|FUNCTIONAL> DIMM[03]=0080000000000000
 11.33712|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
 11.33714|HWAS|FUNCTIONAL> Core[07]=5055500000000000
 11.33914|ISTEP  6.11 - host_start_occ_xstop_handler
 12.86084|ISTEP  6.12 - host_voltage_config
 13.01363|ISTEP  7. 1 - mss_attr_cleanup
 14.48736|ISTEP  7. 2 - mss_volt
 14.84739|ISTEP  7. 3 - mss_freq
 15.53796|ISTEP  7. 4 - mss_eff_config
 16.81043|ISTEP  7. 5 - mss_attr_update
 16.84049|ISTEP  8. 1 - host_slave_sbe_config
 16.94450|ISTEP  8. 2 - host_setup_sbe
 16.94554|ISTEP  8. 3 - host_cbs_start
 16.94665|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 16.95431|ISTEP  8. 5 - host_attnlisten_proc
 16.95527|ISTEP  8. 6 - host_p9_fbc_eff_config
 16.96162|ISTEP  8. 7 - host_p9_eff_config_links
 17.03091|ISTEP  8. 8 - proc_attr_update
 17.03201|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 17.06735|ISTEP  8.10 - proc_xbus_scominit
 17.10310|ISTEP  8.11 - proc_xbus_enable_ridi
 17.12720|ISTEP  8.12 - host_set_voltages
 17.14691|ISTEP  9. 1 - fabric_erepair
 17.35688|ISTEP  9. 2 - fabric_io_dccal
 17.37554|ISTEP  9. 3 - fabric_pre_trainadv
 17.38003|ISTEP  9. 4 - fabric_io_run_training
 17.38356|ISTEP  9. 5 - fabric_post_trainadv
 17.38559|ISTEP  9. 6 - proc_smp_link_layer
 17.39132|ISTEP  9. 7 - proc_fab_iovalid
 17.84717|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 17.86589|ISTEP 10. 1 - proc_build_smp
 18.32489|ISTEP 10. 2 - host_slave_sbe_update
 20.82944|ISTEP 10. 4 - proc_cen_ref_clk_enable
 20.96002|ISTEP 10. 5 - proc_enable_osclite
 20.96099|ISTEP 10. 6 - proc_chiplet_scominit
 21.12679|ISTEP 10. 7 - proc_abus_scominit
 21.14613|ISTEP 10. 8 - proc_obus_scominit
 21.14745|ISTEP 10. 9 - proc_npu_scominit
 21.16494|ISTEP 10.10 - proc_pcie_scominit
 21.22242|ISTEP 10.11 - proc_scomoverride_chiplets
 21.22526|ISTEP 10.12 - proc_chiplet_enable_ridi
 21.24276|ISTEP 10.13 - host_rng_bist
 21.27032|ISTEP 10.14 - host_update_redundant_tpm
 21.27243|ISTEP 11. 1 - host_prd_hwreconfig
 21.87940|ISTEP 11. 2 - cen_tp_chiplet_init1
 21.88167|ISTEP 11. 3 - cen_pll_initf
 21.88472|ISTEP 11. 4 - cen_pll_setup
 21.90654|ISTEP 11. 5 - cen_tp_chiplet_init2
 21.90887|ISTEP 11. 6 - cen_tp_arrayinit
 21.91180|ISTEP 11. 7 - cen_tp_chiplet_init3
 21.91599|ISTEP 11. 8 - cen_chiplet_init
 21.91904|ISTEP 11. 9 - cen_arrayinit
 21.92274|ISTEP 11.10 - cen_initf
 21.92493|ISTEP 11.11 - cen_do_manual_inits
 21.92722|ISTEP 11.12 - cen_startclocks
 21.93014|ISTEP 11.13 - cen_scominits
 21.93381|ISTEP 12. 1 - mss_getecid
 23.07910|ISTEP 12. 2 - dmi_attr_update
 23.11676|ISTEP 12. 3 - proc_dmi_scominit
 23.16648|ISTEP 12. 4 - cen_dmi_scominit
 23.17511|ISTEP 12. 5 - dmi_erepair
 23.19312|ISTEP 12. 6 - dmi_io_dccal
 23.19612|ISTEP 12. 7 - dmi_pre_trainadv
 23.19915|ISTEP 12. 8 - dmi_io_run_training
 23.21495|ISTEP 12. 9 - dmi_post_trainadv
 23.21729|ISTEP 12.10 - proc_cen_framelock
 23.22026|ISTEP 12.11 - host_startprd_dmi
 23.22243|ISTEP 12.12 - host_attnlisten_memb
 23.22604|ISTEP 12.13 - cen_set_inband_addr
 23.24306|ISTEP 13. 1 - host_disable_memvolt
 24.17510|ISTEP 13. 2 - mem_pll_reset
 24.24792|ISTEP 13. 3 - mem_pll_initf
 24.30519|ISTEP 13. 4 - mem_pll_setup
 24.33331|ISTEP 13. 6 - mem_startclocks
 24.34941|ISTEP 13. 7 - host_enable_memvolt
 24.35151|ISTEP 13. 8 - mss_scominit
 25.70075|ISTEP 13. 9 - mss_ddr_phy_reset
 26.14936|ISTEP 13.10 - mss_draminit
 26.61721|ISTEP 13.11 - mss_draminit_training
 27.51642|ISTEP 13.12 - mss_draminit_trainadv
 27.61042|ISTEP 13.13 - mss_draminit_mc
 27.78375|ISTEP 14. 1 - mss_memdiag
 31.58001|ISTEP 14. 2 - mss_thermal_init
 31.64733|ISTEP 14. 3 - proc_pcie_config
 31.71197|ISTEP 14. 4 - mss_power_cleanup
 31.71609|ISTEP 14. 5 - proc_setup_bars
 31.77851|ISTEP 14. 6 - proc_htm_setup
 31.79344|ISTEP 14. 7 - proc_exit_cache_contained
 31.85528|ISTEP 15. 1 - host_build_stop_image
 35.09151|ISTEP 15. 2 - proc_set_pba_homer_bar
 35.15084|ISTEP 15. 3 - host_establish_ex_chiplet
 35.16463|ISTEP 15. 4 - host_start_stop_engine
 35.39656|ISTEP 16. 1 - host_activate_master
 36.68418|ISTEP 16. 2 - host_activate_slave_cores
 36.77981|ISTEP 16. 3 - host_secure_rng
 36.80049|ISTEP 16. 4 - mss_scrub
 36.83218|ISTEP 16. 5 - host_load_io_ppe
 36.87857|ISTEP 16. 6 - host_ipl_complete
 37.75469|ISTEP 18.11 - proc_tod_setup
 38.02326|ISTEP 18.12 - proc_tod_init
 38.05033|ISTEP 20. 1 - host_load_payload
 39.03394|ISTEP 20. 2 - host_load_hdat
 40.11956|ISTEP 21. 1 - host_runtime_setup
 46.40398|htmgt|OCCs are now running in ACTIVE state
 52.37718|ISTEP 21. 2 - host_verify_hdat
 52.41944|ISTEP 21. 3 - host_start_payload
[   53.303670341,5] OPAL skiboot-c81f9d6 starting...
[   53.303673446,7] initial console log level: memory 7, driver 5
[   53.303675558,6] CPU: P9 generation processor (max 4 threads/core)
[   53.303677518,7] CPU: Boot CPU PIR is 0x0024 PVR is 0x004e1203
[   53.303680283,7] OPAL table: 0x30103930 .. 0x30103f10, branch table: 0x30002000
[   53.303683401,7] Assigning physical memory map table for nimbus
[   53.303686056,7] Parsing HDAT...
[   53.303687465,7] SPIRA-S found.
[   53.303689697,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[   53.303770751,6] SP Family is ibm,ast2500,openbmc
[   53.303776925,7] LPC: IOPATH chip id = 0
[   53.303778302,7] LPC: FW BAR       = f0000000
[   53.303779833,7] LPC: MEM BAR      = e0000000
[   53.303781399,7] LPC: IO BAR       = d0010000
[   53.303782895,7] LPC: Internal BAR = c0012000
[   53.303795512,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[   53.303798310,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[   53.305153464,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[   53.305273899,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   53.306035877,5] CHIP: Chip ID 0000 type: P9N DD2.30
[   53.306474802,5] PLAT: Detected Blackbird platform
[   53.306544669,5] PLAT: Detected BMC platform ast2500:openbmc
[   53.323024841,5] CPU: All 32 processors called in...
[   53.501903768,7] LPC: Routing irq 10, policy: 0 (r=1)
[   53.501904733,7] LPC: SerIRQ 10 using route 0 targetted at OPAL
[   54.509575320,5] HIOMAP: Negotiated hiomap protocol v2
[   54.510806979,5] HIOMAP: Block size is 64KiB
[   54.510832611,5] HIOMAP: BMC suggested flash timeout of 8s
[   54.510876435,5] HIOMAP: Flash size is 64MiB
[   54.510970596,5] HIOMAP: Erase granule size is 64KiB
[   56.418150405,5] FLASH: Found system flash: (unnamed) id:0
[   57.207119491,7] LPC: Routing irq 4, policy: 0 (r=1)
[   57.207120295,7] LPC: SerIRQ 4 using route 1 targetted at OPAL
[   57.207242358,5] OCC: All Chip Rdy after 0 ms
[   58.009544728,3] STB: VERSION verification FAILED. log=0xffffffffffff8160
[   59.124974141,3] STB: IMA_CATALOG verification FAILED. log=0xffffffffffff8160
[   59.320998010,3] CAPP: Error loading ucode lid. index=203d1
[   59.335020541,5] PCI: Resetting PHBs and training links...
[   60.355590956,5] PCI: Probing slots...
[   60.412005477,5] PCI Summary:
[   60.412048918,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..03 SLOT=CPU1 Slot2 (16x)
[   60.412163589,5] PHB#0000:01:00.0 [SWUP] 1022 1470 R:c1 C:060400 B:02..03 LOC_CODE=CPU1 Slot2 (16x)
[   60.412317704,5] PHB#0000:02:00.0 [SWDN] 1022 1471 R:00 C:060400 B:03..03
[   60.412419910,5] PHB#0000:03:00.0 [LGCY] 1002 687f R:c1 C:030000 (           vga) LOC_CODE=CPU1 Slot2 (16x)
[   60.412569858,5] PHB#0000:03:00.1 [EP  ] 1002 aaf8 R:00 C:040300 (multimedia-device) LOC_CODE=CPU1 Slot2 (16x)
[   60.412755922,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=CPU1 Slot1 (8x)
[   60.412866802,5] PHB#0001:01:00.0 [EP  ] 144d a802 R:01 C:010802 (  mass-storage) LOC_CODE=CPU1 Slot1 (8x)
[   60.413042976,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin SATA
[   60.413236298,5] PHB#0002:01:00.0 [LGCY] 1b4b 9235 R:11 C:010601 (          sata) LOC_CODE=Builtin SATA
[   60.413376374,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB
[   60.413520796,5] PHB#0003:01:00.0 [EP  ] 104c 8241 R:02 C:0c0330 (      usb-xhci) LOC_CODE=Builtin USB
[   60.413672005,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet
[   60.413905142,5] PHB#0004:01:00.0 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   60.414016347,5] PHB#0004:01:00.1 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   60.414222156,5] PHB#0004:01:00.2 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   60.414373675,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC
[   60.414454930,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
[   60.414582932,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:040000 (         video) LOC_CODE=BMC
[   60.424173661,5] IPMI: Resetting boot count on successful boot
[   60.424270580,5] INIT: Waiting for kernel...
[   73.210357140,3] STB: BOOTKERNEL verification FAILED. log=0xffffffffffff8160
[   73.210954847,5] INIT: 64-bit LE kernel discovered
[   73.358043529,5] INIT: Starting kernel at 0x20011000, fdt at 0x306f65d0 217248 bytes
[   74.469358228,3] LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010080
[   74.469369652,6] IPMI: dropping non severe PEL event
[   74.469419009,7] UART: IRQ functional !
[    4.135656] IMC PMU (null) Register failed
[    4.995288] kAFS: failed to register: -97
[    5.393954] squashfs: SQUASHFS error: Xattrs in filesystem, these will be ignored
[    5.393979] squashfs: SQUASHFS error: unable to read xattr id index table
[    5.401894] udevd[1677]: specified group 'kvm' unknown
[    5.410641] udevd[1678]: specified group 'kvm' unknown
nvram process returned non-zero exit status
dmesg: klogctl: Operation not permitted
[   81.461400396,3] PHB#0000[0:0]:  phbRegbFirstErrorStatus = 0000000000000000
[   81.461461441,3] PHB#0000[0:0]:         phbRegbErrorLog0 = 0000000000000000
[   81.461522427,3] PHB#0000[0:0]:         phbRegbErrorLog1 = 0000000000000000
[   81.461585112,3] PHB#0000[0:0]:                PEST[1ff] = 3740002a03000000 0000000000000000
cpu 0x0: Vector: 300 (Data Access) at [c0000001f96df950]
    pc: c0080000022312fc: amdgpu_fence_process+0xc0/0x13c [amdgpu]
    lr: c0080000022312c0: amdgpu_fence_process+0x84/0x13c [amdgpu]
    sp: c0000001f96dfbd0
   msr: 900000000280b033
   dar: 8
 dsisr: 80000
  current = 0xc0000001f95b5900
  paca    = 0xc0000001ff7ff480 irqmask: 0x03 irq_happened: 0x01
    pid   = 1798, comm = kworker/0:4
Linux version 4.19.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.02.1-05273-gef2bf42027)) #2 SMP Wed May 22 00:16:10 UTC 2019
enter ? for help
[c0000001f96dfc20] c008000002231640 amdgpu_fence_count_emitted+0x20/0x40 [amdgpu]
[c0000001f96dfc50] c0080000022d8fb4 amdgpu_uvd_idle_work_handler+0x90/0x160 [amdgpu]
[c0000001f96dfcb0] c0000000000943f8 process_one_work+0x204/0x32c
[c0000001f96dfd40] c000000000094af0 worker_thread+0x2d0/0x394
[c0000001f96dfdc0] c00000000009a780 kthread+0x14c/0x154
[c0000001f96dfe30] c00000000000b6b0 ret_from_kernel_thread+0x5c/0x6c

The full kernel log can be found at https://gist.github.com/runlevel5/89f87c2296f0f82f634921248309e7be
and the full OPAL log can be found at https://gist.github.com/runlevel5/94b44fa48cae84ffcb7d2f15fe4945a7



I am wondering whether the firmware is loaded correctly. Is there anything I could do to determine the root cause?
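
One way to narrow this down is to filter a saved copy of the boot log for the lines that mention squashfs, firmware loading, or the amdgpu driver. The sketch below assumes the log has been saved to a file; `scan_boot_log` is a hypothetical helper name, and the filter only surfaces candidate lines, it does not diagnose anything by itself.

```shell
# Hypothetical helper: print the lines of a saved boot log that mention
# squashfs, firmware, or amdgpu (case-insensitive).
scan_boot_log() {
    grep -i -E 'squashfs|firmware|amdgpu' "$1"
}
```

Run against the log above, this would pick out the two `SQUASHFS error` lines and the `amdgpu_fence_process` backtrace entries.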
« Last Edit: March 05, 2020, 07:04:55 pm by tle »
Faithful Linux enthusiast

My Raptor Blackbird

MPC7500

  • Hero Member
  • *****
  • Posts: 587
  • Karma: +41/-1
    • View Profile
    • Twitter
Have you already read this wiki entry on getting AMDGPU to work under Linux?

tle

@MPC7500 Yes, I have followed the instructions:

Quote
Bootloader does not show up on monitor(s) attached to a discrete GPU
Most modern discrete GPUs require firmware. As Talos™ II is aimed at a security-conscious audience, we do not currently include GPU firmware in the production firmware images. Instructions are available in the Users Guide to add firmware for your GPU to the PNOR if needed. Note that any added firmware may be able to access and modify data associated with the affected device(s); we strongly recommend you perform a security risk analysis before loading any firmware, and select open firmware where/if it is available.

If you are using a GPU that does not require firmware, or have already added any needed firmware files to the host PNOR, please ensure that the on-board VGA disable jumper (J10109) is capped. The bootloader output will preferentially show up on the on-board VGA port if it remains enabled.

Alternatively, you either use a serial console or VGA monitor / adapter to interact with the bootloader.

I would like to know what output to expect when the Linux vega10 firmware files are loaded correctly from the host PNOR.

By the way, I also tried disabling the on-board VGA output by capping the jumper (J10109), but nothing is displayed on my monitor.

NOTE: I have tested this graphics card in my Windows PC, and it works perfectly.

MPC7500

I don't have a discrete GPU myself, so I can't try it. But the output will be Petitboot. That's why you need the disable jumper; otherwise Petitboot will be shown on the AST GPU.

In the firmware directory you need two directories: amdgpu and radeon. Which files do you have in those directories?
But anyway, I would first try to get it working under Linux.

Maybe there is someone else who could help.

tle

> In the firmware directory you need two directories. amdgpu and radeon. Which files do you have in the directories?

I only copied the `amdgpu/vega10_*` files. I did not add any firmware under `radeon` because I don't need any of it.

I can confirm the card runs fine under my Linux host.

« Last Edit: February 05, 2020, 06:09:15 am by tle »

MPC7500

Great. Now the card only has to work at Petitboot?

tle

> Great. Now the card only has to work at Petitboot?
Yes.

I think this issue could be resolved with a newer Linux kernel. On my host, the card did not work either (under Fedora 32 Rawhide running the 5.4.17 kernel). I managed to get it working by compiling my own Linux 5.5.0 kernel.

Let's wait for Raptor CS to bump the Linux version in their firmware.
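
To compare a given kernel against the version that worked for me, something like the sketch below could be used. It assumes `sort -V` (version sort from GNU coreutils) is available; `version_at_least` and `kernel_release` are hypothetical helper names, and 5.5.0 is only the version I happened to build, not a confirmed minimum.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU `sort -V` is available on the machine running the check.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip a suffix such as "-openpower1" so only the numeric part is compared.
kernel_release() {
    printf '%s\n' "${1%%-*}"
}

# Example (commented out): check the running kernel against 5.5.0.
# version_at_least "$(kernel_release "$(uname -r)")" 5.5.0 && echo "new enough"
```

With the versions from this thread, the firmware kernel `4.19.0-openpower1` and the Fedora `5.4.17` kernel both compare below 5.5.0.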
« Last Edit: February 15, 2020, 06:19:53 pm by tle »