Topics - FlyingBlackbird

1
Blackbird / OpenBMC and Boot flash memory chip types?
« on: February 17, 2020, 05:27:01 pm »
According to the schematics, the boot flash memory chip should be an MT25QL512ABB8ESF-0SIT (Micron); see the data sheet at
https://www.digikey.com/product-detail/en/micron-technology-inc/MT25QL512ABB8ESF-0SIT-TR/557-1987-1-ND/9673968

Looking at my Blackbird planar, I found a chip that I do not recognize (although the logo looks like the Micron logo):

7RAI5 RWI87 258

Does anyone know this chip, and could you perhaps post a data sheet here?




2
Customs clearance is a complicated process, and RMA handling would be even more so if I wanted to upgrade my machine.

Which retailers or other dealers in Europe sell Power9 CPUs to private individuals?

3
Ubuntu did not recognize the NVMe device as an installation target due to an internal bug.

E.g., @tle reported this error as
> Failed attempt to install Ubuntu Server 19.10 because the installer failed to pick up NVMe drive correctly
https://wiki.raptorcs.com/wiki/User:Tle

I have documented a workaround on my wiki user page:

https://wiki.raptorcs.com/wiki/User:FlyingBlackbird#NVMe_SSD_is_recognized_but_the_installation_seems_to_use_the_wrong_device_name_later:

4
Blackbird / How to connect HDD activity LED?
« on: February 11, 2020, 03:52:36 pm »
The User Guide for Blackbird explains on pages 18 and 19 how to use the J9800 front panel pins to connect LEDs, e.g. the HDD activity, power, and NIC LEDs:

https://wiki.raptorcs.com/w/images/c/ce/C1P9S01_users_guide_version_1_0.pdf

What is unclear to me is how to connect a standard two-pin HDD LED plug to the front panel pins, since

- I can see only one pin (#14) for the "HDD activity LED cathode", but
- no obvious pin for the HDD activity LED anode.

So there are no two obviously adjacent pins that the two-pin HDD LED plug from my case could be plugged into.

1. Which front panel pin should be used for the anode of the LED?
2. Which cable could I use to split the two-pin HDD LED plug from my case into two separate pins for a reliable connection?

5
How can I check the integrity of the installed (flashed) firmware of a Blackbird (or Talos), e.g. against the officially downloadable firmware files (which are cryptographically signed)?

https://wiki.raptorcs.com/wiki/Blackbird/Firmware
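
As a rough sketch of what one part of such a check could look like: if the flash contents can be read back into a file (how to do that is not covered here and is an assumption), the file can be compared with the downloaded image by its SHA-256 digest. The function names are made up for illustration. A matching digest only shows that both files are identical; it does not verify the cryptographic signature, and a mismatch does not necessarily mean tampering, since a flash image may contain data that changes at run time.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large images do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def images_match(dumped_image, reference_image):
    """Return True if both files have identical contents (by SHA-256)."""
    return sha256_of_file(dumped_image) == sha256_of_file(reference_image)
```

It could be called as `images_match("flash_dump.bin", "downloaded_image.bin")`, where both file names are hypothetical placeholders.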


6
General Discussion / AMD Radeon Pro WX3200 working with Blackbird?
« on: February 09, 2020, 05:18:45 pm »
I am considering buying a WX3200 because it is "cheap(er)" (about 220 USD) and supports newer standards like OpenGL 4.6 and Vulkan 1.1.

Has anybody tried this GPU on a Blackbird (or Talos)? Does it work, and if so, with which OS and kernel module (driver)?

https://www.amd.com/en/products/professional-graphics/radeon-pro-wx-3200

According to AMD, this GPU uses the Polaris architecture.
I have read rumors that it is a Polaris 12 GPU similar to the Radeon RX 550 LE.

7
User Zone / petitboot error: PHB#0005[0:5]: PHB Freeze/Fence detected !
« on: February 07, 2020, 05:28:15 pm »
My Blackbird does not boot into the petitboot menu anymore (black screen) even though I can see the execution log on screen and via the ssh console with:

Code: [Select]
ssh -p 2200 root@my.ip.address
The last available console output is a failure saying:

Code: [Select]
[   90.334100762,3] PHB#0005[0:5]: eeh_freeze_clear on fenced PHB
               XE autoconfiguration failed

The only drive installed is a SATA HDD with a bootable Ubuntu Server 19.10 (no GPU).
I have not changed or updated any firmware.
The last thing I did was test a PCIe GPU (NVIDIA).
After removing the GPU card from the PCIe slot I could no longer boot into petitboot...

What is going wrong?

The relevant log is (full console output is attached as file):

Code: [Select]
--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--

  3.06480|secure|SecureROM valid - enabling functionality
  8.29259|Booting from SBE side 0 on master proc=00050000
  8.46568|ISTEP  6. 5 - host_init_fsi
  8.93808|ISTEP  6. 6 - host_set_ipl_parms
  9.49036|ISTEP  6. 7 - host_discover_targets
 10.09037|HWAS|PRESENT> DIMM[03]=8080000000000000
 10.09038|HWAS|PRESENT> Proc[05]=8000000000000000
 10.09040|HWAS|PRESENT> Core[07]=5565000000000000
 10.49937|ISTEP  6. 8 - host_update_master_tpm
 10.52369|SECURE|Security Access Bit> 0x0000000000000000
 10.52370|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
...
 57.11902|ISTEP 21. 2 - host_verify_hdat
 57.20205|ISTEP 21. 3 - host_start_payload
[   58.179010391,5] OPAL skiboot-c81f9d6 starting...
[   58.179013552,7] initial console log level: memory 7, driver 5
[   58.179015737,6] CPU: P9 generation processor (max 4 threads/core)
...
[   65.266762790,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot2 (16x)
 Petitboot (v1.10.3-pdd2d545)
 ──────────────────────────────────────────────────────────────────────────────

  System information
  System configuration
  System status log
  Language
  Rescan devices
  Retrieve config from URL
  Plugins (0)
 *Exit to shell           










 ──────────────────────────────────────────────────────────────────────────────
 Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
 Info: Waiting for device discovery[   85.086133287,3] PHB#0005[0:5]: PHB Freeze/Fence detected !
[   85.086197573,3] PHB#0005[0:5]:             PCI FIR=2000000000000000
[   85.086249297,3] PHB#0005[0:5]:         PCI FIR WOF=2000000000000000
[   85.086289203,3] PHB#0005[0:5]:            NEST FIR=0000800000000000
[   85.086354836,3] PHB#0005[0:5]:        NEST FIR WOF=0000800000000000
[   85.086394899,3] PHB#0005[0:5]:            ERR RPT0=0000000000000001
[   85.086489826,3] PHB#0005[0:5]:            ERR RPT1=0000000000000000
[   85.086534460,3] PHB#0005[0:5]:             AIB ERR=0000200000000000
[   85.086941635,3] PHB#0005[0:5]:                  brdgCtl = 00000002
[   85.087002150,3] PHB#0005[0:5]:             deviceStatus = 00200020
[   85.087036852,3] PHB#0005[0:5]:               slotStatus = 00402000
[   85.087081358,3] PHB#0005[0:5]:               linkStatus = a8120008
[   85.087137004,3] PHB#0005[0:5]:             devCmdStatus = 00100107
[   85.087181127,3] PHB#0005[0:5]:             devSecStatus = 00000000
[   85.087239088,3] PHB#0005[0:5]:          rootErrorStatus = 00000000
[   85.087285589,3] PHB#0005[0:5]:          corrErrorStatus = 00000000
[   85.087325009,3] PHB#0005[0:5]:        uncorrErrorStatus = 00000000
[   85.087370016,3] PHB#0005[0:5]:                   devctl = 00000020
[   85.087419580,3] PHB#0005[0:5]:                  devStat = 00000020
[   85.087466277,3] PHB#0005[0:5]:                  tlpHdr1 = 00000000
...
[   85.088610802,3] PHB#0005[0:5]:       phbRxeArbErrorLog1 = 0000000000000000
  [Disk: sda2 / ef49aa17-bb70-4fea-a8fc-29e235f7ab9f]
    Ubuntu, with Linux 5.3.0-26-generic (recovery mode)
    Ubuntu, with Linux 5.3.0-26-generic
    Ubuntu, with Linux 5.3.0-29-generic (recovery mode)
    Ubuntu, with Linux 5.3.0-29-generic
    Ubuntu
[   85.088655011,3] PHB#0005[0:5]:     phbRxeMrgErrorStatus = 0000000000000001
...
[   85.089573601,3] PHB#0005[0:5]:                PEST[0ff] = 3740002a01000000 0000000000000000
 [enP4p1s0f2] Probing from base tftp://192.168.178.1/pxelinux.cfg/[   90.311273403,3] PHB#0005[0:5]: PHB Freeze/Fence detected !
[   90.311357669,3] PHB#0005[0:5]:             PCI FIR=2000000000000000
...
[   90.315282185,3] PHB#0005[0:5]:         phbRegbErrorLog1 = 0001020000000000
[   90.315338900,3] PHB#0005[0:5]:                PEST[000] = 8000000000000000 8000000000000000
[   90.315413179,3] PHB#0005[0:5]:                PEST[001] = 8000000000000000 8000000000000000
[   90.315491213,3] PHB#0005[0:5]:                PEST[002] = 8000000000000000 8000000000000000
...
[   90.333937493,3] PHB#0005[0:5]:                PEST[0fe] = 8000000000000000 8000000000000000
[   90.334011680,3] PHB#0005[0:5]:                PEST[0ff] = b740002a01000000 8000000000000000
[   90.334100762,3] PHB#0005[0:5]: eeh_freeze_clear on fenced PHB
               XE autoconfiguration failed
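
To make a long register dump like the one above easier to read, the following minimal sketch filters it down to the PHB registers that are not zero. It only assumes the line layout visible in the log (timestamp, log level, PHB identifier, `name=value` or `name = value`); the function name is made up for illustration, and the sketch does not interpret what the individual bits mean.

```python
import re

# Matches skiboot lines such as
# "[   85.086197573,3] PHB#0005[0:5]:   PCI FIR=2000000000000000".
PHB_REGISTER = re.compile(
    r"\[\s*\d+\.\d+,\d+\]\s+"             # timestamp and log level
    r"(PHB#[0-9A-Fa-f]+)\[[^\]]*\]:\s+"   # PHB identifier
    r"(.+?)\s*=\s*([0-9A-Fa-f]+)\s*$"     # register name and one hex value
)


def nonzero_phb_registers(log_text):
    """Return (phb, register, value) tuples for registers with a nonzero value.

    Lines carrying two values (the PEST[...] entries) and free-text lines
    do not match the pattern and are skipped.
    """
    found = []
    for line in log_text.splitlines():
        match = PHB_REGISTER.search(line)
        if match is None:
            continue
        phb, name, raw = match.groups()
        value = int(raw, 16)
        if value != 0:
            found.append((phb, name, value))
    return found
```

Applied to the log above, this would keep entries such as PCI FIR, NEST FIR and AIB ERR and drop the all-zero registers such as ERR RPT1.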

PS: I logged into OpenBMC via SSH and saw these strange error messages, which may be related:

Code: [Select]
root@blackbird:~# journalctl | grep fail
May 10 19:37:25 blackbird kernel: g_mass_storage 1e6a0000.usb-vhub:p1: failed to start g_mass_storage: -22
May 10 19:37:27 blackbird systemd-udevd[789]: Process 'mtd_probe /dev/mtd2ro' failed with exit code 1.
May 10 19:37:27 blackbird systemd-udevd[790]: Process 'mtd_probe /dev/mtd3ro' failed with exit code 1.
May 10 19:37:27 blackbird systemd-udevd[837]: Process 'mtd_probe /dev/mtd4ro' failed with exit code 1.
May 10 19:37:27 blackbird systemd-udevd[792]: Process 'mtd_probe /dev/mtd5ro' failed with exit code 1.
May 10 19:37:28 blackbird systemd-udevd[788]: Process 'mtd_probe /dev/mtd0ro' failed with exit code 1.
May 10 19:37:28 blackbird systemd-udevd[791]: Process 'mtd_probe /dev/mtd1ro' failed with exit code 1.
May 10 19:37:28 blackbird systemd-udevd[836]: Process 'mtd_probe /dev/mtd6ro' failed with exit code 1.
May 10 19:37:29 blackbird kernel: A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check.
May 10 19:37:31 blackbird kernel: A link change request failed with some changes committed already. Interface sit0 may have been left with an inconsistent configuration, please check.
Feb 03 22:05:40 blackbird kernel[1052]: [    3.810720] g_mass_storage 1e6a0000.usb-vhub:p1: failed to start g_mass_storage: -22
Feb 03 22:05:43 blackbird kernel[1052]: [   22.397690] A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check.
Feb 03 22:05:43 blackbird kernel[1052]: [   24.051627] A link change request failed with some changes committed already. Interface sit0 may have been left with an inconsistent configuration, please check.
Feb 07 21:45:05 blackbird systemd[1]: Starting Stop the ethernet link failover...
Feb 07 21:45:07 blackbird systemd[1]: Started Stop the ethernet link failover.

PS2: This question is a more precise follow-up to this earlier question:
         https://forums.raptorcs.com/index.php?action=post;topic=49.0;last_msg=473

8
I have played around with a GeForce GTX 1050 Ti in the PCIe port but did not manage to use it with nouveau, so I have removed the card from the motherboard again.

Now I can see the boot process via AST on my display until it finishes (progress bar at the bottom), and then the screen goes black.

No operating system is booted automatically either (after the usual 10-second timeout), and using the cursor keys and pressing Enter has no effect.

How can I diagnose this or reset petitboot?

I have not changed any Blackbird firmware so it should work OOTB...

Edit: I have already tried the usual cold boot (power off) and unplugging all devices (PCIe and SATA).

9
When opening the shell in petitboot I can browse the dmesg output for failures. I have found one message saying:

Code: [Select]
[   4.145855] IMC Unknown Device type
[   4.145857] IMC PMU (null) Register failed
I was wondering if this is critical and found a kernel patch to suppress this message (so it must be OK):

https://lkml.org/lkml/2019/6/20/1117

Are there plans to update petitboot & friends to get rid of this noise (and possibly fix other issues)?

THX :-)


10
I am playing around with an "old" NVIDIA GTX 1050 Ti in my Blackbird system with Fedora 31, but I have not managed to create a working xorg.conf.

RPMFusion does not contain the proprietary NVIDIA drivers, since NVIDIA does not seem to support the ppc64le architecture for this GPU.

nouveau is available and I assume this driver should work on Power9.

Code: [Select]
Xorg -config /etc/X11/xorg.conf.d/21-gpu-driver.conf
to test my xorg.conf at the console complains with messages like

Code: [Select]
Fatal server error:
(EE) no screens found(EE)
(EE) Server terminated with error (1).

so it would be helpful if you could provide a working xorg.conf that uses nouveau so that I can focus on getting AST2500 out of the way (from my display)...

11
From time to time during the boot phase my SATA devices (HDD and BD reader) are disabled due to initialisation problems:

Code: [Select]
[    0.990587] ata4: SATA max UDMA/133 abar m2048@0x600c100000000 port 0x600c100000280 irq 30
[    1.487797] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.491980] ata4.00: ATAPI: ASUS    BW-16D1HT, 3.10, max UDMA/133
[    1.499775] ata4.00: configured for UDMA/133
[   97.577022] ata4: softreset failed (1st FIS failed)
[  147.576656] ata4: reset failed, giving up
[  147.576658] ata4.00: disabled

The NVMe SSD is still working (always).

Rebooting does not help, and sometimes even a cold start does not help (so the problem smells like a boot-time Linux problem).

After this error petitboot also no longer recognizes the SATA devices (even when I choose the menu item "Rescan devices").

I could reproduce this problem with Ubuntu Server 19.10 (kernel 5.3.x) as well as with Fedora 31 with a newer kernel (5.4.x)

What is the reason for that and how can I fix this?

BTW: There is a wiki entry at voidlinux, but it does not explain the background (reasons + impact):

https://wiki.voidlinux.org/Frequently_Asked_Questions#How_to_get_rid_of_.22ataN:_softreset_failed_.28device_not_ready.29.22_.3F

12
The Raptor wiki mentions

Quote
All AMD GPUs currently have DMA issues (limited to 32-bit, which can cause crashes) due to missing Linux kernel support for DMA windows between 33 and 63 bits in length.
The root cause is GPU vendors (and occasionally some non-GPU vendors) cutting costs and only including 40-bit capable (Intel-style) DMA controllers.
A compatibility mode is expected to be included in Linux 5.4 and above that will resolve this issue

https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/PCIe_Devices#Graphics_Cards

What I would like to understand:
  • How can I diagnose this (am I affected)?
  • What is the impact of this issue (crashes under which conditions)?
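
To put the bit widths from the quote into perspective, the following sketch computes how much address space a DMA mask of a given width can reach. This is plain arithmetic and does not query any device; the function names are made up for illustration.

```python
def dma_addressable_bytes(mask_bits):
    """Number of distinct byte addresses reachable with an n-bit DMA mask."""
    if mask_bits <= 0:
        raise ValueError("mask_bits must be positive")
    return 1 << mask_bits


def format_size(num_bytes):
    """Format a byte count using binary units, rounding down to whole units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = num_bytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


for bits in (32, 40, 64):
    print(f"{bits}-bit DMA mask: {format_size(dma_addressable_bytes(bits))}")
```

This prints 4 GiB for 32 bits, 1 TiB for 40 bits and 16 EiB for 64 bits, so a device limited to a 32-bit window can directly address only 4 GiB of bus address space.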

13
User Zone / Virtual machine to run x86 software on a ppc64le host
« on: February 01, 2020, 12:21:31 pm »
Is there any working virtual machine software available?

I saw a wiki page at voidlinux but I am not sure which features are exactly supported on Power9...

https://wiki.voidlinux.org/VirtualBox

14
I use the statistical programming language R (which is available for ppc64le as a ready-to-use package) a lot for data management and analytics,
but the development IDE RStudio Desktop (https://rstudio.com/products/rstudio/download/#download) is still not available.

Has anybody managed to compile RStudio successfully on ppc64le?

BTW: RStudio Server seems to be available (I did not test it so far) but "only" dockerized:

https://github.com/ppc64le/build-scripts/tree/master/rstudio
https://support.rstudio.com/hc/en-us/articles/236077788-Running-RStudio-Server-on-IBM-Power8


15
Did any of you manage to use an NVMe SSD together with a SATA HDD (booting Linux from the SSD)?

The combination of
  • Samsung EVO Plus 970 TB NVMe SSD in the PCIe x8 slot
  • Seagate IronWolf Pro 8 TB (ST8000NE0004) SATA III HDD in SATA-2
  • Asus BW-16D1HT Retail BluRay Writer in SATA-1
  • Fedora Server 31 installed on the SSD
caused many drop-outs (= disabled) of the SATA devices (HDD and BluRay) during the boot phase of Linux.

Without an SSD I can always boot Linux successfully from the same HDD.

I have also tried other PCIe-to-M.2 NVMe adapters, with the same problems
  • RaidSonic ICY BOX IB-PCI214M2-HSL M.2 to PCIe adapter and
  • Delock M.2 PCI Express x4 card
to exclude an adapter incompatibility issue.

petitboot always recognized the SATA devices until booting Linux caused them to be disabled.
After rebooting (without power off), petitboot no longer showed the SATA devices either, until I powered off and restarted.
"Rescan devices" in petitboot does not help...

Fedora Server 31 log with the SATA drop-outs shown in dmesg:

Code: [Select]
[    0.990585] ata3: SATA max UDMA/133 abar m2048@0x600c100000000 port 0x600c100000200 irq 30
[    1.487812] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.507342] ata3.00: ATA-10: ST8000NE0004-1ZF11G, EN01, max UDMA/133
[    1.507345] ata3.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    6.557731] ata3.00: qc timeout (cmd 0xec)
[    6.557737] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.557738] ata3.00: revalidation failed (errno=-5)
[   16.557020] ata3: softreset failed (1st FIS failed)
[   26.557020] ata3: softreset failed (1st FIS failed)
[   61.556654] ata3: softreset failed (1st FIS failed)
[   61.556656] ata3: limiting SATA link speed to 3.0 Gbps
[   66.556654] ata3: softreset failed (1st FIS failed)
[   66.556656] ata3: reset failed, giving up
[   66.556658] ata3.00: disabled

[    0.990587] ata4: SATA max UDMA/133 abar m2048@0x600c100000000 port 0x600c100000280 irq 30
[    1.487797] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.491980] ata4.00: ATAPI: ASUS    BW-16D1HT, 3.10, max UDMA/133
[    1.499775] ata4.00: configured for UDMA/133
[   97.577022] ata4: softreset failed (1st FIS failed)
[  107.577021] ata4: softreset failed (1st FIS failed)
[  142.576654] ata4: softreset failed (1st FIS failed)
[  147.576654] ata4: softreset failed (1st FIS failed)
[  147.576656] ata4: reset failed, giving up
[  147.576658] ata4.00: disabled
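
To spot such drop-outs quickly across several boots, a minimal sketch like the following can summarize saved dmesg output per ATA port. It relies only on the message texts visible in the log above ("softreset failed" and "disabled"); the function name and the shape of the result are made up for illustration.

```python
import re

# Matches kernel lines such as "[   66.556658] ata3.00: disabled".
ATA_LINE = re.compile(r"\[\s*\d+\.\d+\]\s+(ata\d+)(?:\.\d+)?:\s+(.*)")


def summarize_ata_failures(dmesg_text):
    """Return {port: {"softreset_failures": int, "disabled": bool}}."""
    summary = {}
    for line in dmesg_text.splitlines():
        match = ATA_LINE.search(line)
        if match is None:
            continue
        port, message = match.groups()
        entry = summary.setdefault(
            port, {"softreset_failures": 0, "disabled": False}
        )
        if "softreset failed" in message:
            entry["softreset_failures"] += 1
        if message.strip() == "disabled":
            entry["disabled"] = True
    return summary
```

For the log above this would report four soft-reset failures for each of ata3 and ata4 and mark both ports as disabled.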
