Messages - meklort

1
Blackbird / Re: Onboard nic not working
« on: February 04, 2023, 10:51:22 am »
So, this is the state as I see it:

- Linux appears to be unable to acquire a lock in the hardware (see https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/broadcom/tg3.c#L15439).
- The APE or RX firmware in the part has priority over Linux.
- The APE or RX firmware likely has acquired the lock, and failed to release it due to an infinite loop / PHY hardware not responding how the firmware expected it to.
(Note: If you have an older version of the firmware, the RX firmware acquired the lock for the endpoint it was running on. However, this failed to work in some cases and is disabled in newer firmware. The APE firmware still grabs a lock as well, but this should be for one port only, not all of them, so I don't expect this to be where the issue is.)

All that said, when the lock failed to be acquired, Linux stopped initializing the device, and so you don't get the eth port showing up.
This is also why fwupd and ethtool failed to work - they depend on the tg3 driver being loaded.
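
A quick way to confirm that the driver never bound to the NIC is to look at the `driver` symlink that Linux exposes in sysfs for each PCI function. This is only a rough sketch: the PCI address 0004:01:00.0 is the one from a Blackbird running Fedora and may differ on your system, and the function name and the optional sysfs-root argument are made up for illustration.

```shell
# Print the kernel driver bound to a PCI function, or "none" if nothing is bound.
# Usage: bound_driver <pci-address> [sysfs-root]
# The sysfs-root argument is a hypothetical convenience for testing; it defaults to /sys.
bound_driver() {
    addr=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/tg3
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Running `bound_driver 0004:01:00.0` should print the driver name when tg3 initialized the port, and "none" when the probe was abandoned.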

To fix this, you'll need to do one of two things:
- Try connecting to a different external device like a switch before going to the router and see if you get a different result. This is probably the simplest option if it works (unlikely due to all ports failing to initialize).
- If you're unable to get Linux to ever see the device, you'll need to install the development tools for the OSS firmware, and at that point there are a couple of options.
(1) You can try loading the proprietary firmware image and see if that works. If it does *not* work, you'll need to RMA with Raptor, as it's a hardware issue. If, on the other hand, the proprietary firmware works, you can leave it as-is, or we can flash the latest OSS firmware release and continue to (2).
(2) You/we can try debugging the issue. You'll need to provide dumps of the registers using the bcmregtool utility. That'll let me know whether the firmware is spinning / locked up, the firmware version, etc., and whether there's an easy fix.

2
Blackbird / Re: Onboard nic not working
« on: February 01, 2023, 10:29:21 pm »
Ah, that's very interesting. The tg3.c source code has ENODEV being reported in tg3_phy_probe. That routine has to cooperate with the APE firmware for everything to work right, so now I'm wondering what the APE firmware log is showing.

In any case, that's not expected. Can you provide more details on your setup? What are you connecting to? What cables are you using? Etc.

If you are willing to set up a dev environment, you can get a dump of the APE console and see what the firmware is trying to do. You'd need to compile the ape console utility in the firmware repo here: https://github.com/meklort/bcm5719-fw (or use the prebuilt binary included in the release package, if you don't want to set up the environment yourself).

It'd also be worth connecting the BMC port to a switch / router / etc and seeing if you're able to talk to it before the OS starts up. My first guess is that the APE firmware is getting locked up somewhere (possibly also talking to the PHY), and not releasing the PHY resources for Linux.

In any case, the APE console output would be very useful to debug it.

3
Blackbird / Re: Onboard nic not working
« on: January 31, 2023, 01:40:09 am »
Are they showing up at all past lspci (good that it sees them)? What does dmesg say? How about ip or ifconfig? Provide as many details as you are willing to.

This should give some details (may need sudo depending on the distribution):
Code: [Select]
dmesg | grep tg3 | sed s/"MAC address.*"/"MAC Address <mac removed>"/g

For reference, this is what I see, showing the three ports / network interfaces and that port 2 is online (the bmc port):
Code: [Select]
meklort@bb ~> dmesg | grep tg3 | sed s/"MAC address.*"/"MAC Address <mac removed>"/g
[    2.271519] tg3 0004:01:00.0: enabling device (0140 -> 0142)
[    2.306172] tg3 0004:01:00.0 eth0: Tigon3 [partno(BCM95719) rev 5719001] (PCI Express) MAC Address <mac removed>
[    2.306206] tg3 0004:01:00.0 eth0: attached PHY is serdes (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    2.306236] tg3 0004:01:00.0 eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    2.306254] tg3 0004:01:00.0 eth0: dma_rwctrl[00000000] dma_mask[64-bit]
[    2.306421] tg3 0004:01:00.1: enabling device (0140 -> 0142)
[    2.336139] tg3 0004:01:00.1 eth1: Tigon3 [partno(BCM95719) rev 5719001] (PCI Express) MAC Address <mac removed>
[    2.336173] tg3 0004:01:00.1 eth1: attached PHY is serdes (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    2.336203] tg3 0004:01:00.1 eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    2.336220] tg3 0004:01:00.1 eth1: dma_rwctrl[00000000] dma_mask[64-bit]
[    2.336402] tg3 0004:01:00.2: enabling device (0140 -> 0142)
[    2.374594] tg3 0004:01:00.2 eth2: Tigon3 [partno(BCM95719) rev 5719001] (PCI Express) MAC Address <mac removed>
[    2.374621] tg3 0004:01:00.2 eth2: attached PHY is serdes (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
[    2.374650] tg3 0004:01:00.2 eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
[    2.374677] tg3 0004:01:00.2 eth2: dma_rwctrl[00000000] dma_mask[64-bit]
[    2.377649] tg3 0004:01:00.1 enP4p1s0f1: renamed from eth1
[    2.449801] tg3 0004:01:00.0 enP4p1s0f0: renamed from eth0
[    2.529154] tg3 0004:01:00.2 enP4p1s0f2: renamed from eth2
[  125.902784] tg3 0004:01:00.2 enP4p1s0f2: Link is up at 1000 Mbps, full duplex
[  125.902806] tg3 0004:01:00.2 enP4p1s0f2: Flow control is off for TX and off for RX
[  125.902818] tg3 0004:01:00.2 enP4p1s0f2: EEE is enabled
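
If the full log is too noisy, the link-up lines can be pulled out on their own. The helper below is a small sketch (the function name is made up) that assumes the message format shown in the output above; other kernel versions may word it differently.

```shell
# Read dmesg text on stdin and print "<interface> <speed/duplex>" for each
# tg3 interface that reported link-up.
tg3_links_up() {
    # Matches lines like:
    #   tg3 0004:01:00.2 enP4p1s0f2: Link is up at 1000 Mbps, full duplex
    sed -n 's/.*tg3 [^ ]* \([^ :]*\): Link is up at \(.*\)/\1 \2/p'
}
```

Use it as `dmesg | tg3_links_up`. No output means no port has negotiated a link since boot.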

4
Blackbird / Re: Onboard nic not working
« on: January 30, 2023, 12:24:56 am »
You don't see something like the following via fwupdmgr (note that I'm running an unreleased version)?
Code: [Select]
IBM C1P9S01 REV 1.01

├─NetXtreme BCM5719 Gigabit Ethernet PCIe:
│     Device ID:          7de5ffdca08fa52d95fd4bb42aa5d07a4b35d2dd
│     Current version:    0.6.38
│     Vendor:             Broadcom Inc. and subsidiaries (PCI:0x14E4)
│     Release Branch:     oss-firmware
│     GUIDs:              30fe13b6-aa73-5d8c-a19f-c7b600f0117a ← PCI\VEN_14E4&DEV_1657
│                         cd40a56d-7fbf-57ef-9502-50e9f35d9b1b ← PCI\VEN_14E4&DEV_1657&REV_00
│                         5d8b12bf-1973-58fc-8b31-3e50fe31954d ← PCI\VEN_14E4&DEV_1657&SUBSYS_14E41981
│                         df899e77-b408-51b3-acd3-e91343344bfb ← PCI\VEN_14E4&DEV_1657&SUBSYS_14E41981&REV_00
│     Device Flags:       • Updatable
│                         • Supported on remote server
│                         • Needs a reboot after installation
│                         • Cryptographic hash verification is available
│                         • Device will backup firmware before installing
│                         • Unsigned Payload
│   
├─88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller:
│     Device ID:          024ec185fcba9289f4336862423686455165d68a

In any case, if that doesn't show anything, can you run "ethtool -i enP4p1s0f0" (enP4p1s0f0 is correct on Fedora; it may be different on your OS):
Code: [Select]
root@bb /h/meklort [1]# ethtool  -i enP4p1s0f0
driver: tg3
version: 6.2.0-0.rc4.31.fc38.ppc64le
firmware-version: stage1-0.6.38 NCSI v0.6.0.38
expansion-rom-version:
bus-info: 0004:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
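
When posting results, the two fields that matter most here are the driver and the firmware version. As a convenience, something like the following sketch (the function name is hypothetical) reduces the `ethtool -i` output to one line; it assumes the "key: value" layout shown above.

```shell
# Read `ethtool -i <iface>` output on stdin and print the driver and
# firmware-version fields on one line. Missing fields are shown as "?".
ethtool_summary() {
    awk -F': *' '
        $1 == "driver"           { drv = $2 }
        $1 == "firmware-version" { fw = $2 }
        END {
            printf "driver=%s firmware=%s\n", (drv == "" ? "?" : drv), (fw == "" ? "?" : fw)
        }
    '
}
```

Use it as `ethtool -i enP4p1s0f0 | ethtool_summary`, substituting your own interface name.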

Finally, do note that there shouldn't be a need for a crossover cable: the port supports (and has enabled) Auto-MDI/MDIX, so if you don't want to plug it into your switch, you should be able to plug a computer directly into it and still have it negotiate properly.


5
Blackbird / Re: Onboard nic not working
« on: January 29, 2023, 03:24:54 pm »
Can you check the firmware version on the NIC? This can be done using fwupd on your machine (fwupdmgr get-devices).

Older firmware versions had some issues with starting up. If it's out of date (latest is 0.6.12), please update with "fwupdmgr update" and check again. Note that you'll need to completely disconnect power from the machine for about 30 seconds after the update before the new firmware really starts. A good order is (1) update, (2) reboot, (3) shut down, (4) disconnect power for 30 seconds (until all lights on the motherboard are off), and (5) reconnect and boot up again.
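
To decide whether the installed version is older than 0.6.12, a plain string comparison is not enough (0.6.9 would sort after 0.6.12). The sketch below compares dot-separated numeric fields instead. The function name is made up, and it assumes purely numeric version strings such as the "Current version" value reported by fwupdmgr.

```shell
# Return success (0) if version $1 is older than version $2.
# Compares dot-separated numeric fields left to right; missing fields count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1   # equal versions are not older
    }'
}
```

For example, `version_lt "$installed" 0.6.12 && echo "update recommended"`, where `$installed` holds the version you read from `fwupdmgr get-devices`.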

- Which network ports have you tried? The port above the usb connector is shared between the BMC and the OS. You might try a different port to see if there's a different behavior.
- Was the BMC able to get an ip address before the OS started? This can be checked in your router / dhcp server if a lease was given. If the BMC worked then this may be a handoff issue to the OS.
- Can you also confirm that you are not trying to use jumbo frames? The firmware has an issue with jumbo frames that I haven't looked at yet, but the symptoms are that the firmware will reset the connection once every couple of seconds - you'll still get some packets in and out, but only for a few seconds at a time.
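
For the jumbo frame question, the configured MTU of an interface can be read from sysfs. This is a minimal sketch that treats anything above the standard 1500 bytes as jumbo; the function name and the optional sysfs-root argument are assumptions for illustration only.

```shell
# Succeed if the interface MTU is above the standard 1500 bytes,
# i.e. jumbo frames are configured. Returns 2 if the MTU cannot be read.
# Usage: uses_jumbo_frames <iface> [sysfs-root]
uses_jumbo_frames() {
    iface=$1
    root=${2:-/sys}
    mtu=$(cat "$root/class/net/$iface/mtu") || return 2
    [ "$mtu" -gt 1500 ]
}
```

For example, `uses_jumbo_frames enP4p1s0f0 && echo "jumbo frames enabled"`, with the interface name adjusted for your OS.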


6
Firmware / Re: network card to reduce attack surface?
« on: August 19, 2022, 09:46:58 am »
Take a look at the Talos2 schematics: T2P9D01, v3.8, page 96.  The NCSI management interface for the ethernet chip goes to the BMC.  This is essentially the management interface for the tiny switch inside the BCM5719.  It can select the filtering criteria (vlan, mac address, etc) for which packets get passed to the BMC.  Or it can set no filtering at all in order to let the BMC sniff all traffic.

Think about it: both "tcpdump" and "ip link set dev address $MAC_ADDRESS" work on the BMC.  You can sniff traffic and inject packets on either ethernet port from the BMC, without "loading a custom host OS image".
Yes, I understand how NC-SI works and how it is used to connect the BMC and the NIC.

That said, there is *no* switch in the NIC, and it doesn't operate the way you described. The NC-SI traffic is handled by firmware and not by hardware.

You are correct that NC-SI allows setting the VLAN, MAC, etc. However, this is done by sending a command to the APE on the BCM5719, which then sets the appropriate registers to enable it.
Please see here for how packets from the BMC are handled: https://github.com/meklort/bcm5719-fw/blob/main/ape/main.c#L193
and here for how the NC-SI packets are handled, such as the set MAC command: https://github.com/meklort/bcm5719-fw/blob/main/libs/NCSI/ncsi.c#L530
You may also note that the open source firmware does not support setting a different VLAN, but in any case, as the firmware is open source, you can always explicitly lock it down to a specific MAC and VLAN.

In this case, there is no checking that the requested MAC is different from the host MAC, and so I will add that in a future release, as it's a very good point.

7
Firmware / Re: network card to reduce attack surface?
« on: August 18, 2022, 09:46:57 pm »
The original post is a bit old, but since there is now activity, I figured I should add a few clarifications.

so to clarify, the BMC on the blackbird is isolated and not accessible if one has network access on the other two ethernet ports. correct?

By all accounts that’s correct- the official documentation says that much, and it’s repeated on several pages on the official wiki.

To clarify this a bit more, as it's partially correct:
- The proprietary firmware, if you are still on it, technically allows all ports to be used for network traffic on the BMC. The latest BMC firmware is configured to select only the correct port for the Blackbird or Talos II; however, in some cases this could malfunction. It is also relatively easy to reconfigure this once connected to the BMC.
- If you are using the open source firmware, it is configured at build time to connect only to the specified port; as such, the BMC cannot communicate on a separate port mistakenly. There are of course ways for the BMC to turn on the host, and then instruct the POWER9 CPU to flash the NIC firmware, but that's not something that the BMC can do as easily as the option with the proprietary code.

I'm not sure about the Blackbird, but please note that this is most definitely not true for Talos2.

The Talos2 BMC is connected directly to the management interface of the two-port ethernet chip, and there is nothing you can do to prevent an attacker with control of the BMC from having total control over both network adapters.

All of my Talos2 machines use separate PCIe cards for networking as a result of this unfortunate situation.  Hopefully Arctic Tern will eventually allow me to re-pinstrap the BMC and hold its reset pin in the asserted state so I can go back to using the on-motherboard Ethernet ports.
Technically speaking, the only way that the BMC can take control of the network controller is by loading a custom host OS image, that then talks to the device. The BMC does not have a way to re-flash the network card firmware directly, nor does it have a way to load new firmware on the device directly. This can only be done from the host, which the BMC does have full control of. Your model of adding a second Ethernet card does make things harder, but the BMC can still take control of this by replacing the host image.
The general threat model for the Talos II and Blackbird is that the BMC is in control of the host, and not the other way around, and so this is how things are designed. The BMC can always compromise the host.

Is there a simpler way to achieve this? Perhaps a BMC configuration trick that disables NC-SI?
You can disable network access to the BMC a couple of ways:
- Remove the firmware on the NIC, specifically the APE. This will disable the BMC from being able to access the network (without first reflashing the firmware).
- Build a custom version of the firmware that disables NC-SI. At this point, there's not much benefit of running any firmware, but it's still an option.
- Use the open source firmware and *don't* use port 0 (Talos II), or port 2 (blackbird), as those are the ports configured for NC-SI access to the network.

8
For those of you interested in using the open source firmware instead of the proprietary firmware for the BCM5719 network card, the latest stable release (0.6.12) is available via fwupd and LVFS. My understanding is that this version will be included in future shipments from RCS as well.

If you're on a Linux distribution that includes fwupd 1.5.2+, you can switch using the following command:
Code: [Select]
sudo fwupdmgr switch-branch
Note that after the update, it's best to completely power cycle the system - that is to say that the host (POWER9), BMC (AST2500), and the NIC (BCM5719) all have to be restarted, and so powering off the machine and unplugging it for a good 30 seconds is your best option after the update completes.

You can find the source code here: https://github.com/meklort/bcm5719-fw

Special thanks to:
  • Hugo Landau for his effort on reverse engineering the proprietary firmware. You can read about that here: https://www.devever.net/~hl/ortega
  • Richard Hughes for fwupd and his changes to support flashing the bcm5719.

9
General CPU Discussion / Re: CPU Power 9 8 core più veloci e prestanti (faster, better-performing 8-core POWER9 CPUs)
« on: November 24, 2020, 10:30:58 pm »
02CY977 DD2.2 model or equivalent DD2.3
Just a quick comment, but please be aware that the CPUs you mentioned are not supported out of the box by the existing firmware on the boards:
https://git.raptorcs.com/git/talos-xml/tree/wofdata

There's nothing stopping you from adding the needed tables, but the 190W 8 core CPU will *not* turbo unless you also install the correct WOF tables into the PNOR.
For reference, the relevant WOF tables can be found upstream here: https://github.com/open-power/WOF-Tables
If the tables are not installed, the CPU will not work in turbo mode and will be quite a bit slower.

(Note: I'm personally running 12 core parts that are normally rated for 105W but have increased the TDP to ~145W. Modifying the tables isn't too much work, and I expect RCS would be open to including additional tables in the firmware image for future release, but you'd have to ask RCS to be certain.)

10
Blackbird / Re: RAM slot B1 not showing
« on: July 16, 2020, 09:25:42 pm »
It sounds like the RAM was GUARDed out by the firmware. It probably detected some issue with it and disabled it.

You can try clearing the GUARD partition via the BMC when the host is off:
Code: [Select]
pflash -P GUARD -c
This won't fix the reason why it was GUARDed out, but it should re-enable the slot.
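
If you want to keep a record of what was GUARDed out before wiping it, one option is to save the partition to a file first. The sketch below assumes the pflash build on your BMC supports the `-r <file>` read option (check `pflash --help` first). The function name, the default backup path, and the PFLASH override for dry runs are all made up for illustration.

```shell
# Back up the GUARD partition to a file, then clear it.
# Run on the BMC with the host off.
# PFLASH can be overridden (e.g. PFLASH=echo) for a dry run; that override
# is an illustrative convenience, not part of pflash itself.
clear_guard_with_backup() {
    backup=${1:-/tmp/guard-backup.bin}
    # Assumed option: -r reads the selected partition into a file.
    ${PFLASH:-pflash} -P GUARD -r "$backup" || return 1
    # Same clear command as shown above.
    ${PFLASH:-pflash} -P GUARD -c
}
```

A dry run such as `PFLASH=echo; clear_guard_with_backup /tmp/guard.bin` only prints the arguments that would be passed, which is a cheap way to check the commands before touching the flash.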

11
User Zone / Re: Graphics Card install
« on: December 15, 2019, 09:56:39 pm »
@meklort: Thanks for the patch. I have another question, after patching the kernel, will it be replaced in the next update of Fedora or will it remain?

I believe it would be replaced if you update to the official version from the Fedora repos. You'll probably want to change the version string to something custom so that it does not get replaced. To do so, I think you'll want to set the CONFIG_LOCALVERSION value in the .config file:
Code: [Select]
CONFIG_LOCALVERSION="amdgpu"
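
If you'd rather script the change than edit .config by hand, a small helper along these lines should do it. This is a sketch under a few assumptions: the function name is made up, the suffix is expected to contain only plain characters (no `|` or `&`), and the leading hyphen in the usage example is just a common convention so the suffix is visually separated in `uname -r`.

```shell
# Set CONFIG_LOCALVERSION in a kernel .config, replacing an existing entry
# or appending one if it is missing.
# Usage: set_localversion <path-to-.config> <suffix>
set_localversion() {
    cfg=$1
    suffix=$2
    if grep -q '^CONFIG_LOCALVERSION=' "$cfg"; then
        # Rewrite via a temporary file so this works without sed -i.
        sed "s|^CONFIG_LOCALVERSION=.*|CONFIG_LOCALVERSION=\"$suffix\"|" "$cfg" > "$cfg.tmp" &&
            mv "$cfg.tmp" "$cfg"
    else
        printf 'CONFIG_LOCALVERSION="%s"\n' "$suffix" >> "$cfg"
    fi
}
```

For example, `set_localversion .config -amdgpu` run from the top of the kernel source tree. The pattern only matches the exact option name, so CONFIG_LOCALVERSION_AUTO is left alone.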


@MauryG5
You'll need to use the bold commands in the guide.


12
User Zone / Re: Graphics Card install
« on: December 12, 2019, 11:28:43 pm »
The best option would be to follow the guide; each step should be listed (in bold). Some steps may take a while to complete. I'd suggest giving it a try before falling back to another method.

If you are unable to get it to work, you can try installing the following prebuilt kernel packages from GigabytePro:
https://files.gigabyteproductions.net/srv/devel/linux-navi10/fedora/f32/try6/kernel-5.4.0-2.fc32.ppc64le/kernel-core-5.4.0-2.fc32.ppc64le.rpm
https://files.gigabyteproductions.net/srv/devel/linux-navi10/fedora/f32/try6/kernel-5.4.0-2.fc32.ppc64le/kernel-modules-5.4.0-2.fc32.ppc64le.rpm
You'll need to use
Code: [Select]
rpm -i kernel-core-5.4.0-2.fc32.ppc64le.rpm
and
Code: [Select]
rpm -i kernel-modules-5.4.0-2.fc32.ppc64le.rpm
Note that while this is the Fedora 32 kernel, it should be OK on Fedora 31.

13
User Zone / Re: Graphics Card install
« on: December 10, 2019, 10:23:40 pm »
@MauryG5
I put together something on the wiki that has instructions on building a patched kernel and installing it. You'll also need to follow the information on enabling the discrete display.

https://wiki.raptorcs.com/wiki/Enabling_Navi_10_On_Fedora_31

Let me know if you run into problems.

14
User Zone / Re: Graphics Card install
« on: December 09, 2019, 12:35:08 am »
In this case my instincts let me down.  I had drilled in my head for so long "don't use floating point in kernel space!" that I didn't even think to look for an x86-only guard around the DCN code.  I hope the patches make up for it!  ;)
No worries. I didn't really have the time this weekend to work on fixing the issue, so it was certainly good to have you work on it and on upstreaming the fixes.

They are testing the work done; it already works, as you can read above. As soon as they finish the tests, they will tell me how to activate this beautiful GPU ...
Everything seems to be working reasonably well here on Fedora 32/Rawhide. I'll try to do a fresh Fedora 31 install here (making sure everything works) and put together a quick guide on the steps needed to get Navi 10 working in the next day or two.

15
User Zone / Re: Graphics Card install
« on: December 06, 2019, 07:03:21 pm »
@MauryG5
I was able to test the patch from madscientist159, and with some modifications I have the Radeon 5700 XT running on my system. We're still working on cleaning up the patch, but Navi is now working on POWER. Once we're further along, I'll test on Fedora 31 instead of Rawhide and hopefully get you a kernel build to try.
