Blackbird / Re: Onboard nic not working
« on: February 04, 2023, 10:51:22 am »
So, this is the state as I see it:
- Linux appears to be unable to acquire a lock in the hardware. (see https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/broadcom/tg3.c#L15439)
- The APE or RX firmware in the part has priority over Linux
- The APE or RX firmware likely has acquired the lock, and failed to release it due to an infinite loop / PHY hardware not responding how the firmware expected it to.
(Note: With older firmware versions, the RX firmware acquired the lock for the endpoint it was running on. That failed to work in some cases and is disabled in newer firmware. The APE firmware still grabs a lock as well, but that should be for one port only, not all of them, so I don't expect this to be where the issue is.)
All that said, when the lock failed to be acquired, Linux stopped initializing the device, and so you don't get the eth port showing up.
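To make the failure mode concrete, here is a rough simulation of a request/grant lock with a bounded wait. It is only a sketch of the general pattern, not the actual tg3 code: `LockRegisters`, `DRIVER_BIT`, `acquire_lock`, `probe` and the retry count and delay are all made up for illustration.

```python
import time

DRIVER_BIT = 0x1  # hypothetical request bit used by the host driver


class LockRegisters:
    """Hypothetical stand-in for a hardware request/grant register pair."""

    def __init__(self, held_by_firmware=False):
        self.held_by_firmware = held_by_firmware
        self.grant = 0

    def write_request(self, bit):
        # The lock is only granted if the firmware is not holding it.
        if not self.held_by_firmware:
            self.grant = bit

    def revoke(self, bit):
        # Withdraw our request so a stale request is not left pending.
        if self.grant == bit:
            self.grant = 0

    def read_grant(self):
        return self.grant


def acquire_lock(regs, attempts=100, delay_s=0.00001):
    """Request the lock and poll for the grant; give up after `attempts` polls."""
    regs.write_request(DRIVER_BIT)
    for _ in range(attempts):
        if regs.read_grant() == DRIVER_BIT:
            return True
        time.sleep(delay_s)
    regs.revoke(DRIVER_BIT)
    return False


def probe(regs):
    """Model of a probe routine that bails out when the lock is unavailable."""
    if not acquire_lock(regs):
        return "probe aborted: lock busy, no eth interface created"
    return "probe continued"
```

With `LockRegisters(held_by_firmware=True)` the poll loop times out and `probe` aborts, which is the situation described above: firmware that never releases the lock keeps the host driver from finishing initialization.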
This is also why fwupd and ethtool failed to work - they depend on the tg3 driver being loaded.
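If you want to confirm what the kernel currently sees, a small script along these lines may help. It is a minimal sketch that assumes the usual Linux layout of `/proc/modules` and `/sys/class/net/<iface>/device/driver`; the function names are my own. Note that a driver built into the kernel (rather than loaded as a module) will not appear in `/proc/modules`.

```python
import os


def module_loaded(name, proc_modules="/proc/modules"):
    """Return True if `name` is listed as a loaded kernel module."""
    try:
        with open(proc_modules) as f:
            return any(line.split()[0] == name for line in f if line.strip())
    except OSError:
        return False


def interfaces_using(driver, sys_net="/sys/class/net"):
    """Return the network interfaces whose bound driver is `driver`."""
    found = []
    try:
        names = sorted(os.listdir(sys_net))
    except OSError:
        return found
    for iface in names:
        # Each physical interface has a 'device/driver' symlink to its driver.
        link = os.path.join(sys_net, iface, "device", "driver")
        if os.path.islink(link) and os.path.basename(os.readlink(link)) == driver:
            found.append(iface)
    return found
```

Calling `module_loaded("tg3")` and `interfaces_using("tg3")` on the affected machine shows whether the module is loaded and whether any interface got bound to it. A loaded module with an empty interface list would be consistent with the probe bailing out early.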
To fix this, you'll need to do one of two things:
- Try connecting to a different external device like a switch before going to the router and see if you get a different result. This is probably the simplest option if it works (unlikely due to all ports failing to initialize)
- If you're unable to get Linux to ever see the device, you'll need to install the development tools for the OSS firmware and at that point there are a couple of options.
(1) You can try loading the proprietary firmware image and see if that works. If it does *not* work, you'll need to RMA with Raptor as it's a hardware issue. If, on the other hand, the proprietary firmware works, you can leave it as-is, or we can flash the latest OSS firmware release and continue to (2).
(2) You/We can try debugging the issue. You'll need to provide dumps of the registers using the bcmregtool utility. That'll let me know if the firmware is spinning / locked up, the firmware version, etc., and if there's an easy fix.