Author Topic: Potential Aspeed slow failure?  (Read 911 times)

Badt

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
Potential Aspeed slow failure?
« on: September 03, 2024, 10:24:27 am »
Our production Blackbird went down suddenly when the BMC requested an IPMI shutdown. We were able to reboot, and the normal boot sequence completed successfully, only for the exact same event to happen 10 minutes later.

And it persists.

I looked into the BMC dmesg log and found this:

Code:
[  294.400092] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[  294.400136] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[  294.400158] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[  294.400171] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[  648.795152] aspeed-i2c-bus 1e78a440.i2c-bus: irq handled != irq. expected 0x00001010, but was 0x00000010
[  648.805898] aspeed-i2c-bus 1e78a440.i2c-bus: irq handled != irq. expected 0x00001001, but was 0x00000001
[  648.817794] aspeed-i2c-bus 1e78a440.i2c-bus: irq handled != irq. expected 0x00001010, but was 0x00000010
[  649.661265] aspeed-i2c-bus 1e78a440.i2c-bus: irq handled != irq. expected 0x00001001, but was 0x00000001
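
For reading the `irq handled != irq` lines above, here is a minimal Python sketch that lists which bits were expected but not reported as handled. The function name `unhandled_bits` and the regular expression are illustrative assumptions, not part of any BMC tooling, and the sketch does not say what the differing bit means in the Aspeed I2C controller.

```python
import re

# Matches the aspeed-i2c-bus mismatch lines quoted in the log above.
IRQ_LINE = re.compile(
    r"irq handled != irq\. expected (0x[0-9a-fA-F]+), but was (0x[0-9a-fA-F]+)"
)


def unhandled_bits(line):
    """Return bit positions set in 'expected' but clear in 'was' (hypothetical helper)."""
    match = IRQ_LINE.search(line)
    if match is None:
        return None  # not a mismatch line
    expected = int(match.group(1), 16)
    handled = int(match.group(2), 16)
    diff = expected & ~handled
    return [bit for bit in range(32) if (diff >> bit) & 1]


# The two distinct mismatch patterns from the log above.
samples = [
    "[  648.795152] aspeed-i2c-bus 1e78a440.i2c-bus: irq handled != irq. "
    "expected 0x00001010, but was 0x00000010",
    "[  648.805898] aspeed-i2c-bus 1e78a440.i2c-bus: irq handled != irq. "
    "expected 0x00001001, but was 0x00000001",
]

for sample in samples:
    print(unhandled_bits(sample))
```

For both patterns the only differing bit is bit 12 (0x00001000).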

This issue looks similar to https://forums.raptorcs.com/index.php?topic=377.0

However, in our case the system is plugged into a UPS, and there haven’t been power failures of any kind. To be sure, we swapped out the chassis PSU for a brand-new one we had lying around and bypassed the UPS completely. Unfortunately, that did absolutely nothing. We also tried re-flashing both our self-compiled PNOR and the latest release, and manually upgrading the BMC firmware; that had no effect either. Note: we did upgrade to the latest-release PNOR a few months back, although I think it probably has nothing to do with the issue at hand.

Should we proceed with dumping & re-flashing the FPGA now? I hate that it looks just like the issue linked above, but the remedies aren't working.

Best

P.S. Is a jerry-rigged serprog-based programmer good enough for at least dumping the FPGA flash for inspection?
« Last Edit: September 03, 2024, 10:46:38 am by Badt »

carlosgonz

  • Newbie
  • *
  • Posts: 5
  • Karma: +0/-0
Re: Potential Aspeed slow failure?
« Reply #1 on: September 03, 2024, 12:25:30 pm »
That is a known bug in the old GNU/Linux version on the BMC. I don't remember exactly which GNU/Linux version fixed it, but it has been fixed. I hope the Raptor team releases a new BMC version with a newer GNU/Linux version, like v6.0+.

Thanks.
« Last Edit: September 03, 2024, 12:29:52 pm by carlosgonz »
Blackbird  Rv1.02

Badt

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
Re: Potential Aspeed slow failure?
« Reply #2 on: September 05, 2024, 08:53:41 am »
Hey, so the IPMI shutdown issue is kind of gone..? The BMC log is still full of these error messages, but for some reason they no longer kill the system; perhaps one of the rounds of PNOR/BMC flashing did it, though I can't point to any one in particular. It took a few tries. For the time being, we're back in operation. I'd really like to mitigate this issue in the future.

ClassicHasClass

  • Sr. Member
  • ****
  • Posts: 462
  • Karma: +35/-0
  • Talospace Earth Orbit
    • Floodgap
Re: Potential Aspeed slow failure?
« Reply #3 on: September 05, 2024, 11:46:52 am »
Glad to hear it self-resolved, at least.