Author Topic: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure  (Read 2589 times)

r34per

I checked the OpenBMC web interface and found it reporting critical health, with 200 high-priority errors logged yesterday over the course of about 3 hours. As far as I can tell, it's the same error for all of them.

The error is
Code:
org.open_power.Proc.FSI.Error.MasterDetectionFailure
When I expand the entry, this is what it reads:
Code:
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8606
Like I said, they all appear to be the same error with that same message, although the _PID= number is different. My Blackbird appears to have powered off at some point as well, though I don't know when (can I check that somewhere in OpenBMC?). I stepped away from my PC for the evening before the first error was logged and forgot to shut it down for the night, and when I went to use it this morning it was not running.
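
For anyone who wants to confirm that a batch of entries really differs only by PID, here is a minimal Python sketch. It assumes the expanded entry text has been copied out of the web interface by hand as space-separated KEY=VALUE strings; the function names are made up for illustration and are not part of OpenBMC, and the second sample entry uses an invented PID.

```python
from collections import Counter


def parse_entry(text):
    """Split a space-separated KEY=VALUE string into a dict."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not KEY=VALUE
            fields[key] = value
    return fields


def signature(fields, ignore=("_PID",)):
    """Hashable form of an entry with per-process fields dropped."""
    return tuple(sorted((k, v) for k, v in fields.items() if k not in ignore))


def count_distinct(entries):
    """Count entries by signature, so same-but-for-PID entries collapse."""
    return Counter(signature(parse_entry(e)) for e in entries)


# The first entry is the one quoted above; the second has a made-up PID.
entries = [
    "CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8606",
    "CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8607",
]
counts = count_distinct(entries)
print(len(counts))  # number of distinct errors once _PID is ignored
```

If the count comes out as 1, every entry carries the same callout path and errno, which matches the "same error, different PID" reading.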

I'm running Void Linux as the OS, and I couldn't find any logs that would shed any light on it. It seems that by default Void doesn't have a syslog daemon and I never bothered installing one, oops.

Is this a cause for concern, and should I put a ticket in with RCS about it? It happened once before, a few weeks or so ago, but I chalked it up to a fluke; I cleared the logs and it seemed to be fine.
« Last Edit: April 28, 2023, 04:50:04 pm by r34per »

Hasturtium

You may be interested to know that Raptor weighed in on their Twitter after it was pointed out to them, with a first suggestion to try reseating the CPU.

The link:
https://nitter.poast.org/RaptorCompSys/status/1652858741635665920#m

r34per

I'll give that a try, thanks for the heads up!

cy384

I got this error in my Blackbird's BMC log when there was a brief (like one second) power outage (brownout maybe?) last week.

MPC7500

Then I would try this:
https://wiki.raptorcs.com/wiki/Troubleshooting/Guard_Partition

Otherwise, if that doesn't help, I would re-flash the BMC and OpenPOWER firmware:
https://wiki.raptorcs.com/wiki/Updating_Firmware
« Last Edit: May 05, 2023, 01:31:09 pm by MPC7500 »

r34per

Quote from: cy384
I got this error in my Blackbird's BMC log when there was a brief (like one second) power outage (brownout maybe?) last week.

That could be what's happening to mine too, now that I think about it. Brief brownouts aren't uncommon here when it's windy or stormy, which it has been each time this happened. I should probably invest in a UPS for it and see if it still gives me any trouble. I'll try MPC7500's suggestions too, thanks for the help!
« Last Edit: May 09, 2023, 05:17:33 pm by r34per »

Borley

« Reply #6 on: January 03, 2024, 05:58:34 pm »
These events have also been showing up in the 'Server Health' section of my BMC web panel: four dated 2020, and two more recent ones from December 2023.

Code:
org.open_power.Proc.FSI.Error.MasterDetectionFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=4612

My system has been fine, other than occasionally booting without properly passing the RTC time to the host (same issue that ClassicHasClass has been seeing?).

If this is power-outage related, that would make sense: through 2020, I unknowingly had my Blackbird on a bad uninterruptible power supply, and later I had it in a location prone to outages before replacing the UPS.

If these entries are nothing critical, should they be safe to clear from the log?
Blackbird C1P9S01, CPU 02CY650, 2x 8GB 2666 RAM, 1024GB M.2 SSD, AMD RX 560X, 2U heatsink, 500W SFX PSU, Debian 11

Hasturtium

« Reply #7 on: January 03, 2024, 08:04:20 pm »
If you believe you've determined the cause, I don't see a reason to keep the error messages around for further investigation. Good work troubleshooting it.

ClassicHasClass

« Reply #8 on: January 04, 2024, 12:59:54 pm »
Quote from: Borley
My system has been fine, other than occasionally booting without properly setting RTC time to the host (same issue that ClassicHasClass has been seeing?)

Yeah, I'm trying to do more research on that. I'm assuming the BMC settings just got whacked, given that the password was also scrambled, but it still seems an odd failure mode.