Raptor Computing Systems Community Forums (BETA)

Raptor Computing Systems Hardware => Blackbird => Topic started by: r34per on April 28, 2023, 04:46:46 pm

Title: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: r34per on April 28, 2023, 04:46:46 pm
I checked the OpenBMC web interface and found it reporting critical health, with 200 high-priority errors logged yesterday over the course of about 3 hours. As far as I can tell, it's the same error for all of them.

The error is
Code:
org.open_power.Proc.FSI.Error.MasterDetectionFailure
When I expand the entry, this is what it reads:
Code:
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8606
Like I said, they all appear to be the same error with the same message, although the _PID number is different. My Blackbird also appears to have powered off at some point, though I don't know when (can I check that somewhere in OpenBMC?). I stepped away from my PC for the evening before the first error was logged and forgot to shut it down for the night, and when I went to use it this morning it was not running.
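
One way to confirm that a batch of entries really differs only in the PID is to parse each one into key/value pairs and compare them with `_PID` dropped. This is a minimal sketch, assuming the expanded entries have been copied out of the web interface as plain text, one per line. The function names are made up for illustration, and the second sample line (with `_PID=8611`) is a hypothetical entry, not one from the actual log.

```python
from collections import Counter


def parse_entry(line):
    """Split a 'KEY=VALUE KEY=VALUE ...' line into a dict of strings."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that has no '='
            fields[key] = value
    return fields


def group_ignoring_pid(lines):
    """Count entries that are identical once the _PID field is dropped."""
    counts = Counter()
    for line in lines:
        fields = parse_entry(line)
        fields.pop("_PID", None)
        counts[tuple(sorted(fields.items()))] += 1
    return counts


entries = [
    # Entry quoted in the post above.
    "CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8606",
    # Hypothetical second entry; only the PID is changed for illustration.
    "CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=8611",
]

groups = group_ignoring_pid(entries)
for fields, count in groups.items():
    print(count, dict(fields))
```

If `groups` ends up with a single key, every entry carried the same callout path and errno, and only the PID changed between them.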

I'm running Void Linux as the OS, and I couldn't find any logs that would shed any light on it. It seems that by default Void doesn't have a syslog daemon, and I never bothered installing one, oops.

Is this a cause for concern, and should I put in a ticket with RCS about it? It happened once before, a few weeks or so ago, but I chalked it up to a fluke; I cleared the logs and it seemed to be fine.
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: Hasturtium on May 01, 2023, 09:22:29 am
You may be interested to know that Raptor weighed in on their Twitter after it was pointed out to them, with a first suggestion to try reseating the CPU.

The link:
https://nitter.poast.org/RaptorCompSys/status/1652858741635665920#m
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: r34per on May 02, 2023, 09:03:11 pm
I'll give that a try, thanks for the heads up!
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: cy384 on May 04, 2023, 07:15:15 pm
I got this error in my Blackbird's BMC log when there was a brief (like one second) power outage (brownout maybe?) last week.
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: MPC7500 on May 05, 2023, 01:28:52 pm
Then I would try this:
https://wiki.raptorcs.com/wiki/Troubleshooting/Guard_Partition

Otherwise, if that doesn't help, I would re-flash the BMC and OpenPOWER firmware:
https://wiki.raptorcs.com/wiki/Updating_Firmware
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: r34per on May 06, 2023, 09:37:16 am
Quote from: cy384 on May 04, 2023, 07:15:15 pm
I got this error in my Blackbird's BMC log when there was a brief (like one second) power outage (brownout maybe?) last week.

That could be what's happening to mine too, now that I think about it. Brief brownouts aren't uncommon here when it's windy or stormy, which it was both times this happened. I should probably invest in a UPS for it and see if it still gives me any trouble. I'll try MPC7500's suggestions too, thanks for the help!
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: Borley on January 03, 2024, 05:58:34 pm
These events have also been showing in the 'Server Health' section of my BMC web panel. Just four marked from 2020, and two more recent from December 2023.

Code:
org.open_power.Proc.FSI.Error.MasterDetectionFailure
CALLOUT_DEVICE_PATH=/sys/devices/platform/gpio-fsi/fsi0/slave@00:00/raw CALLOUT_ERRNO=0 _PID=4612

My system has been fine, other than occasionally booting without properly setting RTC time to the host (same issue that ClassicHasClass has been seeing? (https://www.talospace.com/2023/12/fedora-39-mini-review-on-blackbird-and.html))

If this is power-outage related, that would make sense: through 2020 I unknowingly had my Blackbird on a bad uninterruptible power supply, and later I had it in a location prone to outages before replacing the UPS.

If these entries are nothing critical, should they be safe to clear from the log?
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: Hasturtium on January 03, 2024, 08:04:20 pm
If you believe you've determined the cause, I don't see a reason to keep the error messages around for further investigation. Good work troubleshooting it.
Title: Re: OpenBMC reporting critical error Proc.FSI.Error.MasterDetectionFailure
Post by: ClassicHasClass on January 04, 2024, 12:59:54 pm
Quote from: Borley on January 03, 2024, 05:58:34 pm
My system has been fine, other than occasionally booting without properly setting RTC time to the host (same issue that ClassicHasClass has been seeing? (https://www.talospace.com/2023/12/fedora-39-mini-review-on-blackbird-and.html))

Yeah, I'm trying to do more research on that. I'm assuming the BMC settings just got whacked, given that the password was also scrambled, but it still seems an odd failure mode.