The other day I was logged in remotely when the box just went down. I could still reach the BMC and tried to power it back on, but to no avail: no Hostboot output on the serial console (via the BMC) and no event logs on the BMC. Just plain nothing.
Once I got home I switched the box on manually with the power switch. The fans spun up at full tilt as usual, but after almost exactly 30 s it switched off again, without leaving any trace in the BMC event log as to why.
Then I removed all hardware except the CPU, one piece at a time, retrying in between. Same effect.
The only thing that looks odd is a block of dmesg entries on the BMC that repeats every three seconds:
[ 1367.988668] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1367.988711] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1367.988731] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1367.988746] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1370.989477] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1370.989520] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1370.989538] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1370.989548] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1373.990267] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1373.990311] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1373.990330] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1373.990342] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
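Reading the three "Want" lines literally: each one names an SCU register, a bit mask, the value wanted for that bit, and the raw register value actually read. A minimal Python sketch that parses the lines (copied verbatim from the log above), recomputes the masked bit from the raw value, and measures how often the check repeats. It assumes every mask is a single bit, which holds for all three here. It only restates what the log says; my reading that these come from the kernel's ASPEED pin-controller checking pin-mux settings when something requests GPIO 306 is an assumption, not something the log confirms.

```python
import re

# The "Want ..." lines from the three bursts, copied verbatim from the BMC dmesg.
LOG = """\
[ 1367.988711] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1367.988731] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1367.988746] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1370.989520] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1370.989538] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1370.989548] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1373.990311] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1373.990330] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1373.990342] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
"""

# Matches: [ ts] Want SCU<reg>[0x<mask>]=0x<want>, got 0x<got> from 0x<value>
WANT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Want SCU(?P<reg>[0-9A-Fa-f]+)"
    r"\[0x(?P<mask>[0-9A-Fa-f]+)\]=0x(?P<want>[0-9A-Fa-f]+),"
    r" got 0x(?P<got>[0-9A-Fa-f]+) from 0x(?P<value>[0-9A-Fa-f]+)"
)


def decode(line):
    """Parse one 'Want ...' line; return None for any other line."""
    m = WANT_RE.search(line)
    if m is None:
        return None
    mask = int(m["mask"], 16)
    value = int(m["value"], 16)
    shift = mask.bit_length() - 1  # assumes a single-bit mask
    return {
        "ts": float(m["ts"]),
        "reg": "SCU" + m["reg"].upper(),
        "bit": shift,
        "want": int(m["want"], 16),
        "logged_got": int(m["got"], 16),
        # Bit extracted from the raw register value, to cross-check "got"
        "recomputed": (value & mask) >> shift,
    }


if __name__ == "__main__":
    entries = [d for d in map(decode, LOG.splitlines()) if d is not None]
    for d in entries:
        print(f"[{d['ts']:.6f}] {d['reg']} bit {d['bit']}: want {d['want']}, "
              f"logged {d['logged_got']}, recomputed {d['recomputed']}")
    # Spacing between repeats of the same register check, in seconds
    scu90 = [d["ts"] for d in entries if d["reg"] == "SCU90"]
    gaps = [b - a for a, b in zip(scu90, scu90[1:])]
    print("SCU90 check repeats after", ", ".join(f"{g:.3f}" for g in gaps), "s")
```

In all nine lines the recomputed bit matches the logged "got 0x0", so the messages are at least internally consistent: the requested bits (SCU90 bit 1, SCU8C bit 9, SCU70 bit 21) are simply clear, and the same request recurs roughly every 3.0 s.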
Do these mean anything, or is this just verbose logging?
I can flash the PNOR etc. from the BMC without errors and read it back, so that's not the problem either.
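For the read-back check, a minimal Python sketch that compares the image that was flashed against a dump read back from the flash: it prints the SHA-256 of each and the offset of the first differing byte. It assumes both files are full-flash images of the same size; the two paths are hypothetical command-line arguments, and how the dump is produced on the BMC is left out. One caveat: some PNOR partitions are written by firmware at runtime, so a difference in those regions after the machine has booted is not by itself a sign of bad flash.

```python
import hashlib
import sys


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    if a == b:
        return None  # fast path for the common case
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a strict prefix of the other
    return min(len(a), len(b))


def main(image_path, readback_path):
    with open(image_path, "rb") as f:
        image = f.read()
    with open(readback_path, "rb") as f:
        readback = f.read()
    print("image    sha256:", hashlib.sha256(image).hexdigest())
    print("readback sha256:", hashlib.sha256(readback).hexdigest())
    off = first_difference(image, readback)
    if off is None:
        print("identical")
    else:
        print(f"first difference at offset 0x{off:x}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python compare.py flashed.pnor readback.pnor
    main(sys.argv[1], sys.argv[2])
```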
Did my CPU just die, and if so, how the hell can I confirm that before I commit to another investment of hundreds of dollars?