Did my Blackbird just die on me?

kth5:
The other day I was logged in remotely and the box just went down. I could still reach the BMC and tried to power it up, but to no avail. There was no Hostboot output on serial (via BMC) and nothing in the event logs on the BMC. Just plain nothing.

Once I got home I switched the box on manually via the power switch. The fans started running at full tilt as usual, but after pretty much exactly 30s it switched off again, without leaving any trace of the reason in the event log on the BMC.

Then I removed all hardware except the CPU, one piece at a time, trying to power on in between. Same effect.

The only thing that looks obviously weird is a set of dmesg entries on the BMC that repeats about every three seconds:


--- Code: ---
[ 1367.988668] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1367.988711] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1367.988731] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1367.988746] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1370.989477] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1370.989520] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1370.989538] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1370.989548] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
[ 1373.990267] aspeed-g5-pinctrl 1e6e2000.syscon:pinctrl: request pin 26 (F20) for 1e780000.gpio:306
[ 1373.990311] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000
[ 1373.990330] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001
[ 1373.990342] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206
--- End code ---

Do these mean anything or are we just talking verbosity?

I can upgrade PNOR etc. from the BMC without failure and read it back, so that's not it either.


Did my CPU just die, and if so, how the hell can I confirm this before I commit to another investment of hundreds of dollars? :(

atomicdog:
Looks like the BMC is failing to set up an IO pin.
Just a guess, but maybe there's a short at a button, a connector, or the pins on the BMC IC.
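
For anyone trying to read those "Want" lines, here is a rough sketch in Python. It assumes each line has the layout register[mask]=wanted value, then the field the kernel extracted, then the raw register value. That layout is my reading of the message, not something confirmed in this thread, and the names `field` and `check` are made up for illustration. It only recomputes the masked field from the raw value so you can see whether the log is self-consistent.

```python
import re

# Assumed layout: Want <REG>[0x<MASK>]=0x<WANT>, got 0x<GOT> from 0x<RAW>
LINE = re.compile(
    r"Want (?P<reg>\w+)\[0x(?P<mask>[0-9A-Fa-f]+)\]=0x(?P<want>[0-9A-Fa-f]+), "
    r"got 0x(?P<got>[0-9A-Fa-f]+) from 0x(?P<raw>[0-9A-Fa-f]+)"
)


def field(raw, mask):
    """Extract the bits selected by mask, shifted down to bit 0."""
    if mask == 0:
        return 0
    # Position of the lowest set bit in the mask.
    shift = (mask & -mask).bit_length() - 1
    return (raw & mask) >> shift


def check(line):
    """Parse one 'Want' line; return None if the line does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    mask = int(m["mask"], 16)
    raw = int(m["raw"], 16)
    return {
        "reg": m["reg"],
        "want": int(m["want"], 16),
        "got": int(m["got"], 16),
        "recomputed": field(raw, mask),
    }


if __name__ == "__main__":
    sample = [
        "[ 1367.988711] Want SCU90[0x00000002]=0x1, got 0x0 from 0x063F0000",
        "[ 1367.988731] Want SCU8C[0x00000200]=0x1, got 0x0 from 0x00000001",
        "[ 1367.988746] Want SCU70[0x00200000]=0x1, got 0x0 from 0xF1105206",
    ]
    for entry in sample:
        print(check(entry))
```

For the three lines in the log above, the recomputed field is 0 each time, which matches the "got 0x0" that was printed. So under this reading the selected bits really are clear in those registers. Whether that is harmful or just noise, the sketch can't tell you.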

MPC7500:
In September I had a similar error message.
It happened after a thunderstorm, when lightning struck (far away) and caused a power surge.
Our heating control system also had to be replaced at the same time because of it.

Long story short: I had to reflash the BMC and OpenPOWER firmware.
https://wiki.raptorcs.com/wiki/Updating_Firmware

Now I always turn off the power strip when a storm is coming. BTW, Awilfox had the same issue a long time ago.

Borley:
"Pin 26" is only referenced on the PCIe port and on J10117, which I think is the FlexVer port? I'm not sure how exacting that error message might be.


--- Quote from: MPC7500 on July 14, 2022, 05:08:54 pm ---In September I had a similar error message.
It happened after a thunderstorm, when lightning struck (far away) and caused a power surge.
Our heating control system also had to be replaced at the same time because of it.

Long story short: I had to reflash the BMC and OpenPOWER firmware.
https://wiki.raptorcs.com/wiki/Updating_Firmware

Now I always turn off the power strip when a storm is coming. BTW, Awilfox had the same issue a long time ago.

--- End quote ---

It might also be worth putting it behind a surge suppressor. Normally I wouldn't care so much, but seeing as these parts cost what they do...

kth5:

--- Quote from: MPC7500 on July 14, 2022, 05:08:54 pm ---Long story short: I had to reflash the BMC and OpenPOWER firmware.
https://wiki.raptorcs.com/wiki/Updating_Firmware

--- End quote ---

That was the only thing I had not tried. So once I was at my desk at work I did it remotely, only to find that the BMC did not come back within 30 minutes of the reboot. I switched it off after approx. 35 minutes via the (remotely accessible) power strip and back on again, to no avail.

Seems I may have bricked it fully now. :(

Once I get home it's time to hook up the serial again and see if there's any life still visible.
