Raptor Computing Systems Community Forums (BETA)

Raptor Computing Systems Hardware => Talos II => Topic started by: DKnoto on December 01, 2023, 03:23:34 pm

Title: Talos II reboots itself
Post by: DKnoto on December 01, 2023, 03:23:34 pm
Since September my machine has spontaneously rebooted four times. The most recent occurrence was especially annoying: the machine rebooted over and over without ever loading the operating system, and only unplugging the power cord helped.

I managed to take a picture of what the report looks like after such a crash:

https://www.dropbox.com/scl/fi/hk0zqvfjpidexnw7m1xln/Talos-II-2023-11-27-Crash.jpeg?rlkey=0jj34pjpv708zigt08kzfk3zl&dl=0

Any ideas what could cause this behavior?
Title: Re: Talos II reboots itself
Post by: Hasturtium on December 02, 2023, 06:09:33 pm
Looks like it might be tied to this (https://github.com/open-power/hostboot/issues/220), though I couldn’t say what’s triggering it… In any case it looks like updating the firmware to 2.18+ could mitigate the issue. I'm running a Blackbird, so I can't really provide further insight on this one. But I hope it helps.
Title: Re: Talos II reboots itself
Post by: ClassicHasClass on December 03, 2023, 10:33:41 pm
Not sure, but definitely worth having a connection open to the BMC and watching any activity. How far does it get through Hostboot?

Also, did anything guard out?
Title: Re: Talos II reboots itself
Post by: MPC7500 on December 07, 2023, 03:50:44 pm
I had the same problem some time ago, after a thunderstorm with a lightning strike. I had to re-flash the firmware.
It could also indicate a bad PSU, which happens surprisingly often.
Title: Re: Talos II reboots itself
Post by: DKnoto on December 08, 2023, 01:01:22 am
I have a power supply with a 10-year warranty ;) Thermaltake Toughpower TF1 1550W.
Title: Re: Talos II reboots itself
Post by: Hasturtium on December 08, 2023, 06:05:15 pm
Quote from DKnoto:
I have a power supply with a 10-year warranty ;) Thermaltake Toughpower TF1 1550W.

Then that likely isn't it, though don't dismiss it outright. Have you considered re-flashing the firmware as MPC7500 suggested?
Title: Re: Talos II reboots itself
Post by: DKnoto on December 10, 2023, 02:02:21 am
Quote from Hasturtium:
Have you considered re-flashing the firmware as MPC7500 suggested?

Yes, I'm considering it, but I'm a little afraid of messing something up. I've never done it, and my Talos II is a critical resource for me at the moment.
Title: Re: Talos II reboots itself
Post by: MPC7500 on December 10, 2023, 06:07:32 am
It's pretty simple. You only have to update the BMC and the OpenPOWER firmware, not the FPGA.
https://wiki.raptorcs.com/wiki/Updating_Firmware
Title: Re: Talos II reboots itself
Post by: DKnoto on December 10, 2023, 02:08:19 pm
I think I need to hurry up: a little while ago I had another crash. I wasn't doing anything in particular, just listening to music on YouTube and reading Twitter in Firefox. The worst part is that my FreeBSD VM in QEMU has been destroyed again... :(
Title: Re: Talos II reboots itself
Post by: ejfluhr on December 13, 2023, 06:11:40 pm
Is the error always on c4? If so, can you disable c4?
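
If you have saved the text of several crash reports, a quick tally can answer the "always on c4?" question. This is only a rough sketch: it assumes the core appears in the text as "c" followed by digits (as in "c4"), and the function name and sample strings are made up for illustration, not taken from real firmware output.

```python
import re
from collections import Counter

# Assumption: a core is written as "c" followed by digits (e.g. "c4").
# Adjust the pattern to whatever your crash report actually prints.
CORE_PATTERN = re.compile(r"\bc(\d+)\b")


def tally_cores(reports):
    """Count how many reports mention each core at least once."""
    counts = Counter()
    for text in reports:
        # A set ensures a core repeated within one report is counted once.
        cores = {int(m.group(1)) for m in CORE_PATTERN.finditer(text)}
        counts.update(cores)
    return counts


if __name__ == "__main__":
    # Placeholder strings, not real firmware output.
    reports = [
        "placeholder report one: error on c4",
        "placeholder report two: error on c4, then c4 again",
        "placeholder report three: error on c7",
    ]
    for core, n in sorted(tally_cores(reports).items()):
        print(f"c{core}: mentioned in {n} of {len(reports)} reports")
```

If one core shows up in every report, that supports trying to disable it; if the core varies, the cause is more likely elsewhere.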
Title: Re: Talos II reboots itself
Post by: DKnoto on January 11, 2024, 04:45:13 am
On kernel 6.6.8-200.fc39.ppc64le I recently had similar reboots, but they occurred when connecting a device (a printer, an iPad) over USB. On kernel 6.6.9-200.fc39.ppc64le everything has been working correctly for two days.
Title: Re: Talos II reboots itself
Post by: DKnoto on January 15, 2024, 01:26:10 am
Over the past four days on kernel 6.6.9-200, I have made more than a dozen attempts to connect various devices to USB ports, and the problem has not recurred.