Updating Talos II firmware to IBM PNOR V2.18?

JeremyRand:
According to this IBM documentation (thanks awilfox for the pointer), PNOR v2.18 contains the following change:


--- Quote ---HIPER/Pervasive: A problem was fixed for a processor core checkstop with SRC BC70E540 logged with Signature Description " ex(n0p1c4) (NCUFIR[11]) NCU no response to snooped TLBIE". This problem is intermittent and random but occurs with relatively high frequency for certain workloads. The trigger for the failure is one core of a fused core pair going into a stopped state while the other core of the pair continues running.
--- End quote ---

Is it possible to get the Talos II firmware bumped to upstream IBM v2.18? I believe I may be running into this bug.

tle:
AFAIK no. Talos firmware is not synced with upstream; however, that does not stop people from using a modified version of the upstream kernel. Perhaps you should have a read of https://www.flamingspork.com/blog/2020/05/25/op-build-v2-5-firmware-for-the-raptor-blackbird/ and see if you could adapt anything from his work.

tle:
Having said that, I am unsure whether using custom firmware would void the warranty. I have to leave that answer to an official response from RaptorCS.

ejfluhr:
I believe that all OpenPOWER/OPAL-based systems use SMT4 cores, not SMT8 (i.e. "fused") cores. So that may mean you aren't seeing the same problem.

Do you get the same fault callout?

JeremyRand:

--- Quote from: ejfluhr on March 14, 2023, 06:05:09 pm ---I believe that all OpenPOWER/OPAL-based systems use SMT4 cores, not SMT8 (i.e. "fused") cores. So that may mean you aren't seeing the same problem.

Do you get the same fault callout?

--- End quote ---

I get the same error: "NCU no response to snooped TLBIE". I asked Timothy on IRC, and he said that the 18/22-core Raptor CPUs use fused cores. That tracks with the fact that this started happening to me when I upgraded from 4-core CPUs to 22-core CPUs.

Looks like Raptor only has minimal changes to the IBM repo that contains the fix, and there are no rebase conflicts. So, I'll try building a modified firmware that incorporates IBM's fix and see if it helps here.
