Updating Talos II firmware to IBM PNOR V2.18?
JeremyRand:
According to this IBM documentation (thanks awilfox for the pointer), PNOR v2.18 contains the following change:
--- Quote ---
HIPER/Pervasive: A problem was fixed for a processor core checkstop with SRC BC70E540 logged with Signature Description "ex(n0p1c4) (NCUFIR[11]) NCU no response to snooped TLBIE". This problem is intermittent and random but occurs with relatively high frequency for certain workloads. The trigger for the failure is one core of a fused core pair going into a stopped state while the other core of the pair continues running.
--- End quote ---
Is it possible to get the Talos II firmware bumped to upstream IBM v2.18? I believe I may be running into this bug.
tle:
AFAIK no, Talos firmware is not synced with upstream; however, that does not stop people from using a modified version of the upstream firmware. Perhaps have a read of https://www.flamingspork.com/blog/2020/05/25/op-build-v2-5-firmware-for-the-raptor-blackbird/ and see if you can adapt anything from his work.
tle:
Having said that, I am unsure whether using custom firmware would void the warranty. I have to leave that answer to an official response from RaptorCS.
ejfluhr:
I believe that all OpenPOWER/OPAL-based systems use SMT4 cores, not SMT8 (i.e. "fused") cores. So that may mean you aren't seeing the same problem.
Do you get the same fault callout?
JeremyRand:
--- Quote from: ejfluhr on March 14, 2023, 06:05:09 pm ---
I believe that all OpenPOWER/OPAL-based systems use SMT4 cores, not SMT8 (i.e. "fused") cores. So that may mean you aren't seeing the same problem.
Do you get the same fault callout?
--- End quote ---
I get the same error, "NCU no response to snooped TLBIE". I asked Timothy on IRC, and he said that the 18-core and 22-core Raptor CPUs use fused cores. That tracks with the fact that this started happening to me when I upgraded from 4-core CPUs to 22-core CPUs.
It looks like Raptor has only minimal changes relative to the IBM repo that contains the fix, and there are no rebase conflicts. So I'll try building a modified firmware that incorporates IBM's fix and see if it helps here.