Updating Talos II firmware to IBM PNOR V2.18?

ejfluhr:
The 18c & 22c parts are "paired", meaning 2 SMT4 cores share the same L2 & L3, unlike the 4c and 8c, which are "unpaired", meaning each core gets the full L2 and L3 to itself. This is not the same as "fused" (i.e. SMT8 cores), but it is quite likely that the fix will also work for "paired" cores, as presumably the issue is sharing cacheable/non-cacheable pathways. Good luck!

JeremyRand:

--- Quote from: ejfluhr on April 17, 2023, 05:44:57 pm ---The 18c & 22c parts are "paired", meaning 2 SMT4 cores share the same L2 & L3, unlike the 4c and 8c, which are "unpaired", meaning each core gets the full L2 and L3 to itself. This is not the same as "fused" (i.e. SMT8 cores), but it is quite likely that the fix will also work for "paired" cores, as presumably the issue is sharing cacheable/non-cacheable pathways. Good luck!

--- End quote ---

I think you're right about the terminology, but yeah, I doubt the IBM docs draw a distinction since those docs are exclusively written for SMT8 users.

I've successfully built a PNOR with IBM's patch; it installed without major issues (I accidentally hosed things the first time I installed it because the BMC ran out of RAM -- protip, don't put multiple PNOR images in OpenBMC /tmp/ -- but power-cycling the BMC fixed that). So far it seems stable; I'll be running it for the next month or so to see if any checkstops happen.
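
The /tmp tip above can be turned into a quick pre-upload check. What follows is a minimal sketch, not something taken from the post: the report of the BMC running out of RAM suggests that /tmp is RAM-backed, so the idea is to confirm an image fits with some margin before copying it. It assumes Python 3 is available wherever you run it (that may not be true on the BMC itself). The function names and the 32 MiB headroom are hypothetical, chosen only for illustration.

```python
import os
import shutil

# Hypothetical safety margin; pick a value that suits your BMC.
DEFAULT_HEADROOM = 32 * 1024 * 1024  # 32 MiB


def fits(image_size, free_bytes, headroom=DEFAULT_HEADROOM):
    """Return True if an image of image_size bytes fits in free_bytes
    while leaving at least headroom bytes unused."""
    return image_size + headroom <= free_bytes


def image_fits_in_dir(image_path, target_dir="/tmp", headroom=DEFAULT_HEADROOM):
    """Check a real file against the free space reported for target_dir."""
    image_size = os.path.getsize(image_path)
    free_bytes = shutil.disk_usage(target_dir).free
    return fits(image_size, free_bytes, headroom)
```

If the check returns False, removing older images from /tmp before copying a new one is the obvious first step.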

tle:

--- Quote from: JeremyRand on April 21, 2023, 10:15:14 pm ---
I think you're right about the terminology, but yeah, I doubt the IBM docs draw a distinction since those docs are exclusively written for SMT8 users.

I've successfully built a PNOR with IBM's patch; it installed without major issues (I accidentally hosed things the first time I installed it because the BMC ran out of RAM -- protip, don't put multiple PNOR images in OpenBMC /tmp/ -- but power-cycling the BMC fixed that). So far it seems stable; I'll be running it for the next month or so to see if any checkstops happen.

--- End quote ---

Would you be able to provide more details on which patch? Many thanks

JeremyRand:

--- Quote from: tle on May 28, 2023, 08:21:48 am ---
Would you be able to provide more details on which patch? Many thanks

--- End quote ---

This is the hcode branch I used: https://github.com/JeremyRand/hcode/tree/talos-2019-07-25-master-rebased

As you can see, it's simply a copy of Raptor's hcode, rebased against current upstream IBM hcode (there were no rebase conflicts). The specific bugfix commit is this one: https://github.com/JeremyRand/hcode/commit/ca06a0c996e3b48c02cfb3912dddf7ca23ec4202

I've been running it on my DD2.3 2x18-core Talos II for 2 months, and my DD2.2 2x22-core Talos II for 1 month, and I have had no checkstops on either since applying the patch, nor any new issues. At this point I can recommend that Raptor integrate the bugfix into a new PNOR release. Since the rebase had no conflicts, it should be trivially easy for Raptor to do this.

tle:

--- Quote from: JeremyRand on June 16, 2023, 02:32:16 am ---
This is the hcode branch I used: https://github.com/JeremyRand/hcode/tree/talos-2019-07-25-master-rebased

As you can see, it's simply a copy of Raptor's hcode, rebased against current upstream IBM hcode (there were no rebase conflicts). The specific bugfix commit is this one: https://github.com/JeremyRand/hcode/commit/ca06a0c996e3b48c02cfb3912dddf7ca23ec4202

I've been running it on my DD2.3 2x18-core Talos II for 2 months, and my DD2.2 2x22-core Talos II for 1 month, and I have had no checkstops on either since applying the patch, nor any new issues. At this point I can recommend that Raptor integrate the bugfix into a new PNOR release. Since the rebase had no conflicts, it should be trivially easy for Raptor to do this.

--- End quote ---

Thanks for the information
