Raptor Computing Systems Community Forums (BETA)
Raptor Computing Systems Hardware => Mod Zone => Topic started by: Woof on June 24, 2023, 07:29:37 am
-
I thought I'd share some work-in-progress on this. It's not yet finished, and I'm waiting on the next iteration to be CNC'd.
After a lot of prompting from @Vikings, I designed a bracket that allows other coolers to be fitted (Noctua SecuFirm2 and AMD AM4/5). I'm testing here with an NH-U12A, and the temps are better than the stock cooler (and it's quieter!). This version shown is flawed, and I don't know if the changes will fix it: the Noctua 78mm mount is too flexible, so the cooler can move, and since there's no independent loading mechanism on the T2 this movement transfers to the CPU (which then crashes the system).
In a few weeks when I get the next version back I'll retry with the Noctua cooler, and I'll also compare with an EK AM5 water block.
(I'll share the design files when it's done, whether it works or not.)
-
Woof has done an amazing job!
The 3U IBM HSF weighs 637g versus 1220g of the NH-U12A.
I can totally see this working with a sturdier custom mount that replaces the Noctua's, which in turn would allow for more even and higher pressure.
-
Thanks!
The version tested here is quite compromised. I'd been so focused on getting it thin enough for AM5 mounts that the Noctua mount got neglected, then I made things worse by needing to manually drill out a counterbore, so there's only about 2mm of thread here (which is AFAIK where a lot of the instability comes from). I threadlocked the screws (otherwise the Noctua mount unscrews them as it tightens) but it gave on the last turn on one side, so it's not tightened all the way. Aaaand, I'm using Carbonaut (https://www.thermal-grizzly.com/produkte/298-carbonaut) instead of thermal paste, which is extremely slippy (which I think adds to the movement).
I've attached an STL of the latest version, which I'm hoping is even 3D printable with the right plastic (though maybe only viable with the AM5 mount; I 3D printed the test versions for the dimensions and fit, but never powered up with one). As you can see it's much thicker, which meant a compromise for the water block (more on that when I get to test it!).
-
That's excellent work.
-
Thanks! I'll be back on it in a few weeks with more results.
-
Woo-hoo - the changes to strengthen the Noctua mount resulted in a stable system! I can wobble the cooler now and I have no crashes, so next will be long term testing (I have two earlier spares without amends for the AM5 mounts which I'll be sending to @vikings.thum). Some photos:
https://wip.numfum.com/cw/2023-07-13/noctua-1.jpeg
https://wip.numfum.com/cw/2023-07-13/noctua-2.jpeg
https://wip.numfum.com/cw/2023-07-13/noctua-3.jpeg
https://wip.numfum.com/cw/2023-07-13/secufirm2.jpeg
Tomorrow I'll test with a water block.
-
Today I tested the mount with an EK AM5 water block. This needed a compromise in the design: adding a 4mm copper heat spreader on top of the CPU. (I could work around this with custom threaded fittings, but each cooler/water block would need its own design, whereas what I have here works with any AM5 cooler.)
https://wip.numfum.com/cw/2023-07-14/am5-copper-hs.jpeg
It's a smooth machined lump of copper with a pocket to grip the top of the CPU. Instead of thermal paste I used a Thermal Grizzly Kryosheet pad (which was frustrating to cut to size) to make the likely repeated dismantling easier. I mounted the water block on that:
https://wip.numfum.com/cw/2023-07-14/am5-ekwb-1.jpeg
https://wip.numfum.com/cw/2023-07-14/am5-ekwb-2.jpeg
The whole thing is designed so the screws go all the way in (hand tight) to get the maximum out of the springs. Tightening was done in a cross pattern, a few turns at a time. Then I hooked up the same Alphacool radiator I use for tests:
https://wip.numfum.com/cw/2023-07-14/am5-with-rad.jpeg
The mount is solid, as it is with the Noctua fan, and can be pushed and prodded with no play. I ran the same tests as I did with the Noctua and earlier Alphacool water block, with these numbers:
https://wip.numfum.com/cw/2023-07-14/bc2etc-gen-waku-ek-am5.png
The temperatures are higher than with the Alphacool:
https://wip.numfum.com/cw/2023-07-14/bc2etc-gen-waku-acool.png
My solution runs nearly 10 degrees hotter, which is probably a combination of thermal pads vs paste (I used Kryonaut Extreme as the paste, at around $95 USD per pot over here, which we use at work for GPU water cooling) plus the compromise of that lump of copper. But a maximum of 57 degrees is cool running!
I've decided against custom fittings and will keep the compromise of the copper heat spreader, but I will switch over to thermal paste and re-run the tests next week.
It's quiet, with the fans stalling and shutting off, and with temps better than the Noctua fan:
https://wip.numfum.com/cw/2023-07-14/bc2etc-gen-noctua.png
And certainly better than the IBM HSF:
https://wip.numfum.com/cw/2023-07-14/bc2etc-gen-hsf.png
More tests next week!
-
Well I did what I said I wouldn't do and went all-in with custom mounts for the EK cooler to eliminate the copper heat spreader/shim:
https://wip.numfum.com/cw/2023-08-12/custom-mounts.jpeg
These attach to the earlier bracket and allow the AM5 water block to mount directly on the POWER9, all properly loaded and torqued:
https://wip.numfum.com/cw/2023-08-12/ek-am5-mounted.jpeg
The temps are amazing:
https://wip.numfum.com/cw/2023-08-12/bc2etc-gen-waku-ek-am5-custom.png
This is running flat-out and it's around 40 degrees. I'm using a graphite pad instead of paste, so it's not entirely comparable to the first tests I did with the Alphacool bracket from Vikings, and at a guess it would go even lower with the Thermal Grizzly paste (but I just couldn't bring myself to clean off any more paste!). I'll say I'm 100% happy with the result and will re-add the second CPU. The pad will never dry out and should be good for life.
I'm going to document and write this all up in the next few weeks, plus make the CAD files available. What I will say, though, is if you want a quick and easy way to water cool, the option from Vikings is probably for you. My solution gets a few more degrees for a much greater cost (especially since this was all one-off custom machining, although with a group buy the price would come way down).
I do think the Noctua fan is a good solution for most people wanting a quieter T2 or Blackbird. Vikings Thum will hopefully publish his findings with a 3D printable version of the same bracket.
-
Any added noise or fully silent?
-
With such good contact with the CPU, for regular use the radiator fans aren't even running, so it's just the sound of the pump.
-
Do you have any idea how much force you are applying down through the module to the socket pins? IBM has a target pressure to ensure even and reliable contacting across all pins for the life of the processor given certain assumptions about electrical loads (e.g. amps thru pins) and thermal cycling. It would be interesting to know if your solutions are approximately the same, or much higher or lower.
-
I've no idea of the amount of force. For the design I went with the same mounted height (via trial and error and shims) as the Noctua cooler and the EK water block on an AM5 mobo.
What I found was, for the Noctua, until I got this height and the stiffness right the system wasn't stable. Once I got this correct (measured at the spring height, how far they were tensioned, how the cooler sits on the CPU including getting its eccentricity right) the system was stable and I could wobble and shake the cooler. After the Noctua's SecuFirm2 mounts I did the same for the EK Pro water block. AM5 is a 1718 pin LGA, the POWER9 Sforza is a 2601 pin LGA, and they have 40x40 vs 48x48 IHS sizes (but the coolers usually have 50x50 cold plates), so they're not like for like.
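To put "not like for like" into rough numbers, here's a small sketch using the IHS sizes quoted above, the AM5 socket's 1718 pins (LGA1718) and the Sforza's 2601. It assumes, purely for illustration, that a cooler applies the same total force to both packages and that the load spreads evenly over every pin; neither of those has been measured here, and the names in the code are made up.

```python
# Rough AM5 vs POWER9 Sforza comparison from the figures in this thread.
# Assumptions (not measured): equal total clamping force on both packages,
# spread evenly across all socket pins.

AM5 = {"pins": 1718, "ihs_mm": 40}     # AM5 socket is LGA1718
SFORZA = {"pins": 2601, "ihs_mm": 48}  # POWER9 Sforza, 2601 pin LGA


def ihs_area_mm2(pkg):
    """Area of a square IHS in mm^2."""
    return pkg["ihs_mm"] ** 2


def per_pin_share(pkg, reference):
    """Force per pin relative to the reference package, for equal total force."""
    return reference["pins"] / pkg["pins"]


area_ratio = ihs_area_mm2(SFORZA) / ihs_area_mm2(AM5)
share = per_pin_share(SFORZA, AM5)
print(f"Sforza IHS area is {area_ratio:.2f}x the AM5 IHS area")
print(f"Per-pin force on Sforza is about {share:.2f}x that on AM5")
```

Under those assumptions each Sforza pin sees roughly two thirds of the force an AM5 pin would, which is one reason matching the mounted height from an AM5 board doesn't guarantee the same contact pressure.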
The socket I used for mounting and testing the various coolers had maybe 20 mount and test cycles, trying to keep the number down (since I'd read the socket was good for very few cycles). The second CPU socket, used for all the measuring and verification, seems to have been trashed in the process (I must have eventually squashed the pins, I'll need to get my SMT stereo microscope unpacked to look). The lack of independent loading mechanism (like on the AM5) appears to have been the downfall.
For stability testing I ran this machine hard for multi-day cycles (one of my own tools that 100% loads the CPU cores with lots of maths).
Since I trashed my second CPU socket I've not been able to test both CPUs yet, I need to solve that next.
-
I found the problem with my second CPU, I appear to have dropped something onto the socket and bent a bunch of pins:
https://wip.numfum.com/cw/2023-08-21/cpu2-cu.jpeg
I have a replacement mobo coming from Raptor so will get back on with this in a few weeks.
-
Ugh!
-
>Do you have any idea how much force you are applying down through the module to the socket pins?
I'm going to build a jig to measure this, for both the Noctua and EK options, then I can tailor the springs to make sure the force is similar to the IBM part. It'll keep me busy waiting for my replacement mobo to be delivered.
-
>For stability testing I ran this machine hard for multi-day cycles (one of my own tools that 100% loads the CPU cores with lots of maths).
Do you know how "hot" that workload is compared to the TDP rating of the processor?
Regards, Eric
-
I don't, but when comparing the exact same test with the IBM 3U HSF the temperatures were around 70 degrees.
I have a replacement mobo now (RCS were really helpful here) and I'm waiting on some CNC'd parts back with amends to make installing this easier. The design files are here for anyone interested:
https://github.com/cwoffenden/talosmods/tree/main/am5-sf2-cooler
(I'll be updating this and adding docs along the way)
-
Seems like you have an 18c POWER9 rated at 190W? https://raptorcs.com/content/CP9M36/intro.html
POWER9 TDP long-term max temperature rating is 85C. If your system runs ~70C, you have decent margin to the reliability limit. Working out how much of that is the improved cooling and how much is your workload would probably need power data to compare against that 190W spec.
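As a quick worked example of that margin, the sketch below subtracts the temperatures reported earlier in this thread from the 85C rating. The readings are the approximate figures from the posts, not new measurements, and the labels and function name are only for illustration.

```python
# Margin to the 85C long-term rating, using temperatures reported in this thread.
# These are the approximate figures quoted in the posts, not new measurements.

LIMIT_C = 85

reported_c = {
    "IBM 3U HSF": 70,
    "EK AM5 block + copper heat spreader": 57,
    "EK AM5 block, custom mounts": 40,
}


def margin_c(temp_c, limit_c=LIMIT_C):
    """Degrees of headroom below the limit (negative means over the limit)."""
    return limit_c - temp_c


for name, temp in reported_c.items():
    print(f"{name}: {temp}C, margin {margin_c(temp)}C")
```

This only compares temperatures; it says nothing about how much power the workload draws, which is the missing piece mentioned above.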
-
Yes, I have the dual 18 core, and with the water cooling temps are around 40 degrees (same as the simpler solution from Vikings) and the machine is running silently (a quiet machine was the original aim). In the same tests with a Noctua air cooler the temps are in the mid-50s and the fans are quiet (compared with the IBM HSF, anything is quiet).
We water cool racks of machines at work, with the big benefit being keeping the temps down keeps the clocks high and the power consumption down. With Threadrippers, for example, they run at boost clocks all the time, vs down-clocking to half the speed to stay within TDP, effectively doubling the throughput. Same with GPUs (which is where we do most of the water cooling, CPUs were just a curiosity).
Real life took over for a while but I will work on a jig to test the mounting pressure.
-
>with the big benefit being keeping the temps down keeps the clocks high and the power consumption down.
If using IBM's WOF, the algorithm does not work that way. It should boost to "the same" frequency regardless of CPU temp until it exceeds the temp limit, at which point it will lower frequency to protect temperatures. The TDP is conservative and very few workloads would exceed that. The bigger factor affecting the temp protection mechanism is ambient temp: > 30C is more likely to throttle than < 30C.
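The behaviour described here can be pictured with a toy model like the one below. To be clear, this is not IBM's WOF implementation: the function name, the 3.8/3.2 GHz figures, the use of 85C as the trigger and the linear back-off are all assumptions chosen only to show the shape of "flat until the limit, then reduce".

```python
# Toy model of "boost to the same frequency until the temp limit is exceeded".
# NOT the real WOF algorithm: all numbers and the linear back-off are assumed.

def target_freq_ghz(cpu_temp_c, boost_ghz=3.8, floor_ghz=3.2,
                    temp_limit_c=85.0, step_ghz_per_c=0.1):
    """Return the modelled frequency for a given CPU temperature."""
    if cpu_temp_c <= temp_limit_c:
        # Below the limit the temperature makes no difference at all.
        return boost_ghz
    # Above the limit, back off in proportion to the overshoot, down to a floor.
    overshoot_c = cpu_temp_c - temp_limit_c
    return max(floor_ghz, boost_ghz - overshoot_c * step_ghz_per_c)


for temp in (40, 57, 70, 85, 88, 95):
    print(f"{temp}C -> {target_freq_ghz(temp):.2f} GHz")
```

In this model 40C and 70C give exactly the same frequency, so cooling below the limit buys quietness and margin rather than extra clocks.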
What workload are you testing with? I just got a Blackbird running Ubuntu and am stress testing it to see how it responds. It's quite fun.
-
Thanks for the info, I know little about how the IBM WOF works.
The workload I'm testing with is from a graphics tool I've been working on for a while, which is interesting because I'd tuned the threading for 64C/128T Threadrippers and then started using it to compare other systems (and it's a nice workload for many cores). Send me a PM if you're interested in building it from source, I'm not ready to announce it yet so it's under wraps (GCC/Clang and CMake on most systems build out of the box).
-
For the Threadripper, have you tried this?
https://github.com/tud-zih-energy/FIRESTARTER
POWER9 support is planned
-
I haven’t seen that but I definitely will try. A quick look at the source and I see it’s handling thread affinity, a requirement for Threadrippers on Windows.
-
That looks interesting but
>We therefore use highly optimized assembly routines that take the specific properties of a given processor microarchitecture into account.
Is that what is being done for
>POWER9 support is planned
??
I'm running a mersenne-prime calculator which seems to push the CPU pretty hard. At least, it runs down near the "base" frequency of 3.2GHz (I have a 4-core CPU).
>cat mersenne.c
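#include <stdio.h>
#include <stdlib.h>
#include <gmp.h>
/* Print the Mersenne number 2^p - 1 in decimal; p is given as argv[1]. */
int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <exponent>\n", argv[0]);
        return 1;
    }
    char *endptr;
    unsigned long int p = strtoul(argv[1], &endptr, 10);
    mpz_t M, powerof2, one, two;
    mpz_init(M); mpz_init(powerof2);
    mpz_init_set_str(one, "1", 10);
    mpz_init_set_str(two, "2", 10);
    mpz_pow_ui(powerof2, two, p);   /* powerof2 = 2^p */
    mpz_sub(M, powerof2, one);      /* M = 2^p - 1 */
    gmp_printf("%Zd", M);
    mpz_clears(M, powerof2, one, two, NULL);
    return 0;
}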
Run with:
>cat mersenne16.ksh
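num=82589933
thread=0
# Launch 16 copies in the background, one output file per copy
while (( thread < 16 ))
do
    echo $thread
    # Quoted so the command is shown on the terminal rather than written to the file
    echo "time ./mersenne $num > M48.$thread"
    time ./mersenne $num > M48.$thread &
    (( thread += 1 ))
done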
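For a sense of scale, here is a small Python sketch of what the C program above computes, plus an estimate of how many decimal digits it prints. The helper names are mine, and the digit count uses logarithms rather than building the number, so it runs instantly.

```python
import math

# What mersenne.c computes (2^p - 1), and how many decimal digits that is.
# Helper names are illustrative; they are not part of the original program.

def mersenne(p):
    """Return 2^p - 1 as a Python integer (fine for small p)."""
    return (1 << p) - 1


def decimal_digits(p):
    """Digits in 2^p - 1 for p >= 1, without computing the number itself.

    2^p is never a power of 10, so 2^p - 1 has as many digits as 2^p.
    """
    return math.floor(p * math.log10(2)) + 1


print(mersenne(7), decimal_digits(7))
print(decimal_digits(82589933))
```

For the exponent used in the script this comes to 24,862,048 digits per process, so a good share of the run time is likely spent converting the result to decimal for printing rather than on the subtraction or power.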