Raptor Computing Systems Community Forums (BETA)
Raptor Computing Systems Hardware => Talos II => Topic started by: jcsiblesei on December 02, 2024, 10:07:21 am
-
I just got a new Talos II TL2SV4 server a few days ago, with the 500GB Samsung internal NVMe storage option. Shortly after installing the OS, smartd started reporting "Device: /dev/nvme0, Critical Warning (0x02): Temperature" about once per day.
Looking at "smartctl -a /dev/nvme0", with the system totally idle, I see temperatures like this:
Temperature Sensor 1: 45 Celsius
Temperature Sensor 2: 52 Celsius
After running "dd if=/dev/nvme0n1 of=/dev/null bs=1M" for only a few seconds to generate load, I see temperatures like this:
Temperature Sensor 1: 46 Celsius
Temperature Sensor 2: 80 Celsius
These temperatures seem way too high. The BMC was also reporting a warning that "Temperature Pcie" was 49 C. I looked inside the case, and it looks like there's basically no way for any air to flow over the NVMe drive. It's all the way at the bottom against the edge of the case, with no ventilation around it.
Has anyone run into this before? What should I do about it?
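For anyone wanting to track these readings in a script rather than by eye, here is a minimal sketch in Python. It assumes the "Temperature Sensor N: X Celsius" line format shown above, and it decodes the Critical Warning byte from the smartd message using the bit meanings in the NVMe SMART / Health Information log. The function names and the short labels are made up for illustration and are not part of smartctl or smartd.

```python
import re

# Bit meanings of the NVMe "Critical Warning" byte (labels are shorthand).
CRITICAL_WARNING_BITS = {
    0x01: "available spare below threshold",
    0x02: "temperature outside threshold",
    0x04: "reliability degraded",
    0x08: "media in read-only mode",
    0x10: "volatile memory backup failed",
}

# Matches lines such as "Temperature Sensor 2: 80 Celsius".
SENSOR_LINE = re.compile(r"^Temperature Sensor (\d+):\s+(\d+) Celsius", re.MULTILINE)


def parse_sensor_temps(smartctl_text):
    """Return {sensor_number: degrees_celsius} from `smartctl -a` text output."""
    return {int(n): int(t) for n, t in SENSOR_LINE.findall(smartctl_text)}


def decode_critical_warning(value):
    """Return the list of conditions whose bits are set in the warning byte."""
    return [name for bit, name in CRITICAL_WARNING_BITS.items() if value & bit]


# The two sensor lines reported under load in the post above.
sample = """Temperature Sensor 1: 46 Celsius
Temperature Sensor 2: 80 Celsius
"""

print(parse_sensor_temps(sample))      # {1: 46, 2: 80}
print(decode_critical_warning(0x02))   # ['temperature outside threshold']
```

The 0x02 value in the smartd message corresponds to the temperature bit alone, which matches the sensor readings: nothing else about the drive is being flagged.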
-
Which Samsung NVMe are you using?
In any case, I would install an NVMe that doesn't get that hot.
-
Is there a fan shroud you can adjust to change the airflow?
The 2U Supermicro chassis w/HDD Raptor used has an adjustable fan shroud.
-
Quote: Which Samsung NVMe are you using?
It's a Samsung SSD 980 500GB.
Quote: In any case, I would install an NVMe that doesn't get that hot.
But this is the one that it shipped with.
Quote: Is there a fan shroud you can adjust to change the airflow? The 2U Supermicro chassis w/HDD Raptor used has an adjustable fan shroud.
There is a fan shroud, but even if it weren't there at all, there wouldn't be a path for airflow. It's only being blocked by the NVMe carrier card itself and the sides of the case. (I have photos to show what I mean, but when I try to attach them, I get the message "The upload folder is full. Please try a smaller file and/or contact an administrator.")
-
You ordered the Talos II directly from RaptorCS?
In fact, the 980 should remain below 80°C.
-
Quote: You ordered the Talos II directly from RaptorCS?
Yes. I ordered it directly from RaptorCS, and I didn't make any hardware changes to it.
-
Updates:
I upgraded the SSD's firmware from 1B4QFXO7 to 3B4QFXO7. This didn't help at all.
I rearranged the PCIe cards in the host so that the NVMe carrier is in slot 1 (it was in slot 5 before), and slot 2 is empty (with a bracket with ventilation holes). Now there is a path for the fan to blow air right over the SSD. This did help: it now takes several minutes of constant reading before the temperature approaches 80C (it was less than 30 seconds before I made this change).
For comparison, I temporarily moved the SSD to another computer with better airflow, and no matter what I did, the temperature never went above 66C there. So even though it's overheating way less now, it's still concerning that it's overheating at all.
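To put a number on "how long until it hits 80C" when comparing slot layouts or machines, something like the following Python sketch could be run while the dd read is going in another terminal. It is only a sketch: the function names, the 5-second interval, and the 10-minute duration are assumptions, smartctl usually needs root, and the sample list at the bottom is made-up data to show the calculation, not measurements from this system.

```python
import re
import subprocess
import time

# Matches lines such as "Temperature Sensor 2: 80 Celsius".
SENSOR_LINE = re.compile(r"^Temperature Sensor (\d+):\s+(\d+) Celsius", re.MULTILINE)


def hottest_sensor(smartctl_text):
    """Highest 'Temperature Sensor N' reading in the text, or None if absent."""
    temps = [int(t) for _, t in SENSOR_LINE.findall(smartctl_text)]
    return max(temps) if temps else None


def sample_temps(device="/dev/nvme0", interval_s=5, duration_s=600):
    """Poll `smartctl -a` and return [(elapsed_seconds, hottest_celsius), ...]."""
    samples = []
    start = time.monotonic()
    while time.monotonic() - start <= duration_s:
        elapsed = time.monotonic() - start
        result = subprocess.run(
            ["smartctl", "-a", device], capture_output=True, text=True
        )
        temp = hottest_sensor(result.stdout)
        if temp is not None:
            samples.append((elapsed, temp))
        time.sleep(interval_s)
    return samples


def seconds_until(samples, threshold_c=80):
    """Elapsed time of the first sample at or above the threshold, else None."""
    for elapsed, temp in samples:
        if temp >= threshold_c:
            return elapsed
    return None


# Made-up samples, for illustration only.
made_up = [(0, 52), (10, 66), (20, 75), (30, 80), (40, 81)]
print(seconds_until(made_up))       # 30
print(seconds_until(made_up, 90))   # None
```

On real hardware the call would be `seconds_until(sample_temps())`, run once per configuration so the results can be compared directly.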
-
It is possible to find heat spreaders that adhere to the drive. I know I've gotten a few NVMe drives that include them, but never had to use them. Otherwise, this sounds like something a strategically placed 40/60/80mm fan could alleviate.
-
I bought a SilverStone SST-TP02-M2 M.2 SSD Cooling Kit (https://www.newegg.com/silverstone-sst-tp02-m2-cooling-kit/p/N82E16835220138), and it helped significantly. Now it only gets to 60C after about 10 minutes of constant hammering. I consider my problem solved now.
One thing I'm curious about, though: has anyone else had this problem with the 2U servers as shipped by RaptorCS? I haven't seen anyone else have this issue, but I also can't imagine what would have made my server different than everyone else's.
-
The 2Us should be nearly identical to the towers except for the orientation, and I haven't observed that problem in my T2 tower, though the side panel is usually slightly open so I can check on the interior. I don't know what the case fans are like in the servers, though; my tower came with the standard Supermicro SC747 case fans. I even ended up disabling one to reduce the noise, but the cooling is still fine.
-
What is the "heavy duty case" that comes with the TL2SV4?
-
There's a distinct lack of markings, but I think it's a Supermicro SC216. (The backplane does have markings; it's a BPN-SAS3-216A-N4 for sure.)