SMART repeatedly reporting temperature errors for NVMe SSD
jcsiblesei:
I just got a new Talos II TL2SV4 server a few days ago, with the 500GB Samsung internal NVMe storage option. Shortly after installing the OS, smartd started reporting "Device: /dev/nvme0, Critical Warning (0x02): Temperature" about once per day.
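For anyone decoding that message: the value in parentheses is the Critical Warning byte from the drive's NVMe SMART/Health log. It is a bitmask, and 0x02 is bit 1, which the NVMe specification defines as "temperature above an over-temperature threshold (or below an under-temperature threshold)". A minimal Python sketch that decodes the byte is below. The names `CRITICAL_WARNING_BITS` and `decode_critical_warning` are just illustrative, and only bits 0 to 4 are labeled; anything higher is reported generically.

```python
# Meanings of the NVMe SMART/Health "Critical Warning" byte, bits 0-4.
CRITICAL_WARNING_BITS = {
    0: "available spare capacity below threshold",
    1: "temperature above over-temperature (or below under-temperature) threshold",
    2: "NVM subsystem reliability degraded",
    3: "media placed in read-only mode",
    4: "volatile memory backup device failed",
}


def decode_critical_warning(value: int) -> list[str]:
    """Return a description for every bit set in the Critical Warning byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Critical Warning is a single byte (0x00-0xFF)")
    reasons = []
    for bit in range(8):
        if value & (1 << bit):
            # Bits not in the table above are reported without a description.
            reasons.append(CRITICAL_WARNING_BITS.get(bit, f"bit {bit} (see the NVMe spec)"))
    return reasons


print(decode_critical_warning(0x02))  # the warning smartd is reporting here
```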
Looking at "smartctl -a /dev/nvme0", with the system totally idle, I see temperatures like this:
--- Code: ---Temperature Sensor 1: 45 Celsius
Temperature Sensor 2: 52 Celsius
--- End code ---
After running "dd if=/dev/nvme0n1 of=/dev/null bs=1M" to generate load, within only a few seconds I see temperatures like this:
--- Code: ---Temperature Sensor 1: 46 Celsius
Temperature Sensor 2: 80 Celsius
--- End code ---
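If you want to compare idle and load readings without eyeballing them, a small Python sketch like the one below pulls the "Temperature Sensor N: X Celsius" lines out of `smartctl -a` text and prints the rise per sensor. It assumes smartctl's plain-text layout, which is meant for humans and may change between versions (newer smartctl releases also offer `-j` for JSON output); `parse_sensors` is a hypothetical helper name.

```python
import re

# Matches smartctl NVMe lines such as "Temperature Sensor 2: 52 Celsius"
# (smartctl pads with extra spaces, so \s+ is used after the colon).
SENSOR_RE = re.compile(r"^Temperature Sensor (\d+):\s+(-?\d+) Celsius", re.MULTILINE)


def parse_sensors(smartctl_output: str) -> dict[int, int]:
    """Map sensor number -> temperature in Celsius from `smartctl -a` text."""
    return {int(n): int(t) for n, t in SENSOR_RE.findall(smartctl_output)}


# Readings copied from the idle and dd-load runs above.
idle = parse_sensors("Temperature Sensor 1: 45 Celsius\nTemperature Sensor 2: 52 Celsius\n")
load = parse_sensors("Temperature Sensor 1: 46 Celsius\nTemperature Sensor 2: 80 Celsius\n")

for sensor in sorted(idle):
    rise = load[sensor] - idle[sensor]
    print(f"Sensor {sensor}: {idle[sensor]} -> {load[sensor]} C (+{rise})")
```

With the numbers above, sensor 1 barely moves (+1) while sensor 2 climbs by 28 degrees.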
These temperatures seem way too high. The BMC was also reporting a warning that "Temperature Pcie" was 49 C. I looked inside the case, and it looks like there's basically no way for any air to flow over the NVMe drive. It's all the way at the bottom against the edge of the case, with no ventilation around it.
Has anyone run into this before? What should I do about it?
MPC7500:
Which Samsung NVMe are you using?
In any case, I would install an NVMe that doesn't get that hot.
atomicdog:
Is there a fan shroud you can adjust to change the airflow?
The 2U Supermicro chassis with HDDs that Raptor used has an adjustable fan shroud.
jcsiblesei:
--- Quote from: MPC7500 on December 02, 2024, 12:09:16 pm ---Which Samsung NVMe are you using?
--- End quote ---
It's a Samsung SSD 980 500GB.
--- Quote from: MPC7500 on December 02, 2024, 12:09:16 pm ---In any case, I would install an NVMe that doesn't get that hot.
--- End quote ---
But this is the one that it shipped with.
--- Quote from: atomicdog on December 02, 2024, 01:16:08 pm ---Is there a fan shroud you can adjust to change the airflow?
The 2U Supermicro chassis with HDDs that Raptor used has an adjustable fan shroud.
--- End quote ---
There is a fan shroud, but even if it weren't there at all, there wouldn't be a path for airflow. The drive is boxed in by the NVMe carrier card itself and the sides of the case, not by the shroud. (I have photos to show what I mean, but when I try to attach them, I get the message "The upload folder is full. Please try a smaller file and/or contact an administrator.")
MPC7500:
You ordered the Talos II directly from RaptorCS?
In fact, the 980 should remain below 80°C.
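As a rough check against that figure, a Python sketch like the following flags any sensor that has reached the limit. The 80 °C value is taken from this post and used as an assumed limit; the drive's own warning and critical thresholds come from its identify data (smartctl -a typically lists them as composite temperature thresholds), so substitute those if they differ. `sensors_at_or_over` is a hypothetical helper.

```python
# Assumed limit: the 80 C figure quoted in this thread, not a value read from the drive.
LIMIT_C = 80


def sensors_at_or_over(readings: dict[int, int], limit_c: int = LIMIT_C) -> list[int]:
    """Return the sensor numbers whose reading is at or above limit_c."""
    return sorted(n for n, temp in readings.items() if temp >= limit_c)


# Readings from the dd load test earlier in the thread.
under_load = {1: 46, 2: 80}
print(sensors_at_or_over(under_load))  # sensor 2 is already at the limit
```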