Messages - jcsiblesei

1
Talos II / Re: SMART repeatedly reporting temperature errors for NVMe SSD
« on: January 21, 2025, 05:31:56 pm »
There's a distinct lack of markings, but I think it's a Supermicro SC216. (The backplane does have markings; it's a BPN-SAS3-216A-N4 for sure.)

2
Talos II / Re: SMART repeatedly reporting temperature errors for NVMe SSD
« on: January 14, 2025, 11:25:15 am »
I bought a SilverStone SST-TP02-M2 M.2 SSD Cooling Kit, and it helped significantly. Now it only gets to 60C after about 10 minutes of constant hammering. I consider my problem solved now.

One thing I'm curious about, though: has anyone else had this problem with the 2U servers as shipped by RaptorCS? I haven't seen anyone else report it, but I also can't imagine what would have made my server different from everyone else's.

3
Talos II / Re: SMART repeatedly reporting temperature errors for NVMe SSD
« on: December 13, 2024, 01:29:25 pm »
Updates:

I upgraded the SSD's firmware from 1B4QFXO7 to 3B4QFXO7. This didn't help at all.

I rearranged the PCIe cards in the host so that the NVMe carrier is in slot 1 (it was in slot 5 before), and slot 2 is empty (with a bracket with ventilation holes). Now there is a path for the fan to blow air right over the SSD. This did help: it now takes several minutes of constant reading before the temperature approaches 80C (it was less than 30 seconds before I made this change).

For comparison, I temporarily moved the SSD to another computer with better airflow, and no matter what I did, the temperature never went above 66C there. So even though it's overheating way less now, it's still concerning that it's overheating at all.
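
The "time until it approaches 80C" comparison above can be made repeatable by logging temperature samples during a read test and finding the first one that crosses a limit. The following is only a sketch: the function name `seconds_to_threshold` and the sample readings are hypothetical illustrations, not measurements from this server.

```python
def seconds_to_threshold(samples, threshold_c=80):
    """Return the elapsed time of the first sample at or above threshold_c.

    samples: iterable of (seconds_since_start, temp_c) pairs in time order.
    Returns None if the threshold is never reached.
    """
    for elapsed, temp in samples:
        if temp >= threshold_c:
            return elapsed
    return None


# Made-up readings for illustration only (NOT data from the thread).
poor_airflow = [(0, 52), (10, 68), (20, 77), (30, 80)]
better_airflow = [(0, 50), (60, 62), (120, 70), (180, 75)]

print(seconds_to_threshold(poor_airflow))    # 30
print(seconds_to_threshold(better_airflow))  # None
```

Returning None rather than a sentinel number keeps "never reached the limit" distinct from "reached it at time 0".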

4
Operating Systems and Porting / Re: [NEWS] Fedora 41 is out!
« on: December 08, 2024, 01:12:01 pm »
The scp command needs the -o option.
Someone on this forum recently had that problem; it has something to do with new default settings in scp.
Note: Uppercase -O, not lowercase -o. And I think it was this post:
The scp transfer worked after I added the option -O. I found that on a forum: "It's an issue with the OpenSSH client. Since OpenSSH 9.0, the client uses the SFTP protocol by default. To use the legacy protocol, the -O option must be specified."
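
For scripted transfers, the uppercase `-O` flag can be added when the command line is built. This is a minimal sketch: the helper names `scp_command` and `copy_file`, the file name, and the host are hypothetical, and only the `-O` option itself comes from the posts above.

```python
import subprocess


def scp_command(source, destination, legacy_protocol=True):
    """Build an scp argument list; -O selects the legacy SCP protocol."""
    cmd = ["scp"]
    if legacy_protocol:
        cmd.append("-O")  # uppercase O, not lowercase o
    cmd += [source, destination]
    return cmd


def copy_file(source, destination, legacy_protocol=True):
    """Run scp and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(
        scp_command(source, destination, legacy_protocol), check=True
    )


# Hypothetical file and host, shown only to illustrate the argument order.
print(scp_command("image.iso", "user@example.org:/tmp/"))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.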

5
Talos II / Re: SMART repeatedly reporting temperature errors for NVMe SSD
« on: December 02, 2024, 03:54:08 pm »
You ordered the Talos II directly from RaptorCS?
Yes. I ordered it directly from RaptorCS, and I didn't make any hardware changes to it.

6
Talos II / Re: SMART repeatedly reporting temperature errors for NVMe SSD
« on: December 02, 2024, 01:49:53 pm »
Which Samsung NVMe are you using?
It's a Samsung SSD 980 500GB.
In any case, I would install an NVMe that doesn't get that hot.
But this is the one that it shipped with.
Is there a fan shroud you can adjust to change the airflow?
The 2U Supermicro chassis w/HDD that Raptor used has an adjustable fan shroud.
There is a fan shroud, but even if it weren't there at all, there wouldn't be a path for airflow. The airflow is blocked by the NVMe carrier card itself and the sides of the case, not by the shroud. (I have photos to show what I mean, but when I try to attach them, I get the message "The upload folder is full. Please try a smaller file and/or contact an administrator.")

7
Talos II / SMART repeatedly reporting temperature errors for NVMe SSD
« on: December 02, 2024, 10:07:21 am »
I just got a new Talos II TL2SV4 server a few days ago, with the 500GB Samsung internal NVMe storage option. Shortly after installing the OS, smartd started reporting "Device: /dev/nvme0, Critical Warning (0x02): Temperature" about once per day.
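
The `0x02` in that message is the Critical Warning byte from the drive's SMART / Health log, where each bit flags a different condition. As a rough sketch, it can be decoded as below. The bit positions follow the NVMe specification, but the short labels and the function name `decode_critical_warning` are illustrative wording, not smartd's own output.

```python
# Bit meanings of the NVMe Critical Warning byte (labels are paraphrased).
CRITICAL_WARNING_BITS = {
    0x01: "available spare below threshold",
    0x02: "temperature above or below a threshold",
    0x04: "NVM subsystem reliability degraded",
    0x08: "media placed in read-only mode",
    0x10: "volatile memory backup device failed",
}


def decode_critical_warning(value):
    """Return a label for every bit set in the Critical Warning byte."""
    return [label for bit, label in CRITICAL_WARNING_BITS.items() if value & bit]


print(decode_critical_warning(0x02))
# ['temperature above or below a threshold']
```

Only the temperature bit is set in the reported value, so the warning concerns temperature alone rather than spare capacity or media health.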

Looking at "smartctl -a /dev/nvme0", with the system totally idle, I see temperatures like this:

Code:
Temperature Sensor 1:               45 Celsius
Temperature Sensor 2:               52 Celsius

After running "dd if=/dev/nvme0n1 of=/dev/null bs=1M" for only a few seconds to generate load, I see temperatures like this:

Code:
Temperature Sensor 1:               46 Celsius
Temperature Sensor 2:               80 Celsius

These temperatures seem way too high. The BMC was also reporting a warning that "Temperature Pcie" was 49 C. I looked inside the case, and it looks like there's basically no way for any air to flow over the NVMe drive. It's all the way at the bottom against the edge of the case, with no ventilation around it.
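
To watch these readings over time instead of eyeballing them, the sensor lines can be pulled out of saved `smartctl -a` text. This is a minimal sketch that assumes the output uses exactly the "Temperature Sensor N: ... Celsius" layout shown above. The function names and the 70 C limit are hypothetical choices, not values from the drive's documentation.

```python
import re

# Matches lines such as "Temperature Sensor 2:               80 Celsius".
SENSOR_LINE = re.compile(
    r"^Temperature Sensor (\d+):\s+(-?\d+) Celsius", re.MULTILINE
)


def sensor_temps(smartctl_text):
    """Map sensor number to temperature in Celsius."""
    return {int(n): int(t) for n, t in SENSOR_LINE.findall(smartctl_text)}


def hot_sensors(temps, limit_c=70):
    """Return only the sensors at or above limit_c (an assumed limit)."""
    return {n: t for n, t in temps.items() if t >= limit_c}


sample = """\
Temperature Sensor 1:               46 Celsius
Temperature Sensor 2:               80 Celsius
"""

temps = sensor_temps(sample)
print(temps)               # {1: 46, 2: 80}
print(hot_sensors(temps))  # {2: 80}
```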

Has anyone run into this before? What should I do about it?
