Topic: Blackbird Cooling

cy384

  • Newbie
  • *
  • Posts: 13
  • Karma: +4/-0
    • View Profile
    • http://cy384.com/
Blackbird Cooling
« on: October 08, 2022, 01:35:46 pm »
I've been struggling a bit to figure out how to best cool my Blackbird.  Possibly this is made worse by my choice of case (smallest mATX I could find, a SAMA IM01) and the CPU (160W 16 core).  I am using the 3U heatsink module with the provided fan.  Main complaints:

* CPU fan is oriented up/down, and blows downward, like an inch away from whatever's in the first PCIe slot, so I flipped the fan around to blow upwards, which means hot air going over the voltage regulators.
* No heatsink on the voltage regulators, while the Talos and Talos Lite both have them.  The BB doesn't have holes to mount a heatsink there, either.
* I'm not sure where various heat sensors are, physically, on the board.  Where's the ambient temperature sensor?  The PCIe sensor?  The CPU ambient sensor?
* RAM slots are as close as physically possible to the CPU (good for signal integrity, bad for airflow).
* Voltage regulator temps do not seem to be considered in setting fan speed.
* Changing the cooling parameters requires recompiling firmware.

I don't really care about any of these except that the voltage regulators hit 90C within a minute under heavy load.  I've ordered some tiny little heatsinks that can be stuck directly to the chips but I'm wondering if anyone else has a nice solution here.  They're really low on the board and in an awkward spot.

I do have a 3D printer and will design some ducts/shrouds if I can't get temps low enough otherwise.

Attaching some pics for the curious.

ClassicHasClass

  • Sr. Member
  • ****
  • Posts: 473
  • Karma: +37/-0
  • Talospace Earth Orbit
    • View Profile
    • Floodgap
Re: Blackbird Cooling
« Reply #1 on: October 08, 2022, 05:15:37 pm »
I do think the case is a big part of the problem. I certainly wouldn't run even an 8-core in an mATX case. Even the 4-core in my mATX system probably runs the fans more often than I'd like.

Vikings is working on a liquid cooling setup and this might be an option for you when they get it up for sale.

There are no ambient sensors on the board that I know of, and the manual doesn't mention any. The manual adds, for what it's worth, "The C1P9S01 factory setpoint for CPU core temperature is 60°C, and the system will attempt to maintain the cores at that temperature even under light or no load. As a result, a lightly loaded system may not benefit from air drawn over the CPU heatsink(s), and mainboard / peripheral cooling predominantly comes from chassis fans in this situation. For this reason, it is important to connect at least one chassis fan providing airflow over the mainboard surface in order to provide cooling for memory modules and other active components."

cy384

Re: Blackbird Cooling
« Reply #2 on: October 08, 2022, 06:09:00 pm »
Just to share some numbers, here's what "sensors" reports at idle:

Code:
nvme-pci-0100
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -273.1°C, high = +76.8°C)
                       (crit = +79.8°C)
Sensor 1:     +39.9°C  (low  = -273.1°C, high = +65261.8°C)

ibmpowernv-isa-0000
Adapter: ISA adapter
Chip 0 Vdd Remote Sense: 683.00 mV (lowest =  +0.67 V, highest =  +1.01 V)
Chip 0 Vdn Remote Sense: 674.00 mV (lowest =  +0.67 V, highest =  +0.67 V)
Chip 0 Vdd:              685.00 mV (lowest =  +0.68 V, highest =  +1.02 V)
Chip 0 Vdn:              675.00 mV (lowest =  +0.68 V, highest =  +0.68 V)
Chip 0 Core 0:            +44.0°C  (lowest = +22.0°C, highest = +65.0°C)
Chip 0 Core 4:            +44.0°C  (lowest = +22.0°C, highest = +65.0°C)
Chip 0 Core 8:            +44.0°C  (lowest = +23.0°C, highest = +65.0°C)
Chip 0 Core 12:           +44.0°C  (lowest = +23.0°C, highest = +65.0°C)
Chip 0 Core 16:           +45.0°C  (lowest = +21.0°C, highest = +65.0°C)
Chip 0 Core 20:           +45.0°C  (lowest = +23.0°C, highest = +67.0°C)
Chip 0 Core 24:           +45.0°C  (lowest = +23.0°C, highest = +66.0°C)
Chip 0 Core 28:           +44.0°C  (lowest = +24.0°C, highest = +68.0°C)
Chip 0 Core 32:           +44.0°C  (lowest = +22.0°C, highest = +67.0°C)
Chip 0 Core 36:           +44.0°C  (lowest = +22.0°C, highest = +67.0°C)
Chip 0 Core 40:           +44.0°C  (lowest = +22.0°C, highest = +66.0°C)
Chip 0 Core 44:           +44.0°C  (lowest = +22.0°C, highest = +67.0°C)
Chip 0 Core 48:           +45.0°C  (lowest = +23.0°C, highest = +66.0°C)
Chip 0 Core 52:           +45.0°C  (lowest = +23.0°C, highest = +67.0°C)
Chip 0 Core 56:           +45.0°C  (lowest = +23.0°C, highest = +68.0°C)
Chip 0 Core 60:           +45.0°C  (lowest = +24.0°C, highest = +68.0°C)
Chip 0 DIMM 0 :           +49.0°C  (lowest = +30.0°C, highest = +51.0°C)
Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 2 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 12 :          +50.0°C  (lowest = +30.0°C, highest = +53.0°C)
Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 Nest:              +45.0°C  (lowest = +23.0°C, highest = +63.0°C)
Chip 0 VRM VDD:           +53.0°C  (lowest = +35.0°C, highest = +90.0°C)
Chip 0 :                  32.00 W  (lowest =  28.00 W, highest = 156.00 W)
Chip 0 Vdd:                4.00 W  (lowest =   2.00 W, highest = 127.00 W)
Chip 0 Vdn:                9.00 W  (lowest =   7.00 W, highest =  11.00 W)
Chip 0 :                 326.27 kJ
Chip 0 Vdd:               47.31 kJ
Chip 0 Vdn:               89.58 kJ
Chip 0 Vdd:                6.38 A  (lowest =  +4.00 A, highest = +129.75 A)
Chip 0 Vdn:               14.38 A  (lowest = +11.50 A, highest = +17.38 A)

I've also attached an image of what the OpenBMC web interface reports.

Now that I look at it again, the "Temperature Pcie" might just be the NVMe drive.
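For anyone who wants to track these readings over time rather than eyeball them, here is a minimal Python sketch that pulls the temperature rows out of text in the format shown above. The sample lines are copied from the idle output; the function name `parse_temps` and the rule that an all-zero row means an unpopulated slot are assumptions made for illustration, not something the `sensors` tool documents.

```python
import re

# Sample lines copied from the idle output above. On a live system this text
# would come from running the `sensors` command instead.
SAMPLE = """\
Chip 0 Core 0:            +44.0°C  (lowest = +22.0°C, highest = +65.0°C)
Chip 0 DIMM 0 :           +49.0°C  (lowest = +30.0°C, highest = +51.0°C)
Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 VRM VDD:           +53.0°C  (lowest = +35.0°C, highest = +90.0°C)
Chip 0 :                  32.00 W  (lowest =  28.00 W, highest = 156.00 W)
"""

# Matches only temperature rows: label, current value, lowest, highest.
ROW = re.compile(
    r"^(?P<label>[^:]+?)\s*:\s*"
    r"(?P<now>[+-]?\d+(?:\.\d+)?)°C\s*"
    r"\(lowest\s*=\s*(?P<low>[+-]?\d+(?:\.\d+)?)°C,\s*"
    r"highest\s*=\s*(?P<high>[+-]?\d+(?:\.\d+)?)°C\)"
)


def parse_temps(text):
    """Return {label: (current, lowest, highest)} for temperature rows."""
    temps = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # voltage, power, energy and current rows are skipped
        now, low, high = (float(match[k]) for k in ("now", "low", "high"))
        if now == low == high == 0.0:
            continue  # assumption: an all-zero row is an empty DIMM slot
        temps[match["label"]] = (now, low, high)
    return temps


if __name__ == "__main__":
    for label, (now, low, high) in parse_temps(SAMPLE).items():
        print(f"{label}: now {now} C, peak {high} C")
```

Feeding it the full output instead of `SAMPLE` should work the same way, since rows that are not temperatures simply fail to match.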

ClassicHasClass

Re: Blackbird Cooling
« Reply #3 on: October 08, 2022, 06:38:56 pm »
Yes, that's what it is. Here's this T2, for comparison (dual-8, two NVMe drives, BTO WX7100 GPU):

Code:
nvme-pci-330100
Adapter: PCI adapter
Composite:    +48.9°C  (low  = -273.1°C, high = +82.8°C)
                       (crit = +84.8°C)
Sensor 1:     +48.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +57.9°C  (low  = -273.1°C, high = +65261.8°C)

ibmpowernv-isa-0000
Adapter: ISA adapter
Chip 0 Vdd Remote Sense: 800.00 mV (lowest =  +0.66 V, highest =  +1.00 V)
Chip 0 Vdn Remote Sense: 701.00 mV (lowest =  +0.70 V, highest =  +0.70 V)
Chip 8 Vdd Remote Sense: 648.00 mV (lowest =  +0.64 V, highest =  +0.93 V)
Chip 8 Vdn Remote Sense: 661.00 mV (lowest =  +0.66 V, highest =  +0.66 V)
Chip 0 Vdd:              804.00 mV (lowest =  +0.67 V, highest =  +1.00 V)
Chip 0 Vdn:              702.00 mV (lowest =  +0.70 V, highest =  +0.70 V)
Chip 8 Vdd:              650.00 mV (lowest =  +0.65 V, highest =  +0.93 V)
Chip 8 Vdn:              662.00 mV (lowest =  +0.66 V, highest =  +0.66 V)
Chip 0 Core 0:            +53.0°C  (lowest =  +6.0°C, highest = +87.0°C)
Chip 0 Core 4:            +53.0°C  (lowest = +10.0°C, highest = +89.0°C)
Chip 0 Core 8:            +53.0°C  (lowest =  +6.0°C, highest = +88.0°C)
Chip 0 Core 12:           +53.0°C  (lowest = +33.0°C, highest = +89.0°C)
Chip 0 Core 16:           +53.0°C  (lowest = +31.0°C, highest = +87.0°C)
Chip 0 Core 20:           +53.0°C  (lowest = +31.0°C, highest = +87.0°C)
Chip 0 Core 24:           +54.0°C  (lowest = +32.0°C, highest = +87.0°C)
Chip 0 Core 28:           +54.0°C  (lowest = +32.0°C, highest = +87.0°C)
Chip 8 Core 32:           +44.0°C  (lowest = +30.0°C, highest = +74.0°C)
Chip 8 Core 36:           +44.0°C  (lowest = +29.0°C, highest = +75.0°C)
Chip 8 Core 40:           +43.0°C  (lowest = +31.0°C, highest = +80.0°C)
Chip 8 Core 44:           +43.0°C  (lowest = +30.0°C, highest = +74.0°C)
Chip 8 Core 48:           +43.0°C  (lowest = +29.0°C, highest = +75.0°C)
Chip 8 Core 52:           +44.0°C  (lowest = +29.0°C, highest = +75.0°C)
Chip 8 Core 56:           +44.0°C  (lowest = +30.0°C, highest = +75.0°C)
Chip 8 Core 60:           +44.0°C  (lowest = +29.0°C, highest = +72.0°C)
Chip 0 DIMM 0 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 2 :           +51.0°C  (lowest = +35.0°C, highest = +58.0°C)
Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 12 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 DIMM 14 :          +49.0°C  (lowest = +38.0°C, highest = +58.0°C)
Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 0 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 2 :           +41.0°C  (lowest = +35.0°C, highest = +44.0°C)
Chip 8 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 12 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 8 DIMM 14 :          +39.0°C  (lowest = +34.0°C, highest = +42.0°C)
Chip 8 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
Chip 0 Nest:              +53.0°C  (lowest = +33.0°C, highest = +82.0°C)
Chip 8 Nest:              +44.0°C  (lowest = +31.0°C, highest = +68.0°C)
Chip 0 VRM VDD:           +53.0°C  (lowest = +40.0°C, highest = +71.0°C)
Chip 8 VRM VDD:           +39.0°C  (lowest = +35.0°C, highest = +56.0°C)
Chip 0 :                  44.00 W  (lowest =  28.00 W, highest = 140.00 W)
Chip 0 Vdd:               16.00 W  (lowest =   0.00 W, highest = 110.00 W)
Chip 0 Vdn:                9.00 W  (lowest =   7.00 W, highest =  12.00 W)
Chip 8 :                  31.00 W  (lowest =  28.00 W, highest = 127.00 W)
Chip 8 Vdd:                4.00 W  (lowest = 1000.00 mW, highest =  98.00 W)
Chip 8 Vdn:                8.00 W  (lowest =   7.00 W, highest =  11.00 W)
Chip 0 :                  28.18 MJ
Chip 0 Vdd:                4.20 MJ
Chip 0 Vdn:                7.71 MJ
Chip 8 :                  27.36 MJ
Chip 8 Vdd:                4.23 MJ
Chip 8 Vdn:                6.85 MJ
Chip 0 Vdd:               12.38 A  (lowest =  +0.38 A, highest = +113.38 A)
Chip 0 Vdn:               12.88 A  (lowest = +11.13 A, highest = +17.88 A)
Chip 8 Vdd:               12.63 A  (lowest =  +1.63 A, highest = +109.50 A)
Chip 8 Vdn:               13.13 A  (lowest = +11.63 A, highest = +17.00 A)

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      750.00 mV
fan1:        1272 RPM  (min =  700 RPM, max = 4500 RPM)
edge:         +55.0°C  (crit = +99.0°C, hyst = -273.1°C)
PPT:           9.24 W  (cap =  95.00 W)

nvme-pci-310100
Adapter: PCI adapter
Composite:    +39.9°C  (low  = -273.1°C, high = +82.8°C)
                       (crit = +84.8°C)
Sensor 1:     +39.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)

cy384

Re: Blackbird Cooling
« Reply #4 on: October 13, 2022, 07:01:38 pm »
Got these nice little copper heatsinks installed and did some quick temperature testing... not a huge difference, unfortunately!  Need to redirect more airflow over them.

Edit: also, the voltage regulators seem to be TDA21472, with a maximum recommended temperature of 125C and thermal shutdown at 140C.  I guess running them hot is only moderately concerning.
« Last Edit: October 13, 2022, 07:23:10 pm by cy384 »

MPC7500

  • Hero Member
  • *****
  • Posts: 596
  • Karma: +41/-1
    • View Profile
    • Twitter
Re: Blackbird Cooling
« Reply #5 on: October 13, 2022, 08:18:18 pm »
I need to tweak my airflow, too. The default settings aren't good, because my fan only begins to spin at 30% of the PWM signal.
https://wiki.raptorcs.com/wiki/Fan_Tuning
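To illustrate the problem with a fan that stalls below some duty cycle, here is a small Python sketch of a mapping that never asks for less than the fan's minimum spinning PWM. The 30% floor comes from the post above; the function name `demand_to_pwm`, the 0 to 1 "demand" scale and the 100% ceiling are hypothetical and are not taken from the wiki page or the BMC firmware.

```python
def demand_to_pwm(demand, min_spin_pwm=30.0, max_pwm=100.0):
    """Map a cooling demand in [0, 1] to a PWM duty cycle in percent.

    The output is rescaled into [min_spin_pwm, max_pwm] so that a small
    demand still produces a duty cycle at which the fan actually turns.
    """
    demand = min(max(demand, 0.0), 1.0)  # clamp out-of-range requests
    return min_spin_pwm + demand * (max_pwm - min_spin_pwm)


if __name__ == "__main__":
    for demand in (0.0, 0.25, 0.5, 1.0):
        print(demand, demand_to_pwm(demand))
```

Without such a floor, the bottom 30% of the control range does nothing, so the controller wastes time ramping through a dead zone before any air moves.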

ejfluhr

  • Newbie
  • *
  • Posts: 44
  • Karma: +3/-0
    • View Profile
Re: Blackbird Cooling
« Reply #6 on: October 18, 2022, 01:02:27 am »
https://www.infineon.com/cms/en/product/power/dc-dc-converters/integrated-power-stages/tda21472/

It looks like that VRM is rated at 60A - 70A in typical ambient temps (e.g. < 40C).   It seems like there must be at least 2 in parallel on VDD, for a capacity of 120+A?   Efficiency falls off as the load gets that high, though.  It would be better to have 3 such stages so they draw only 30A - 40A under normal operation.
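The arithmetic behind that estimate can be sketched in a few lines of Python. The 129.75 A figure is the highest Vdd current in the Blackbird `sensors` output earlier in the thread; the assumption that the stages share current equally, and the function names, are illustrative only, since the actual phase layout on the board is not known here.

```python
import math

# Highest "Chip 0 Vdd" current from the Blackbird sensors output in this thread.
PEAK_VDD_AMPS = 129.75


def per_stage_current(total_amps, stages):
    """Current per power stage, assuming perfectly equal sharing."""
    return total_amps / stages


def stages_needed(total_amps, max_amps_per_stage):
    """Smallest number of stages that keeps each one at or under the limit."""
    return math.ceil(total_amps / max_amps_per_stage)


if __name__ == "__main__":
    for stages in (2, 3, 4):
        amps = per_stage_current(PEAK_VDD_AMPS, stages)
        print(f"{stages} stages: {amps:.1f} A each")
    for limit in (70.0, 60.0, 40.0):
        print(f"limit {limit} A: {stages_needed(PEAK_VDD_AMPS, limit)} stages")
```

Under the equal-sharing assumption, two stages would each carry about 65 A at the recorded peak, three about 43 A, and four about 32 A.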

cy384

Re: Blackbird Cooling
« Reply #7 on: October 20, 2022, 03:24:26 pm »
It looks like there are 7 of the voltage regulator chips, no idea how they're used/wired up though.

I've been reading the fan control code and it seems kinda crude: very on/off in design, based more on events than on target temperatures and dynamic control.  Under max load my CPU fan oscillates between high and medium speeds very noticeably.  I assume this is not an upstream concern, since nobody cares if a server is slightly louder, but it's unpleasant for a workstation.  Raptor spliced in some PID control but I don't think the tuning is ideal.

AFAICT the regulator temperature is considered for fan speed but only after it hits 85C.  Ref https://git.raptorcs.com/git/blackbird-openbmc/tree/meta-rcs/meta-blackbird/recipes-phosphor/fans/phosphor-fan-control-events-config-native/events.yaml
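To make the on/off versus gradual distinction concrete, here is a toy Python comparison of a single trip point against a linear ramp. Only the 85C trip point comes from the post above; the 65C ramp start, the 40% and 100% duty cycles and both function names are hypothetical, and none of this reflects how events.yaml or the fan control daemon is actually structured.

```python
def threshold_fan(temp_c, trip_c=85.0, low_pwm=40.0, high_pwm=100.0):
    """Event-style control: jump to high speed once the trip point is reached."""
    return high_pwm if temp_c >= trip_c else low_pwm


def ramp_fan(temp_c, start_c=65.0, full_c=85.0, low_pwm=40.0, high_pwm=100.0):
    """Gradual control: scale the duty cycle linearly between two temperatures."""
    if temp_c <= start_c:
        return low_pwm
    if temp_c >= full_c:
        return high_pwm
    fraction = (temp_c - start_c) / (full_c - start_c)
    return low_pwm + fraction * (high_pwm - low_pwm)


if __name__ == "__main__":
    for temp in (60.0, 75.0, 84.0, 86.0):
        print(temp, threshold_fan(temp), ramp_fan(temp))
```

With the single trip point, a regulator hovering around 85C flips the fan between two speeds, which matches the audible oscillation described; the ramp changes speed in small steps instead, at the cost of running the fan faster at moderate temperatures.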

Borley

  • Full Member
  • ***
  • Posts: 181
  • Karma: +17/-0
    • View Profile
Re: Blackbird Cooling
« Reply #8 on: November 01, 2022, 08:27:06 pm »
Please check out the cooling idea at "DIY 2U heatsink+fan setup". I have my Blackbird stuffed into a slimline mATX case with the low-profile CPU cooler. The temps never really exceed 50C even when gaming.

mx08

  • Newbie
  • *
  • Posts: 8
  • Karma: +2/-0
    • View Profile
Re: Blackbird Cooling
« Reply #9 on: November 16, 2022, 08:25:25 am »
There's an alternative firmware for the BMC on the Blackbird (maybe also Talos 2) called !BMC or bangBMC.

Link: https://gitlab.raptorengineering.com/bangbmc-firmware
Description: https://gitlab.raptorengineering.com/bangbmc-firmware/br-bangbmc/-/blob/raw-first-pass/docs/manual/about.md
Here's the fan daemon: https://gitlab.raptorengineering.com/bangbmc-firmware/op-fan-daemon

It's not mentioned in the wiki AFAICS... It allows setting fan curves at runtime, without requiring recompilation every time.

I've not used it myself yet and don't know much more than what I've written above.

r34per

  • Newbie
  • *
  • Posts: 27
  • Karma: +2/-0
    • View Profile
Re: Blackbird Cooling
« Reply #10 on: August 04, 2023, 01:51:54 pm »
Sorry to necro this thread, but where did you get those heatsinks from, @cy384? I plan on trying a 16-core CPU in my Blackbird and those heatsinks look like they'd work pretty well with my fan setup.

cy384

Re: Blackbird Cooling
« Reply #11 on: August 05, 2023, 08:01:03 pm »
I used this double-sided thermal tape and these 6.5mm x 6.5mm heatsinks, but they're both just generic Chinese parts, probably easy to find under many different "brands" on eBay or Amazon or whatever.  I used a flathead screwdriver to spread the tines of the heatsinks a bit.

I changed around my fan setup so they get more air flow, but the power area still gets really hot and I don't think the power/thermal management stuff is being run optimally.  Fortunately/unfortunately it works well enough that I don't want to spend more time messing with it!

r34per

Re: Blackbird Cooling
« Reply #12 on: August 09, 2023, 06:32:50 am »
Gotcha, thanks! Also your post about recompiling the firmware with additional WOF tables was a huge help for me! I bought the same 16 core cpu as you and after following what you did the errors cleared out in openbmc. No way I would have figured that out myself ;D
« Last Edit: August 09, 2023, 06:35:23 am by r34per »