Messages - shawnanastasio

1
Talos II / Re: WX5100 Causing System Hangs
« on: December 18, 2020, 02:13:41 pm »
In case you haven't seen it yet, this issue (https://gitlab.freedesktop.org/drm/amd/-/issues/1171) mentions the kernel parameter amdgpu.runpm=0 as a workaround.
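
If it helps, here is a small shell sketch for checking whether that parameter is active on the running kernel. The helper name cmdline_has_param is made up for this example; the only system detail it relies on is /proc/cmdline, which holds the command line the kernel was booted with. How you add the parameter persistently depends on your distro and bootloader, so that part is left out.

```shell
# Hypothetical helper: succeed if the exact parameter $2 appears as a
# whitespace-separated word in the kernel command line string $1.
cmdline_has_param() {
    for word in $1; do
        if [ "$word" = "$2" ]; then
            return 0
        fi
    done
    return 1
}

# Only inspect the running kernel if /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    if cmdline_has_param "$(cat /proc/cmdline)" "amdgpu.runpm=0"; then
        echo "amdgpu.runpm=0 is set on this boot"
    else
        echo "amdgpu.runpm=0 is not set on this boot"
    fi
fi
```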

2
Operating Systems and Porting / Re: [NEWS] Linux kernel 5.8 is out!
« on: August 03, 2020, 08:04:24 pm »
TLDR: POWER10 is not off the table by any means; we have every intention of creating a POWER10 product line, but there are complex negotiations in play to reach the point where those POWER10 products will be up to our high open firmware / open systems standards.

Can you elaborate on this? Has IBM introduced proprietary/vendor-signed components to POWER10 firmware?

3
Are there any plans to get this merged upstream? I don't see it mentioned on the LWJGL issue you posted (https://github.com/LWJGL/lwjgl3/issues/495).

4

Did you have the option to exchange the RX 580? Did you try a replacement?

Many retailers will exchange or refund any product within the first 2 - 4 weeks, especially in Europe.  Is this enough time for somebody to detect a problem like that?

While the card was definitely intermittent from the start, it was still usable, and the issues I encountered towards the beginning could just as well have been unrelated driver bugs.
It started getting really bad after about a year of use, which is well outside the return window. I didn't bother requesting an RMA, since I needed a working card as soon as possible and just decided to buy a WX.

I understand that in various ways the WX series (now the W series) cards are more tested by AMD, but has either AMD or Raptor given any guarantee about them on POWER9 in general or on the Raptor hardware specifically?

If that is written somewhere, then it provides an extra reason to prefer those cards for those who can buy them.

Of course, neither AMD nor Raptor provides official guarantees about hardware compatibility: AMD doesn't care about POWER9, and Raptor couldn't audit the firmware/drivers to ensure complete compatibility even if they wanted to. That said, the fact that Raptor's prebuilt machines all ship with WX cards is a pretty strong vote of confidence, I'd say.

5
I originally purchased an RX 580 for my Talos but it never behaved properly. Frequent EEH errors, kernel panics, and black screens. After switching to a WX5100, I haven't encountered any issues.

I've heard people theorize that the consumer cards have less mature firmware that ends up tripping the PCIe DMA protections on POWER9 that don't exist on x86. At least anecdotally this seems accurate, since that same RX 580 works just fine in an x86 machine.

Of course, this probably doesn't hold true for all consumer cards, though for me the investment in a guaranteed-working WX-series card was well worth it.

6
Check whether you have the kvm_hv kernel module loaded with
Code:
lsmod | grep kvm
If it's not there, try
Code:
sudo modprobe kvm_hv
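
The same check can be scripted. Below is a minimal sketch, assuming a Linux system where /proc/modules lists the loaded modules (the same data lsmod reads). The function name module_loaded and its optional second argument are made up for this example.

```shell
# Hypothetical helper: succeed if module $1 is listed in the module table.
# The table defaults to /proc/modules; $2 can point at another file with
# the same layout (module name in the first column).
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Report the state on the running system without changing anything.
if [ -r /proc/modules ]; then
    if module_loaded kvm_hv; then
        echo "kvm_hv is loaded"
    else
        echo "kvm_hv is not loaded; try: sudo modprobe kvm_hv"
    fi
fi
```

Unlike grepping for "kvm", this matches the module name exactly, so a loaded kvm or kvm_pr module won't be mistaken for kvm_hv.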

7
Blackbird / Re: Heatsink dual fan configuration
« on: January 25, 2020, 11:13:47 am »
@mx08 or @shawnanastasio:
In the next few days I would like to update to bangBMC. Is the procedure identical to Updating the BMC firmware?
Or how do I proceed?

And is there a comparison table (features/ pros and cons) somewhere between OpenBMC and bangBMC?

bangBMC is currently in pretty early stages and I wouldn't really recommend it unless you have flashing hardware and BMC serial access to recover in case something goes wrong.
As for a comparison table, I don't think such a thing exists, but I should talk to bangBMC's creator (dormito on IRC) about creating one.

The TL;DR is that it's a super simple distribution that just contains the bare minimum for booting the P9 cores and nothing else (no systemd, no dbus, no web servers, no Python, no C++ runtime).
This makes it well suited for desktop machines where you don't necessarily need/want all the features oBMC provides and would prefer a minimal (and much faster!) replacement.

The fan daemon I wrote is pretty bare-bones and should be much easier to configure than the oBMC one.
The current algorithm works by just reading the current maximum temperature per-zone and looking up a corresponding fan speed in a table, so changing it should be pretty straightforward:

https://git.anastas.io/shawnanastasio/op-fan-daemon/src/branch/master/include/curve.h
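
To give a feel for the idea, here is a rough shell sketch of a max-temperature table lookup. This is not the daemon's actual code and is not taken from curve.h: the function names, the CURVE format, and every threshold and speed value below are made up for illustration.

```shell
# Hypothetical curve: "max_temp_C:fan_percent" pairs, ascending by
# temperature. All numbers are invented for this example.
CURVE="40:25 55:40 65:60 75:80"
MAX_SPEED=100

# Print the highest of the temperature readings passed as arguments,
# i.e. the current maximum temperature for one zone.
zone_max_temp() {
    max=$1
    for t in "$@"; do
        if [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    echo "$max"
}

# Print the fan speed for a temperature: the first table entry whose
# threshold is at or above the temperature, or MAX_SPEED past the end.
lookup_fan_speed() {
    temp=$1
    for entry in $CURVE; do
        threshold=${entry%%:*}
        speed=${entry##*:}
        if [ "$temp" -le "$threshold" ]; then
            echo "$speed"
            return 0
        fi
    done
    echo "$MAX_SPEED"
}

# Example: three sensors in one zone reading 41, 58 and 37 degrees.
lookup_fan_speed "$(zone_max_temp 41 58 37)"   # prints 60
```

With a structure like this, changing the behavior only means editing the table entries, which is the appeal of a table-driven approach.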

Eventually I'll add config file support and a fancier PID-based control algorithm. Once this is done, Raptor has expressed interest in potentially shipping it along with bangBMC as an option for some of their machines.  ;D
