Messages - pocock

Pages: 1 ... 15 16 [17] 18 19
241

I created a separate thread with some observations about NVIDIA; it may be better to discuss the specifics of their drivers there.

242

These are some of the links that appear most relevant:

NVIDIA forum, describes support for P100 on POWER8 and V100 on POWER9.

NVIDIA media release about collaboration with IBM on open source machine learning with Tesla V100

IBM pages about the same topic

The V100 products are rather expensive.

Nonetheless, some people have suggested that bits of the proprietary driver can be used to run other NVIDIA GPUs in a POWER system.  This thread might be a useful place to collect comments on topics like that.


243
This generation of cards doesn't have full hardware ray tracing support.  I don't know if that is also a factor in that particular issue.

Many of the reviews of the RX 5700 and W5700 mention that the current NVIDIA cards may be better for people who need ray tracing.  But the NVIDIA drivers are less open than the AMD driver and may not work on POWER.

The previews of the next generation AMD Radeon "Big Navi" cards suggest AMD plans to add full ray tracing support and close the gap with NVIDIA, but we may not see any cards like that until the end of the year.  That may eliminate the type of issue described in CT magazine.

For me, that means I'm tempted to use cheaper cards, such as the RX 5500 with 4GB or even something older from ebay as a temporary solution for six months.  I don't specifically need ray tracing but would like to have the overall combination of features, for example, AV1 support and lower power consumption.

244

AMD doesn't care about POWER9

they have a wealth of knowledge about POWER9 in AMD.

Whether they will apply it or support it is another thing.  NVIDIA provides POWER9 binary only drivers for a couple of their cards.

245
I originally purchased an RX 580 for my Talos but it never behaved properly. Frequent EEH errors, kernel panics, and black screens. After switching to a WX5100, I haven't encountered any issues.

I've heard people theorize that the consumer cards have less mature firmware that ends up tripping the PCIe DMA protections on POWER9 that don't exist on x86. At least anecdotally this seems accurate, since that same RX 580 works just fine in an x86 machine.

Do you have the option to exchange the RX 580, or did you try a replacement?

Many retailers will exchange or refund any product within the first 2 - 4 weeks, especially in Europe.  Is this enough time for somebody to detect a problem like that?

Of course, this probably doesn't hold true for all consumer cards, though for me the investment in a guaranteed-working WX-series card was well worth it.

I understand that in various ways the WX series (now the W series) cards are more tested by AMD but has either AMD or Raptor given any guarantee about them on POWER9 in general or on the Raptor hardware specifically?

If that is written somewhere then it provides an extra reason to prefer those cards for those who can buy them.

Thanks for taking the time to share your observations.

246
Are there any packages that don't exist at all in Debian but would be essential or highly desirable on Debian systems?

Are there packages in unstable that are either not in the stable release or they require something from a newer version to be backported to stable to make them work on ppc64el?

Maybe these could be entered as RFP or Request-for-backport bugs in bugs.debian.org and then a usertag can be added to all of those so they can be found quickly.  This is an example of how arm64 bugs and package requests are tagged.

I'm willing to build packages or backports of the most important things that I will use myself but it would be helpful if people can point them out or enter any hints or concerns through the BTS.

247

This is a little bit confusing in Debian and Docker

A search for a package with the exact name docker finds this:

https://packages.qa.debian.org/docker

Searching for any package including the name docker finds a lot more, including the docker.io package which is more recent:

https://packages.debian.org/search?keywords=docker&searchon=names&suite=stable&section=all

https://tracker.debian.org/pkg/docker.io

They wrote that it is not supported on anything except amd64, but they don't explicitly write that it won't work.

Here it says the compile succeeded for ppc64el but not on other ppc variants:

https://buildd.debian.org/status/package.php?p=docker.io

Does anybody know what "not supported" means in this context?

248

Thanks for that feedback

This is the Arty A7 board:
https://reference.digilentinc.com/reference/programmable-logic/arty-a7/start

and it has a radio module that can do things like Zigbee:
https://store.digilentinc.com/pmod-rf2-ieee-802-15-rf-transceiver/

That could make it an interesting competitor for devices like the Zigate USB dongles.  Note that Zigate is currently programmed using a proprietary toolchain and the Arty A7 radio module also requires what appears to be a proprietary SDK right now.

It would be interesting to see all of that opened up like the POWER ISA.

249

Search engines return a lot of publicity about Microwatt but I couldn't see any practical examples of hardware and operating systems

Which FPGAs are people aiming for or testing right now?

Is it at the stage where somebody could build something vaguely similar to a Raspberry Pi, even if it lacks some of the ports?

Is it stable enough to run any GNU/Linux or BSD OS right now?


250

Another thing that comes to mind: AMD's Big Navi cards are coming later in 2020.  It may be wise not to buy any large GPUs, Pro or prosumer, if better cards will arrive in less than 6 months.  Key benefits of Big Navi may be support for AV1 video decoding, which will be standard for YouTube and Netflix in the future, ray tracing, and another step change in power consumption, heat dissipation and noise figures.

2) if you're after lower power consumption (pro cards use binned silicon, lower voltages, lower clocks)

The RX 5700 vs the RX 5700 XT:
RX 5700 has the lower clocks and lower overall power use, much like the Pro W5700

3) you really need more than 3 displayports 1.4

Yes, this is another point I had noticed.  The W5700 has 5 mini-DisplayPorts and 1 USB-C port, so you can build a six-screen configuration for a trader desktop using just one GPU.  Previously people would use 2 GPUs, 4 slots and a bigger PSU to create those systems.

so yeah, not worth the price premium most of the time.

Given that the specs vary with each new generation of these cards, that threshold is not always clear.  There are projects where I felt completely comfortable specifying the relevant Pro card (whether it was AMD or NVIDIA), but in the case of Raptor users, the criteria change even further.

there are actual two-slot versions of RX5700/5700XT which won't take away slot space on blackbird:

Thanks for highlighting these, that saves a lot of manual searching.

https://www.powercolor.com/product?id=1565953800
https://www.asrock.com/Graphics-Card/AMD/Radeon%20RX%205700%20XT%20Challenger%20D%208G%20OC/

I have the ASRock and it works well (and it's basically inaudible regardless of load). I have a 10G NIC in the second PCIe and it fits fine.

Of course, the reference versions are also 2-slot, but they're also noisy and run hot.

The 10G NIC is full height or half height?

A full height card would fully cover at least one of those intake fans.  Search results don't reveal anything helpful about the wisdom of doing that but for any Blackbird user, they have nowhere else to put the card, unless they have a case that is large enough to use a PCIe 4.0 compatible riser cable to mount the 8x card elsewhere.

251

In terms of the RX 5700 / Pro W5700, one of the most interesting finds is the Tom's Hardware review of the Sapphire Pulse RX 5700 XT.  There is also a long reddit discussion about which card is actually quietest.

The key points:

- the Sapphire has two BIOS chips and a DIP switch to choose one or the other.  One BIOS gives you gamer performance (higher clock rate, uses more power), the other BIOS gives you a conservative performance profile that looks almost identical to the Pro W5700

- the OEM cooling solution is more effective, so the fans run more slowly, more quietly and may last longer

- it is a little bigger than two slots.  On Talos II you definitely lose one slot but on Blackbird, where the slots are separated more, you might still lose the second slot and as there are only two slots on Blackbird, that would be a headache for many people.

In terms of supporting Raptor, I don't think the extra price of the WX 7100 goes into their pockets.  By saving $500 on this card, you are half way to buying another Blackbird board, you could give that money to somebody who does porting work, you could spend it on trips to events where you demo the product and these things would all do more to support Raptor and the OpenPOWER ecosystem.

For me, it is not about price sensitivity, it is about

a) identifying what features I actually need

b) do I need to buy a Pro version to get any of those features?  In the case of noise, it appears the OEM version is actually quieter, so paying less gives me that feature.

252

The RX 5700 came out last year and it was followed 6 months later by the Pro version, specifically the Radeon Pro W5700.

The W5700 is basically double the price of the RX 5700.  From the perspective of a POWER user, is this worthwhile?

Summarizing some of the key differences in the Pro version:

- AMD is testing the hardware and drivers more thoroughly: but do they test on any POWER9 systems?

- AMD is releasing driver updates for the Pro cards on a regular schedule: do these bug fixes appear in the amdgpu release for Linux users just as quickly?

- the marketing material describes various features, such as the AMD Remote Workstation (use your GPU remotely from a laptop), but is that relevant for a Linux user?  The software they offer is proprietary, so there is a large percentage of people in this space who would not use it anyway, and we also have free software alternatives.

- the last significant benefit I could see: the overall design is less aggressive, with slightly lower power consumption and lower clock rates than other cards, so even ignoring the questions about drivers, maybe it will last longer and be more stable

- some people justified the purchase of Radeon Pro products when they included ECC RAM but in the W5700, it is not ECC, it is the same as the RX 5700

I've got an open mind about this: for example, an OEM built RX 5700 that has liquid cooling and isn't overclocked may be more relevant to some people than the W5700.  But if AMD is regularly testing amdgpu with W5700 on POWER9 then that alone would make me feel they are investing in this architecture.

253
Applications and Porting / Re: VP9 benchmarks: have they improved?
« on: June 05, 2020, 12:18:35 pm »

This blog post gives some detailed discussion about VPX / VP9 / VP8 as it relates to POWER9 and other platforms.

254
Applications and Porting / VP9 benchmarks: have they improved?
« on: June 05, 2020, 12:12:38 pm »

One of the top search results for VP9 benchmarks is this OpenBenchmarking.org site where POWER9 is the slowest with 9.37 frames per second (less than real time).  Even the Intel i3 achieves 28.38 fps (better than real time).

Has this code been fixed and does anybody know how to get fresh results in that site?  I think it is very unfair to the platform when search engines show something like that if it is no longer valid.

On the other hand, if it is not fixed, has anybody proposed a bounty for working on it?  IBM is asking people to suggest issues that they will fund.

255

Linux provides a mechanism to disable individual cores.  This can be useful to reduce peak power consumption or to simulate a smaller environment.  For example, a developer with a Talos II who wants to know how their application would perform on a Blackbird with a 4-core CPU can turn off all but 4 cores.

echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online

Spoiler: If you put all the cores of one CPU offline with that command then you won't be able to access the RAM and PCI slots connected to that CPU and you might observe strange behaviour.
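
To extend the command above to a whole range of CPUs, something like the following might work.  It is only a sketch: the function name offline_plan and the CPU_ROOT variable are my own inventions, and it just prints the commands rather than running them.  It assumes logical CPUs are numbered contiguously; with SMT enabled each core shows up as several logical CPUs, so the number to keep is cores multiplied by threads per core.  It takes no account of sockets, so on a two-socket board it can offline every core of the second CPU, with the side effects described above.

```shell
#!/bin/sh
# Sketch only: print the commands that would take every logical CPU
# numbered KEEP or higher offline. Nothing is written to sysfs here.
# CPU_ROOT (hypothetical name) can be overridden to try this against
# a scratch directory instead of the real sysfs tree.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

offline_plan() {
    keep="$1"
    for dir in "$CPU_ROOT"/cpu[0-9]*; do
        # Skip CPUs that have no "online" control file
        # (some systems do not expose one for cpu0).
        [ -e "$dir/online" ] || continue
        n="${dir##*/cpu}"          # logical CPU number, e.g. 4 for .../cpu4
        if [ "$n" -ge "$keep" ]; then
            echo "echo 0 | sudo tee $dir/online"
        fi
    done
}
```

For example, offline_plan 16 would keep cpu0 to cpu15, which should correspond to 4 cores if the machine is running in SMT4 mode.  After reviewing the printed commands, piping the output to sh would apply them.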

Is it possible to go one step further and completely power down a CPU socket and maybe the associated RAM banks too, almost as if they were removed from the board?

There is some documentation about Linux kernel hotplug and it suggests x86 only.  Maybe this would be good for another bounty but first it is important to understand whether the Raptor and POWER9 hardware supports this and whether it would lead to energy savings or other benefits.

Problems that would be solved with this:

- reducing heat output from Talos II workstations during summer heatwaves

- extending runtime for a system on UPS batteries
