[amdgpu] [Fiji] Fedora 32 Linux kernel 5.7.x crashes
pocock:
Regressions can be correlated with a specific feature or aspect of the platform; they don't always arise spontaneously.
The 64k page size is a significant difference for low-level code such as GPU device drivers.
Developers of the kernel and drivers normally run a series of unit tests and manual tests before releasing new code. If none of those tests run on a system with a 64k page size and the same combination of CPU and GPU, then it is possible that all their tests appear to succeed and they release code that includes a regression.
Therefore, I highly recommend that somebody test different permutations. I only have the RX 580 for now, so I can't test this with my own kernels. I also have only one ppc64el system at the moment, which I plan to use for other development, but once I have multiple machines here I could dedicate one to regression testing things like this.
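Anyone comparing results across machines should first record which page size the running kernel actually uses. Here is a minimal sketch of how that could be done in Python with the standard library only. The function name `page_size_report` and the choice of fields are illustrative assumptions, not something from an existing tool.

```python
import os
import platform


def page_size_report():
    """Collect basic facts worth attaching to a bug report (illustrative)."""
    # SC_PAGE_SIZE is the memory page size of the running kernel, in bytes.
    page_size = os.sysconf("SC_PAGE_SIZE")
    return {
        "machine": platform.machine(),   # CPU architecture as the kernel reports it
        "kernel": platform.release(),    # running kernel release string
        "page_size_bytes": page_size,
        "page_size_kib": page_size // 1024,
    }


if __name__ == "__main__":
    for key, value in page_size_report().items():
        print(f"{key}: {value}")
```

Pasting this output next to the crash log makes it clear which page size a given report refers to.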
MauryG5:
What I don't understand is why they still haven't solved the problem, considering that we have been reporting it for several months on the official Fedora channels. TLE also reported it to AMD. What more should we do? I don't know...
pocock:
Developers are always busy. We have lists of bugs and feature requests from many places. We don't usually work through them in chronological order: they are prioritized in different ways, based on the urgency of an issue, the effort required to fix an issue, etc.
That said, developers like quick wins and low-hanging fruit. If people do some testing, prove which permutations of kernel settings, firmware and hardware are troublesome and which are good, and also provide log data, the developer behind the code might recognize what the problem is and make a quick fix for it.
If the developer has to obtain hardware and do the tests himself, he might lose a day on it. In fact, he might never get around to it.
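One way to keep such testing organized is to enumerate the permutations up front and fill in a result for each. The sketch below is only an illustration. The page sizes and GPU names are the ones mentioned in this thread, the 5.6.x kernel is an assumed baseline for comparison, and `build_matrix` and `format_row` are hypothetical helper names.

```python
from itertools import product

# Axes to vary. Page sizes and GPUs come from this thread; "5.6.x" is an
# assumed known-good baseline and should be replaced with real versions.
AXES = {
    "page_size": ["4k", "64k"],
    "kernel": ["5.6.x", "5.7.x"],
    "gpu": ["RX 580", "Fiji", "RX 5700"],
}


def build_matrix(axes):
    """Return one dict per combination of the axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def format_row(row, result="untested"):
    """Render one permutation as a single line suitable for a bug report."""
    fields = " | ".join(f"{name}={value}" for name, value in row.items())
    return f"{fields} | result={result}"


if __name__ == "__main__":
    for row in build_matrix(AXES):
        print(format_row(row))
```

Each tester can then claim the rows that match the hardware they own and replace "untested" with "ok" or "crash", plus a pointer to the log.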
To give a personal example, I often spend a few weeks working on a feature or major change to some code and then before making the official release, I look over the bug list for anything that is easy and I fix those things and include them in the release. If a bug report doesn't have enough detail, I have to defer it to the next release cycle because I can't delay a release for something that I can't reproduce.
I personally have no plan to buy the RX 5700 right now; I was going to skip that generation and go directly to Big Navi. If somebody else wants to test with one of my kernels using a 4k page size, I'm happy to provide some guidance.
If anybody has contacts at AMD to get sample hardware for developers under NDA, there are a few people, myself included, who are happy to test it and provide feedback and sometimes fixes.
MauryG5:
I understand, thanks for your detailed explanation. I see that you are also a developer for Power, which I am very pleased about. If I can ask a slightly off-topic question, since you are a developer: how much software do we really have today that is developed natively on Power and therefore really exploits this architecture?
pocock:
It is a good question.
I don't claim to be an expert on POWER.
On the other hand, I got my first computer, a TRS-80 Color Computer 3, when I was about 10 and started learning the Motorola 6809. This was really fortunate, because they used the Motorola chipset in my undergraduate studies and I had a huge advantage.
I go wherever a project takes me, from soldering together ham radio equipment to working in quantitative finance.
Most of the free, open source projects I work on are for communications. In this domain, the highest priority is interoperability: it is no use if a user on one platform can't communicate with a user on another platform. Metcalfe's law tells us that the value of any communications system increases in proportion to the square of the number of users. This emphasizes how important it is for a network like SIP or XMPP to work across architectures.
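To put numbers on Metcalfe's law, here is a small sketch. The constant `k` and the function names are assumptions made for illustration; the law only states the proportionality, not an absolute value.

```python
def metcalfe_value(users, k=1.0):
    """Network value under Metcalfe's law: proportional to users squared."""
    return k * users ** 2


def relative_gain(users_before, users_after):
    """How many times more valuable the network becomes as users grow."""
    return metcalfe_value(users_after) / metcalfe_value(users_before)


if __name__ == "__main__":
    # Doubling the reachable users quadruples the value under this model.
    print(relative_gain(1000, 2000))
```

Under this model, a network that excludes a platform loses more than just that platform's share of users, because every remaining user has fewer people to reach.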
Rather than designing software exclusively for POWER, my own goals typically involve designing or improving software so that it runs on any current or future platform. This is an important goal.
Some of my recent activities include starting to investigate bugs in Blender and generalizing that to GNU/Linux development