Author Topic: [amdgpu] [Fiji] Fedora 32 Linux kernel 5.7.x crashes  (Read 12291 times)

pocock

Re: [amdgpu] [Fiji] Fedora 32 Linux kernel 5.7.x crashes
« Reply #15 on: September 23, 2020, 04:19:49 am »
Regressions can often be correlated with a specific feature or aspect of the platform; they don't always arise spontaneously.

The 64k page size is a significant difference for low-level code in device drivers like GPUs.
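For anyone unsure which page size their running kernel uses, here is a minimal sketch in Python. The function name `page_size_kib` is only illustrative. On a 64k-page kernel I would expect it to report 64, and 4 on a 4k-page kernel, but please check on your own system rather than take that as given.

```python
import os


def page_size_kib() -> int:
    """Return the memory page size of the running kernel, in KiB."""
    # SC_PAGE_SIZE is the POSIX sysconf name for the page size in bytes.
    return os.sysconf("SC_PAGE_SIZE") // 1024


if __name__ == "__main__":
    print(f"Kernel page size: {page_size_kib()} KiB")
```

Including this value in a bug report makes it easier for a developer to see whether page size is a factor.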

Developers of the kernel and drivers normally run a series of unit tests and manual tests before releasing new code.  If none of those tests run on a system with a 64k page size and the same combination of CPU and GPU, all their tests can appear to succeed and the released code can still include a regression.

Therefore, I highly recommend that somebody test different permutations.  I only have the RX 580 for now, so I can't test this with my own kernels.  I also only have one ppc64el system for now, which I plan to use for other development, but once I have multiple machines here I could dedicate one to regression testing things like this.
Debian Developer
https://danielpocock.com

MauryG5

« Reply #16 on: September 23, 2020, 06:48:48 am »
What I don't understand is why they still haven't solved the problem, considering that we have been reporting it for several months on the official Fedora channels. TLE also reported it to AMD. What more should we do? I do not know...

pocock

« Reply #17 on: September 23, 2020, 12:12:18 pm »
Developers are always busy.  We have lists of bugs and feature requests from many places.  We don't usually work through them in chronological order: they are prioritized in different ways, based on the urgency of an issue, the effort required to fix an issue, etc.

That said, developers like quick wins and low-hanging fruit.  If people do some testing, show which permutations of kernel settings, firmware and hardware are troublesome and which are good, and also provide log data, the developer behind the code might recognize what the problem is and make a quick fix for it.
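To make "permutations" concrete, here is a rough sketch of how a test matrix could be enumerated so that nobody tests the same combination twice. The values are placeholders taken loosely from this thread and the function name `build_matrix` is hypothetical; replace them with the kernels, page sizes and cards you actually have.

```python
from itertools import product

# Placeholder values, loosely based on what has been mentioned in this thread.
KERNELS = ["5.6", "5.7", "5.8"]
PAGE_SIZES = ["4k", "64k"]
GPUS = ["RX 580", "RX 5700", "Nano (Fiji)"]


def build_matrix(kernels, page_sizes, gpus):
    """Return one record per kernel / page size / GPU combination."""
    return [
        # "result" stays None until somebody tests the combination.
        {"kernel": k, "page_size": p, "gpu": g, "result": None}
        for k, p, g in product(kernels, page_sizes, gpus)
    ]


if __name__ == "__main__":
    for row in build_matrix(KERNELS, PAGE_SIZES, GPUS):
        print(row)
```

Each tester can fill in `result` (for example "boots", "crashes") and attach the matching log, which gives the developer a table instead of scattered reports.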

If the developer has to obtain hardware and do the tests himself, he might lose a day on it. In fact, he might never get around to it.

To give a personal example, I often spend a few weeks working on a feature or major change to some code and then before making the official release, I look over the bug list for anything that is easy and I fix those things and include them in the release.  If a bug report doesn't have enough detail, I have to defer it to the next release cycle because I can't delay a release for something that I can't reproduce.

I personally have no plan to buy the RX 5700 right now; I was going to skip that generation and go directly to Big Navi.  If somebody else wants to test with one of my kernels using 4k page size, I'm happy to provide some guidance.

If anybody has contacts at AMD to get sample hardware for developers under NDA, there are a few people, myself included, who are happy to test it and provide feedback and sometimes fixes.
Debian Developer
https://danielpocock.com

MauryG5

« Reply #18 on: September 25, 2020, 12:52:40 am »
I understand, thanks for your detailed explanation. From what I see you are also a developer for Power, which I am very pleased about. If I can ask you a slightly off-topic question, since you are a developer: how much software do we really have today that is developed natively on Power and therefore really exploits this architecture?

pocock

« Reply #19 on: September 25, 2020, 03:10:12 pm »
It is a good question.

I don't claim to be an expert on POWER.

On the other hand, I got my first computer, a TRS-80 Color Computer 3, when I was about 10 and started learning the Motorola 6809.  This was really fortunate, because they used the Motorola chipset in my undergraduate studies and I had a huge advantage.

I go wherever a project takes me, from soldering together ham radio equipment to working in quantitative finance.

Most of the free, open source projects I work on are for communications.  In this domain, the highest priority is interoperability: it is no use if a user on one platform can't communicate with a user on another platform.  Metcalfe's law tells us that the value of any communications system increases in proportion to the number of users squared.  This emphasizes how important it is for a network like SIP or XMPP to work across architectures.
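To put a number on Metcalfe's law, here is a tiny sketch. The function name and the constant `k` are my own illustration; the law only states the proportionality, not an absolute value.

```python
def metcalfe_value(users: int, k: float = 1.0) -> float:
    """Value of a network under Metcalfe's law: proportional to users squared."""
    # k is an arbitrary proportionality constant, assumed to be 1 here.
    return k * users ** 2


# Doubling the number of users that can reach each other quadruples the value.
ratio = metcalfe_value(200) / metcalfe_value(100)
```

Read the other way, cutting a platform off from the network costs more than its share of users would suggest.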

Rather than designing software exclusively for POWER, my own goals typically involve designing or improving software so that it runs on any current or future platform.  This is an important goal.

Some of my recent activities include starting to investigate bugs in Blender and generalizing that to GNU/Linux development.

Debian Developer
https://danielpocock.com

MauryG5

« Reply #20 on: September 25, 2020, 04:46:26 pm »
Congratulations on your activities, they are very interesting. I presume you also know Motorola Solutions radio equipment well, then... In any case, returning to our kernel discussion, the problem is not only on the 5700 but also on the rest of the 5000 series, as well as on the Nano. I hope they solve it as soon as possible, because we are already at version 5.8 and it still doesn't work...

pocock

« Reply #21 on: September 27, 2020, 11:59:10 am »

Motorola?   Talos II + RTL-SDR is my radio

gqrx (and GNU Radio) just work after installing the packages. Please remember not to plug in the RTL-SDR dongle until after you install the packages.
Debian Developer
https://danielpocock.com

MPC7500

« Reply #22 on: October 03, 2020, 05:16:01 pm »
Kernel 5.8.12 works again for me. Sound is distorted for a short period of time. Almost perfect.

MauryG5

« Reply #23 on: October 04, 2020, 07:02:04 am »
Well, let's hope that Navi 10 and the Nano also work at this point... This morning I did various updates and saw kernel 5.8.12, but I haven't downloaded it yet. I will try it too and see what happens.  Thanks for reporting, MPC...

MauryG5

« Reply #24 on: October 04, 2020, 10:54:34 am »
I just tried kernel 5.8.12 and it's no use: it doesn't work, the exact same problem on Navi 10. I'm starting to think that if it continues this way, this problem will never be solved. It is incredible! TLE, let us know if it works for you on the Nano; for me on Navi 10 it doesn't work, as usual!

tle

« Reply #25 on: October 24, 2020, 10:20:28 am »
I have good news: I managed to get the driver working with the Linux 5.7.0 kernel by setting the `amdgpu.dc=0` parameter in GRUB2.

Unfortunately this trick no longer works with 5.8.x through 5.10.x.
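If you try this, it is worth confirming that the parameter really reached the kernel before reporting results. Below is a minimal sketch that parses a kernel command line string such as the contents of `/proc/cmdline`. The function name `kernel_param` is hypothetical, and keeping the last occurrence when a parameter is repeated is a choice made for this sketch.

```python
def kernel_param(cmdline: str, name: str):
    """Return the value of `name` in a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # A bare flag without "=" is reported as an empty string.
            value = val if sep else ""
    return value  # if the parameter is repeated, the last occurrence is kept


def running_kernel_param(name: str, path: str = "/proc/cmdline"):
    """Look up `name` on the command line of the currently running kernel."""
    with open(path) as f:
        return kernel_param(f.read(), name)
```

Calling `running_kernel_param("amdgpu.dc")` on the booted system should return `"0"` if the GRUB2 change took effect, and `None` if the parameter is missing.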
Faithful Linux enthusiast

My Raptor Blackbird

pocock

« Reply #26 on: October 24, 2020, 03:13:22 pm »

Does anybody have any idea what this means for Big Navi?

As they already merged patches into the kernel, should it just work out of the box?

The Big Navi launch is supposed to be on Wednesday, 28 October and we can potentially buy the cards in November.

Debian Developer
https://danielpocock.com

tle

« Reply #27 on: October 25, 2020, 12:07:34 am »
I think we will find out when one of us gets their hands on the new card next month...
Faithful Linux enthusiast

My Raptor Blackbird

MauryG5

« Reply #28 on: November 12, 2020, 12:48:49 pm »
Guys, I got an answer from the Fedora team about kernel 5.7/5.8 with AMD Navi 10 GPUs. They tell me that problems with AMD GPUs on our architecture are not handled by AMD, so it has to be one of us who finds them, like Daniel did on kernel 5.6. So unless one of us solves it first, there is unfortunately no way to fix the problem. A big problem, I would say... We don't know when we will be able to get the bug fixed.
 :( :( :(

pocock

« Reply #29 on: November 12, 2020, 12:53:30 pm »
Did anybody try the kernel with 4k page size?  I feel it would be very useful to have feedback on that.

The Big Navi cards arrive on 18 November, next Wednesday, and I'm probably going to try one of those in my system.
Debian Developer
https://danielpocock.com