Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - madscientist159

Pages: 1 2 [3] 4
31
General Discussion / Re: 2 boot work-around
« on: December 29, 2019, 02:53:57 pm »
If this is in relation to the AMD GPU hang, the actual fix for the GPU ASIC not reloading firmware is in kernel 5.1.  Once we update the skiroot kernel to 5.4 (with the Navi patches) hopefully this problem will go away for good. 8)

32
Applications and Porting / Re: Blu-ray applications
« on: December 15, 2019, 07:16:07 am »
Without going into the Blu-ray issue for legal reasons, I do want to correct the VLC concern.  I use VLC on POWER quite often as part of my workflow -- e.g. to check video output from kdenlive (in 4K, actually!).  If your distro doesn't have VLC for POWER I'd be very, very surprised. ;)

33
Legacy POWER Hardware / Re: Old stuff survey
« on: December 11, 2019, 12:02:25 am »
Not to be a (terrible) pedant, but while they do indeed implement (at least most of) the later ISA, the QorIQ e5500 microarchitecture doesn't have much in common with POWER7's. It's essentially a 64-bit e500 without the SPE crap and an FPU borrowed from the e600, itself an evolution of the 744x G4. The lack of AltiVec/VMX is a terrible omission, however, and limits its utility for general purpose computing. It's really an embedded part and meant for that application, and I have said in other places how much of a disservice I think running such CPUs in pricey desktops is to the Amiga community.

No problem -- thanks for the pedantry here, all I really know about the e5500 parts is that we disqualified them really early on for performance and concerns over the integrated "mystery meat" security core. 8)

34
Legacy POWER Hardware / Re: Old stuff survey
« on: December 11, 2019, 12:00:29 am »
Not to be a (terrible) pedant, but while they do indeed implement (at least most of) the later ISA, the QorIQ e5500 microarchitecture doesn't have much in common with POWER7's. It's essentially a 64-bit e500 without the SPE crap and an FPU borrowed from the e600, itself an evolution of the 744x G4. The lack of AltiVec/VMX is a terrible omission, however, and limits its utility for general purpose computing. It's really an embedded part and meant for that application, and I have said in other places how much of a disservice I think running such CPUs in pricey desktops is to the Amiga community.

No problem -- thanks for the pedantry here, all I really know about the e5500 parts is that we disqualified them really early on for performance and concerns over the integrated "mystery meat" security core. 8)

35
User Zone / Re: Void Linux thread
« on: December 10, 2019, 06:22:45 pm »
Currently attempting a 64-bit long double transition to get rid of the messy IBM double-double format used on most distros and unify the representation across targets. Musl has been using it all along, now trying glibc.

I don't want to go with the 128-bit IEEE754 format, as that requires VSX and would mean one format on ppc64le and a different one on the rest (and also, when not targeting POWER9, it basically means reverting to soft-float).

What kind of performance loss are we looking at though on POWER9 vs. 128-bit IEEE754?
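A quick way to see which long double format a given toolchain actually uses is to check `LDBL_MANT_DIG` from `<float.h>`. The sketch below (the enum and function names are hypothetical) maps the mantissa width to the format it implies: 53 bits when long double is just double, 64 for the x87 extended format on x86, 106 for IBM double-double, and 113 for IEEE binary128.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical classification of long double formats by mantissa width. */
enum ld_format {
    LD_BINARY64,          /* long double is the same as double (53 bits) */
    LD_X87_EXTENDED,      /* x87 80-bit extended precision (64 bits) */
    LD_IBM_DOUBLE_DOUBLE, /* sum of two doubles (106 bits) */
    LD_BINARY128,         /* IEEE 754 quad precision (113 bits) */
    LD_UNKNOWN
};

enum ld_format classify_long_double(int mant_dig)
{
    switch (mant_dig) {
    case 53:  return LD_BINARY64;
    case 64:  return LD_X87_EXTENDED;
    case 106: return LD_IBM_DOUBLE_DOUBLE;
    case 113: return LD_BINARY128;
    default:  return LD_UNKNOWN;
    }
}

/* Print what the current compiler/target uses for long double. */
void report_long_double(void)
{
    static const char *const names[] = {
        "IEEE 754 binary64 (same as double)",
        "x87 80-bit extended",
        "IBM double-double",
        "IEEE 754 binary128",
        "unknown",
    };
    printf("long double: %d mantissa bits, %zu bytes, %s\n",
           LDBL_MANT_DIG, sizeof(long double),
           names[classify_long_double(LDBL_MANT_DIG)]);
}
```

On a glibc ppc64le build of this era you would expect 106 bits; after a transition to a 64-bit long double like the one described above (and on the musl targets), 53.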

36
Legacy POWER Hardware / Re: Old stuff survey
« on: December 09, 2019, 05:34:04 pm »
The most recent X5000 has the Freescale P5020 or P5040, but without AltiVec, and they are based on POWER8 if I remember correctly...

POWER7, actually.  They implement Power ISA v2.06 (https://wiki.raptorcs.com/wiki/Power_ISA) and are BE only.

Confusingly, the actual product page (https://www.nxp.com/products/processors-and-microcontrollers/power-architecture/qoriq-communication-processors/p-series/qoriq-p5020-and-p5010-64-bit-dual-and-single-core-communications-processors:P5020) refers to a "P5 platform", which I suspect is not POWER5, but I have no idea offhand what it is.  I do know the four-core and larger chips have a mystery management core of some sort, but then again, with pre-OpenPOWER stuff the chips weren't exactly running open source firmware across the board, regardless of vendor.

Also, welcome to all here on this subforum!  I don't have any pre-OpenPOWER ppc[64] hardware but find the history fascinating regardless.

37
User Zone / Re: Graphics Card install
« on: December 08, 2019, 06:25:39 pm »
To be honest I'd be very, very surprised if there was a ppc64le specific bug in the AMD display detection.  That code is all straight C, and the bit count and endianness both match x86 exactly.

Thankfully I don't have to do this very often, but...

<eats crow.  mfff, featherpth.>

In this case my instincts let me down.  I had drilled into my head for so long "don't use floating point in kernel space!" that I didn't even think to look for an x86-only guard around the DCN code.  I hope the patches make up for it!  ;)

I also have a Navi card coming to play with and help make sure things keep working on the POWER systems in the future.

38
User Zone / Re: Graphics Card install
« on: December 05, 2019, 04:12:35 pm »
I wrote to AMD support and told them everything; let's see what they answer... I also said that the drivers currently have problems and need to be reviewed, and asked whether it would be possible to receive direct support for POWER from them, since the OpenPOWER community almost exclusively buys their cards... It will be difficult, but I'll try.

Thanks partly to meklort's initial tracing of the issue, I was able to put together an initial PoC/RFC patch here:
https://lists.freedesktop.org/archives/amd-gfx/2019-December/043611.html

I don't have a Navi card to test with, so if you are able to apply that patch, recompile, and test, I'd appreciate it.

39
User Zone / Re: Graphics Card install
« on: December 05, 2019, 12:09:22 am »
This issue is due to Navi display support only being enabled for X86.
See here: https://github.com/torvalds/linux/blob/v5.4/drivers/gpu/drm/amd/display/Kconfig#L23
And here: https://github.com/torvalds/linux/blob/v5.4/drivers/gpu/drm/amd/display/dc/calcs/Makefile#L27
And here: https://github.com/torvalds/linux/blob/v5.4/drivers/gpu/drm/amd/display/dc/calcs/dcn_calcs.c#L629

Basically, before this will work, the following needs to happen:
Either the DC code needs to be modified to use integer math instead of floating-point math, or:
  • The kernel_fpu_begin / kernel_fpu_end APIs need to be added for POWER (currently only supported on x86 and s390)
  • The Kconfig files need to be updated to enable POWER in addition to X86
  • The Makefiles need to be modified to not assume x86 (and SSE / SSE2)
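A minimal sketch of the first option, not taken from the DC code: keep quantities as scaled integers so no FPU state is ever needed. Here a mode's refresh rate is computed in millihertz from the pixel clock (in kHz) and the horizontal/vertical totals using only 64-bit integer math; the function name and the choice of units are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical example of replacing floating point with scaled-integer math:
   refresh rate in millihertz (1/1000 Hz) from a display mode's timings. */
uint64_t refresh_rate_millihz(uint64_t pixel_clock_khz, uint32_t htotal, uint32_t vtotal)
{
    uint64_t pixels_per_frame = (uint64_t)htotal * vtotal;

    if (pixels_per_frame == 0)
        return 0; /* invalid timing; avoid dividing by zero */

    /* kHz -> Hz is *1000, Hz -> millihertz is another *1000. */
    uint64_t scaled = pixel_clock_khz * 1000u * 1000u;

    /* Add half the divisor first so integer division rounds to nearest. */
    return (scaled + pixels_per_frame / 2) / pixels_per_frame;
}
```

With the standard 1080p60 timing (148500 kHz pixel clock, 2200 x 1125 total) this returns 60000, i.e. 60.000 Hz, and the same approach extends to any ratio the display code needs as long as the intermediate products fit in 64 bits.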

Good catch.  It's generally considered bad form to use floating point in kernelspace for a number of reasons; this smells a bit more like a move to soft-lock AMD GPUs to AMD CPUs (thankfully one that can be worked around with developer time, but still not a great move).

@mauryg5 I'd return that card if I were you, since I'd wager the box didn't mention the drivers only work on x86 (normally having Linux support means the device works anywhere -- I've only ever seen this once before, where a device required x86 due to poor-quality drivers, and I got a full refund in that case).  Either that or try to get a dev or two interested in fixing up the drivers to be properly compatible, but that might require funding the development or waiting for someone to be personally invested enough to make it happen.

40
User Zone / Re: Graphics Card install
« on: December 04, 2019, 08:42:31 pm »
Code:
No outputs definitely connected, trying again...
This isn't a POWER problem, this is an AMD GPU driver / hardware problem.  We're going to need a lot more info including the monitor model etc. -- last time I saw this you had to flip DisplayCore on or off, but Navi may require DisplayCore to operate at all.  If the latter is the case, you'll need to contact AMD support to get the driver fixes.
Yes, agreed, it's a problem with the driver (again, I'm assuming it's only a driver issue on ppc64le and not x86_64, but that's only an assumption), as it's not even detecting that there's anything a monitor could be plugged into.
Note that I did test with amdgpu.dc=1 and amdgpu.dc=0 and there was no change; however, my understanding is that Navi only works with it enabled, like you said.

Do you know the best way to contact AMD about this? I can run through the appropriate channels to try to get the support improved.

To be honest I'd be very, very surprised if there was a ppc64le specific bug in the AMD display detection.  That code is all straight C, and the bit count and endianness both match x86 exactly.
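The claim is easy to check on any machine: ppc64le and x86_64 are both 64-bit little-endian targets, so a probe like the sketch below (hypothetical helper names) returns identical answers on either.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one, by looking at
   which byte of a known 32-bit value is stored first in memory. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x04;
}

/* Width of a data pointer in bits: 64 on both ppc64le and x86_64. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

A big-endian ppc64 build, by contrast, would report 0 from the first function while still reporting 64 bits from the second.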

Probably the first step is for anyone seeing this problem on Navi on POWER to try the exact same kernel version on some old / borrowed x86 system with the same card and monitor.  If that also shows the same problem, at least when reported to the kernel bugtracker the devs won't immediately assume it's a POWER bug. :)

41
User Zone / Re: Graphics Card install
« on: December 03, 2019, 11:28:19 pm »
FYI, I tested this out a month or so ago, with builds of the kernel / Mesa / etc. from git, and was never able to get it to work. I've also just re-tested with Fedora Rawhide, and am seeing the same behaviour.

Effectively, the graphics card is detected fine; however, no output ports are detected when starting X11, and as a result, no screens are found.
Normally, I'd expect to see something like the following in the X11 log:
Code:
[   716.370] (II) AMDGPU(0): Output DisplayPort-0 has no monitor section
[   716.370] (II) AMDGPU(0): Output DisplayPort-1 has no monitor section
[   716.370] (II) AMDGPU(0): Output DisplayPort-2 has no monitor section
[   716.371] (II) AMDGPU(0): Output HDMI-A-0 has no monitor section
[   716.404] (II) AMDGPU(0): EDID for output DisplayPort-0

With Navi 10 on Rawhide, I instead see the following (no output types are even detected, so it doesn't probe them):
Code:
[  1002.413] (II) AMDGPU(0): glamor X acceleration enabled on AMD NAVI10 (DRM 3.35.0, 5.4.0-2.fc32.ppc64le, LLVM 9.0.0)
[  1002.413] (II) AMDGPU(0): glamor detected, initialising EGL layer.
[  1002.413] (==) AMDGPU(0): TearFree property default: auto
[  1002.413] (==) AMDGPU(0): VariableRefresh: disabled
[  1002.413] (II) AMDGPU(0): KMS Pageflipping: enabled
[  1002.413] (WW) AMDGPU(0): No outputs definitely connected, trying again...
[  1002.413] (WW) AMDGPU(0): Unable to find connected outputs - setting 1024x768 initial framebuffer
[  1002.413] (II) AMDGPU(0): mem size init: gart size :1fe810000 vram size: s:1f7b70000 visible:fd50000
[  1002.413] (==) AMDGPU(0): DPI set to (96, 96)
[  1002.413] (==) AMDGPU(0): Using gamma correction (1.0, 1.0, 1.0)
...
Fatal server error:
[  1002.416] (EE) no screens found(EE)

So, my assumption right now is that the current code has a bug on ppc64 where output ports are not detected properly. I'll do some additional tests this weekend, but I expect this will require some sort of fix in the kernel/amdgpu driver.
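To compare the two logs mechanically, you can count the connector lines. The sketch below is hypothetical; it assumes the amdgpu X driver announces each probed connector as `AMDGPU(<n>): Output <name> ...`, as in the working log above. On that log it would count four outputs; on the Navi 10 log it counts none.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if an Xorg log line announces an amdgpu connector, e.g.
   "(II) AMDGPU(0): Output DisplayPort-0 has no monitor section". */
int is_amdgpu_output_line(const char *line)
{
    const char *tag = strstr(line, "AMDGPU(");
    return tag != NULL && strstr(tag, "): Output ") != NULL;
}

/* Count connector lines in an open Xorg log stream. Lines longer than the
   buffer are read in pieces, which is acceptable for a rough count. */
size_t count_amdgpu_outputs(FILE *log)
{
    char line[1024];
    size_t count = 0;

    while (fgets(line, sizeof line, log) != NULL)
        if (is_amdgpu_output_line(line))
            count++;
    return count;
}
```

The log is typically at /var/log/Xorg.0.log, though rootless X setups may write it under the user's home directory instead.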

Code:
No outputs definitely connected, trying again...
This isn't a POWER problem, this is an AMD GPU driver / hardware problem.  We're going to need a lot more info including the monitor model etc. -- last time I saw this you had to flip DisplayCore on or off, but Navi may require DisplayCore to operate at all.  If the latter is the case, you'll need to contact AMD support to get the driver fixes.

42
User Zone / Re: Graphics Card install
« on: December 03, 2019, 01:57:42 pm »
I published the dmesg in previous posts: if you go back to the first or second page, you'll find my post with the full dmesg report attached... You stopped answering after I told you that the change to the kernel boot line suggested in this topic does not work... Any news for me?

Yes, I see that, but I would also need to know if a monitor was plugged in to the card when that dmesg was captured, and also it would be very useful to have the Xorg.0.log file contents from trying to start Xorg while a monitor is plugged in to the AMD GPU.

One of the quirks with AMD GPUs is that they will not start an output they don't think is attached to a monitor (well, not without significant effort, anyway).  That means if your monitor EDID is broken, you'll never get output, for instance.  Log files will help figure out what is wrong, and if you really want to keep using the Navi card despite the various warnings of instability and general brokenness on the Linux driver stack (again, architecture independent -- x86 is just as bad here) I strongly recommend you either obtain SSH access from another computer, or get a null modem serial cable and attach it to another computer.  This is so that you can try various things and get logs without constantly rebooting the machine.
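Since a broken EDID can leave the GPU convinced nothing is attached, it is worth checking the raw EDID the kernel read. On Linux each DRM connector exposes it as an `edid` file under /sys/class/drm (for example card0-DP-1/edid; that connector name is an assumption and varies by system). The sketch below, with hypothetical helper names, checks the two properties every EDID base block must satisfy: the fixed 8-byte header and a checksum byte that makes all 128 bytes sum to zero modulo 256.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EDID_BLOCK_LEN 128

/* Read up to buf_len bytes of a sysfs edid file; returns bytes read (0 on error). */
size_t read_edid(const char *path, uint8_t *buf, size_t buf_len)
{
    FILE *f = fopen(path, "rb");
    size_t n;

    if (f == NULL)
        return 0;
    n = fread(buf, 1, buf_len, f);
    fclose(f);
    return n;
}

/* Checksum byte (offset 127) that makes the first 128 bytes sum to 0 mod 256. */
uint8_t edid_checksum_byte(const uint8_t *edid)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < EDID_BLOCK_LEN - 1; i++)
        sum = (uint8_t)(sum + edid[i]);
    return (uint8_t)(0u - sum);
}

/* Returns 1 if buf holds a plausible EDID base block, 0 otherwise. */
int edid_base_block_ok(const uint8_t *edid, size_t len)
{
    static const uint8_t header[8] = {0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00};
    uint8_t sum = 0;

    if (edid == NULL || len < EDID_BLOCK_LEN)
        return 0;
    for (size_t i = 0; i < sizeof header; i++)
        if (edid[i] != header[i])
            return 0;
    for (size_t i = 0; i < EDID_BLOCK_LEN; i++)
        sum = (uint8_t)(sum + edid[i]);
    return sum == 0;
}
```

A read of 0 bytes typically means the kernel has no EDID at all for that connector, which on its own would explain an output that never comes up.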

Broken display on Linux has always been a major pain to debug, even on x86 -- I remember spending a long time trying to get nouveau working back in the day on an older x86 box without SSH access; I eventually gave up and got SSH through a laptop IIRC because it was nearly impossible to fix when you have no working display. :)

43
User Zone / Re: Graphics Card install
« on: December 03, 2019, 01:30:48 pm »
Anyone know if Navi is still affected by the (in)famous amdgpu.dc=0 / amdgpu.dc=1 bug?  The newest card I have is Vega (specifically to avoid these kind of issues) so I don't have hands-on experience with Navi.

@MauryG5, if you're able to get an Xorg.0.log and a dmesg from the system when it fails to start the display that would be quite helpful.  Even if you have to get the dmesg and log from the machine while the on-board HDMI is enabled at least it might provide some clue.

44
User Zone / Re: Graphics Card install
« on: December 01, 2019, 03:13:26 pm »
DMA issues (assuming you need the 32-bit DMA thing) are a completely unrelated thing and will affect very few people.
They're also fixed in the official 5.4 release.  I'm running one of my desktops with 5.4.0 and a Vega 64, with no problems and no DMA limitations.  Note this has to be 5.4.0 or higher; the 5.4-rc series did have a bug leading to GPU crash / EEH.  5.3.x and below didn't enable the "64-bit DMA" (while GPU DMA is actually 40 or 48 bit for the most part, the kernel was limiting it to 32-bit) -- this worked well enough but did cause issues if lots of windows were open at once.
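For a sense of scale, the sketch below reproduces the kernel's `DMA_BIT_MASK()` definition and converts a mask width into the size of the bus address range it allows. A driver requests its mask with `dma_set_mask_and_coherent(dev, DMA_BIT_MASK(n))`; an n-bit mask restricts the device to bus addresses below 2^n, i.e. 4 GiB at 32 bits versus 1 TiB at 40 bits and 256 TiB at 48 bits. The helper function name is hypothetical.

```c
#include <stdint.h>

/* Same definition as the Linux kernel's DMA_BIT_MASK() in <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: size of the bus address range below an n-bit DMA mask,
   in GiB. Valid for 30 <= bits <= 64; returns 0 otherwise. */
uint64_t dma_mask_reach_gib(unsigned bits)
{
    if (bits < 30 || bits > 64)
        return 0;
    return (DMA_BIT_MASK(bits) >> 30) + 1;
}
```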

Also, RX 5700 cards are extremely problematic on Linux right now (even on x86). To get anywhere near a remotely stable experience, you need a patched kernel 5.4, LLVM 10 from svn, and Mesa from git built against this LLVM; any other configuration will result in frequent hangs (https://gitlab.freedesktop.org/drm/amd/issues/892)

Honestly, given the state of the software stack, I'd recommend the OP get an older card that will just work until the problems are sorted out.  Having personally lived through unstable AMD GPU drivers, it's just not worth it -- at the time it was the Polaris/Vega that was unstable, and I eventually fell back to a secondhand RX290 until the drivers stabilized.  It's just not worth fighting these kinds of problems -- you do eventually get (short term) data loss when the GPU freezes before you can save whatever you were working on, it's just a matter of time -- again, IME, just not worth it.

Let AMD cook the drivers for a few more kernel revisions before trying it again. ;)

45
User Zone / Re: Graphics Card install
« on: November 30, 2019, 04:02:39 pm »
In situations like this it can also be helpful to post dmesg output from the failed driver load, if you can get it.  Most Linux drivers are pretty clear about what firmware file(s) they are missing for the installed hardware, and that can help people here figure out the proper instructions.
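As an example of what to look for: when the kernel cannot find a firmware file, the firmware loader typically logs a warning of the form `Direct firmware load for <file> failed with error <n>`. The sketch below (hypothetical helper names) pulls the file name out of such lines so you can compare it with what your distro's firmware package ships; feed it dmesg output from a small main() that passes stdin to `list_missing_firmware()`.

```c
#include <stdio.h>
#include <string.h>

/* If line contains "Direct firmware load for <file> failed", copy <file>
   into out (NUL-terminated, truncated if needed) and return 1; else return 0. */
int missing_firmware_name(const char *line, char *out, size_t out_len)
{
    static const char prefix[] = "Direct firmware load for ";
    const char *start = strstr(line, prefix);
    const char *end;
    size_t n;

    if (start == NULL || out_len == 0)
        return 0;
    start += sizeof prefix - 1;
    end = strstr(start, " failed");
    if (end == NULL)
        return 0;
    n = (size_t)(end - start);
    if (n >= out_len)
        n = out_len - 1;
    memcpy(out, start, n);
    out[n] = '\0';
    return 1;
}

/* Print every missing firmware file named in a dmesg stream. */
void list_missing_firmware(FILE *in)
{
    char line[512], name[256];

    while (fgets(line, sizeof line, in) != NULL)
        if (missing_firmware_name(line, name, sizeof name))
            printf("missing: %s\n", name);
}
```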
