GPU DMA issue diagnosis and impact (missing kernel support for DMA beyond 32 bits)

FlyingBlackbird:
The Raptor wiki mentions:


--- Quote ---All AMD GPUs currently have DMA issues (limited to 32-bit, which can cause crashes) due to missing Linux kernel support for DMA windows between 33 and 63 bits in length.
The root cause is GPU vendors (and occasionally some non-GPU vendors) cutting costs and only including 40-bit capable (Intel-style) DMA controllers.
A compatibility mode is expected to be included in Linux 5.4 and above that will resolve this issue

--- End quote ---

https://wiki.raptorcs.com/wiki/POWER9_Hardware_Compatibility_List/PCIe_Devices#Graphics_Cards

What I would like to understand:

* How can I diagnose this (am I affected)?
* What is the impact of this issue (crashes under which conditions)?

madscientist159:
This is no longer an issue with kernel 5.4 and higher.
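As a rough first check for the "am I affected?" question, the 5.4 cutoff can be compared against the running kernel. The sketch below is only an illustration: the helper name `kernel_at_least` is made up for this example, it looks at the version number alone (so it cannot see distribution backports), and it does not inspect which DMA mode the GPU driver actually ended up using.

```python
import platform
import re


def kernel_at_least(release, major, minor):
    """Return True if a release string such as '5.4.0-42-generic' is >= major.minor.

    Hypothetical helper for illustration; only the leading 'X.Y' is compared.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release string: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints on Linux
    try:
        if kernel_at_least(release, 5, 4):
            print(f"{release}: 5.4 or newer, the workaround should be present")
        else:
            print(f"{release}: older than 5.4, the GPU may be limited to 32-bit DMA")
    except ValueError as err:
        print(f"could not parse kernel version: {err}")
```

On a distribution kernel older than 5.4 a "may be limited" result is only a hint, since vendors sometimes backport fixes.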

FlyingBlackbird:

--- Quote from: madscientist159 on February 01, 2020, 05:04:24 pm ---This is no longer an issue with kernel 5.4 and higher.

--- End quote ---

Yes, I have read this but I would like to understand the background to make decent decisions on future hardware purchases ("compatibility mode" sounds like "performance impact")

MPC7500:
I'm not an expert on this topic (if one of my statements is wrong, please correct it ;)), but as far as I understand, the system crashes when the graphics card uses more than 4 GiB (the 32-bit limit), for example when you open a ton of Firefox windows. By the way, 40 bits is 1 TiB, not 128 GiB.

And even if a user were affected by this bug, they would not notice it.
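The sizes quoted above are easy to verify with a few lines of arithmetic. This is just a sketch; the function names are invented for the example, and `fits_in_mask` only illustrates the idea of an address fitting inside an N-bit DMA window, not how the kernel actually checks it.

```python
def addressable_bytes(bits):
    """Number of bytes reachable with an address of the given width."""
    return 1 << bits


def human(nbytes):
    """Format an exact power-of-1024 multiple using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    idx = 0
    while nbytes >= 1024 and nbytes % 1024 == 0 and idx < len(units) - 1:
        nbytes //= 1024
        idx += 1
    return f"{nbytes} {units[idx]}"


def fits_in_mask(address, bits):
    """True if the address can be expressed in `bits` bits (illustrative only)."""
    return address < (1 << bits)


if __name__ == "__main__":
    # 37 bits is included for comparison: that is the width that gives 128 GiB.
    for bits in (32, 37, 40):
        print(f"{bits}-bit: {human(addressable_bytes(bits))}")
    # The first address past 4 GiB no longer fits in a 32-bit window.
    print(fits_in_mask(0xFFFF_FFFF, 32), fits_in_mask(1 << 32, 32))
```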

madscientist159:

--- Quote from: FlyingBlackbird on February 02, 2020, 04:05:06 am ---Yes, I have read this but I would like to understand the background to make decent decisions on future hardware purchases ("compatibility mode" sounds like "performance impact")

--- End quote ---

No, it's much the same as "32-bit compatibility mode" in the x86 software context.  The GPUs are broken, not quite spec compliant, so IBM had to introduce additional kernel handling to work around that.  Therefore, "compatibility with old broken GPU hardware", or "compatibility mode"  ;)

If they were not broken, the extra "compatibility" code would not be needed.  You can of course run them in 32-bit mode, but when you exhaust the 32-bit space you'll get GPU memory allocation errors and that very well might crash X or your applications (or the machine, since GPU drivers don't generally handle errors well).
