Topic: New Kernel 5.16 and new problem

matgraf

Re: New Kernel 5.16 and new problem
« Reply #30 on: April 17, 2022, 08:22:11 pm »
@SiteAdmin I was able to pinpoint the error to a single line of code!
The last working commit is eabf2019b7e5bf8216e373a74e08f13ca6b6c550.
If I apply only the following part of the next commit, 9fa0fb77132fe9e83f2b357fd5a2b16293a5b9ee:
Code: [Select]
diff --git a/drivers/gpu/drm/amd/display/dc/inc/dc_link_ddc.h b/drivers/gpu/drm/amd/display/dc/inc/dc_link_ddc.h
index 4d7b271b6409..95fb61d62778 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/dc_link_ddc.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/dc_link_ddc.h
@@ -69,6 +69,7 @@ struct ddc_service_init_data {
        struct graphics_object_id id;
        struct dc_context *ctx;
        struct dc_link *link;
+       bool is_dpia_link;
 };
I get the following error:
Code: [Select]
[    1.813696] Unrecoverable VMX/Altivec Unavailable Exception f20 at c008000002933e0c
[    1.813720] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[    1.813731] LE PAGE_SIZE=4K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    1.813742] Modules linked in: sd_mod amdgpu(+) gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt xhci_hcd fb_sys_fops nvme drm tg3 nvme_core usbcore crc32c_vpmsum t10_pi libphy crc_t10dif crct10dif_generic ahci ptp crct10dif_vpmsum pps_core libahci crct10dif_common drm_panel_orientation_quirks
[    1.813821] CPU: 0 PID: 237 Comm: kworker/0:3 Not tainted 5.15.0-rc2+ #1
[    1.813832] Workqueue: events work_for_cpu_fn
[    1.813862] NIP:  c008000002933e0c LR: c008000002934d3c CTR: 0000000000000000
[    1.813882] REGS: c0000000057c3250 TRAP: 0f20   Not tainted  (5.15.0-rc2+)
[    1.813901] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84002240  XER: 00000021
[    1.813926] CFAR: c008000002934d38 IRQMASK: 0
[    1.813926] GPR00: c008000002934d3c c0000000057c34f0 c008000002c4b000 0000000000000000
[    1.813926] GPR04: c000000003e80000 c0000000058b5800 c0000000058b6000 c05c8b05058b18c0
[    1.813926] GPR08: c0000003ff3a4328 0000000000000000 005c8b05000000c0 78926402dcab8839
[    1.813926] GPR12: c00000000046dbf0 c000000001763000 c0000000038a0000 c0000000038a6120
[    1.813926] GPR16: c0000000038a6128 c0000000038a6118 c0000000038b6a88 c0000000038a6138
[    1.813926] GPR20: c0000000038a6140 c0000000038a6130 0000000000000100 0000000000000001
[    1.813926] GPR24: 0000000000000001 0000000000000001 0000000000000001 c000000034647bc0
[    1.813926] GPR28: c000000003e90000 c0000000057c37c8 c000000003e80000 c0000000058b5800
[    1.814077] NIP [c008000002933e0c] dcn20_resource_construct+0x44/0xf10 [amdgpu]
[    1.814269] LR [c008000002934d3c] dcn20_create_resource_pool+0x64/0x100 [amdgpu]
[    1.814437] Call Trace:
[    1.814452] [c0000000057c34f0] [c008000002934d20] dcn20_create_resource_pool+0x48/0x100 [amdgpu] (unreliable)
[    1.814645] [c0000000057c3570] [c008000002a5b520] dc_create_resource_pool+0x2f8/0x3a0 [amdgpu]
[    1.814827] [c0000000057c35a0] [c008000002a4cec8] dc_create+0x1d0/0xa80 [amdgpu]
[    1.815000] [c0000000057c36c0] [c0080000028b9ca4] amdgpu_dm_init.isra.0+0x1dc/0x1cb0 [amdgpu]
[    1.815177] [c0000000057c3920] [c0080000028bb7a0] dm_hw_init+0x28/0x60 [amdgpu]
[    1.815357] [c0000000057c3950] [c00800000260ab74] amdgpu_device_init+0x1d9c/0x21a0 [amdgpu]
[    1.815506] [c0000000057c3aa0] [c00800000260c200] amdgpu_driver_load_kms+0x48/0x370 [amdgpu]
[    1.815652] [c0000000057c3b20] [c008000002601c7c] amdgpu_pci_probe+0x1d4/0x370 [amdgpu]
[    1.815786] [c0000000057c3bc0] [c0000000007f6778] local_pci_probe+0x68/0x110
[    1.815819] [c0000000057c3c40] [c000000000158198] work_for_cpu_fn+0x38/0x60
[    1.815842] [c0000000057c3c70] [c00000000015df68] process_one_work+0x2a8/0x590
[    1.815866] [c0000000057c3d10] [c00000000015eb50] worker_thread+0x2a0/0x610
[    1.815899] [c0000000057c3da0] [c00000000016a7d4] kthread+0x184/0x190
[    1.815931] [c0000000057c3e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
[    1.815962] Instruction dump:
[    1.815978] fb61ffd8 fbc1fff0 fbe1fff8 fa81ffa0 faa1ffa8 fac1ffb0 fae1ffb8 fb01ffc0
[    1.816023] fb21ffc8 fb41ffd0 fb81ffe0 fba1ffe8 <100004c4> 3920ff90 3940ff80 7c9f2378
[    1.816058] ---[ end trace b28b055b72a2e4fc ]---

ClassicHasClass

« Reply #31 on: April 17, 2022, 11:07:05 pm »
That seems a little odd. Literally just adding that bool to that struct makes it go crazy?

rheaplex

« Reply #32 on: April 18, 2022, 12:28:55 am »
If the struct is allocated elsewhere with the old layout, or if the firmware is expecting the old layout, that would explain it.

matgraf

« Reply #33 on: April 18, 2022, 04:07:39 am »
The similar addition in the other file, core_types.h, does not trigger the error. That is, the following part of the problematic commit 9fa0fb77132fe9e83f2b357fd5a2b16293a5b9ee is harmless on its own:
Code: [Select]
diff --git a/drivers/gpu/drm/amd/display/dc/inc/core_types.h b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
index ed09af238911..6fc6488c54c0 100644
--- a/drivers/gpu/drm/amd/display/dc/inc/core_types.h
+++ b/drivers/gpu/drm/amd/display/dc/inc/core_types.h
@@ -62,6 +62,7 @@ struct link_init_data {
        uint32_t connector_index; /* this will be mapped to the HPD pins */
        uint32_t link_index; /* this is mapped to DAL display_index
                                TODO: remove it when DC is complete. */
+       bool is_dpia_link;
 };

ClassicHasClass

« Reply #34 on: April 18, 2022, 05:05:17 pm »
Maybe the problem is a mismatch between struct definitions, if the struct (or an equivalent object) is defined in more than one place.

matgraf

« Reply #35 on: April 19, 2022, 10:48:30 am »
I was able to fix the bug. Now I have a working kernel 5.17.3!

Code: [Select]
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 2a72517e2b28..1f83c7331b06 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -3721,7 +3721,7 @@ static bool dcn20_resource_construct(
        int i;
        struct dc_context *ctx = dc->ctx;
        struct irq_service_init_data init_data;
-       struct ddc_service_init_data ddc_init_data = {0};
+       struct ddc_service_init_data ddc_init_data;
        struct _vcs_dpi_soc_bounding_box_st *loaded_bb =
                        get_asic_rev_soc_bb(ctx->asic_id.hw_internal_rev);
        struct _vcs_dpi_ip_params_st *loaded_ip =
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
index 8ca26383b568..f93b944f75fa 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c
@@ -2559,7 +2559,7 @@ static bool dcn30_resource_construct(
        int i;
        struct dc_context *ctx = dc->ctx;
        struct irq_service_init_data init_data;
-       struct ddc_service_init_data ddc_init_data = {0};
+       struct ddc_service_init_data ddc_init_data;
        uint32_t pipe_fuses = read_pipe_fuses(ctx);
        uint32_t num_pipes = 0;

MauryG5

« Reply #36 on: April 19, 2022, 01:11:23 pm »
Great job, matgraf! How can we take advantage of the fix now? Should we simply wait for the Linux kernel team to fix it and then download the tarball from the linux-archive site as usual, or do we need to do something else? Thanks

matgraf

« Reply #37 on: April 19, 2022, 03:33:29 pm »
@MauryG5 Download a kernel source tree of your choice and unpack it. Then download my patch and save it in the top directory of the kernel source. Now apply the patch from inside the kernel source directory as follows:
Code: [Select]
patch -p1 < fpu_exeption_fix_for_ppc64le_with_amdgpu.patch
Now configure and compile as usual.

Warning: I don't know exactly what my patch does. It was mere guesswork. I found the solution by looking at similar code in the file drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c and applying it to the other two files, after which the error was gone. I have no idea whether my patch has any side effects. Use at your own risk.

MauryG5

« Reply #38 on: April 19, 2022, 04:06:12 pm »
Excuse me, but when you make a fix like this, shouldn't you tell the kernel developers, so that they know this architecture needs a fix and can apply it for everyone?

matgraf

« Reply #39 on: April 19, 2022, 05:03:26 pm »

MauryG5

« Reply #40 on: April 20, 2022, 01:10:07 am »
Well then, in theory the fix could already be in the next kernel, 5.17.4, with no need to apply it manually ...

ClassicHasClass

« Reply #41 on: April 20, 2022, 11:44:59 am »
I agree with the comments on that thread that skipping initialization is probably the wrong approach. The fact it works is likely a weird side effect. But great work on localizing where the problem is and I don't think it will take them long to find a better solution.

MauryG5

« Reply #42 on: April 21, 2022, 01:46:23 pm »
Hi matgraf, unfortunately, having never patched a kernel, I have no experience with this and I don't understand exactly where to apply the patch. A guide I read online says it must be applied under /usr/src, but the downloaded kernel tree only contains a usr directory, with no src inside it. That path does exist in the system root directory (/), but I can't find it in the directory of the kernel I just downloaded ... Could you kindly explain in more detail how to do it? That way I can start learning how to patch kernels too ... Thanks

MauryG5

« Reply #43 on: April 22, 2022, 03:28:30 pm »
Never mind, matgraf: I understood afterwards what you meant in your explanation and how to apply the patch. I'm now up and running on version 5.17.4, and your patch apparently works very well, congratulations. I'm testing it on Ubuntu and will then compile it for Debian. Thanks

matgraf

« Reply #44 on: April 22, 2022, 04:03:11 pm »
Hi MauryG5, I am glad you got it running!
I am using Debian as well, and compiled my latest kernel as follows:
Code: [Select]
cd
mkdir mylinux
cd mylinux
wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.17.4.tar.xz
tar xf linux-5.17.4.tar.xz
wget http://ftp.de.debian.org/debian/pool/main/l/linux/linux-config-5.17_5.17.3-1_ppc64el.deb
dpkg-deb -x linux-config-5.17_5.17.3-1_ppc64el.deb ./
cp usr/src/linux-config-5.17/config.ppc64el_none_powerpc64le.xz ./
xz -d config.ppc64el_none_powerpc64le.xz
cd linux-5.17.4
curl -o fpu_exeption_fix_for_ppc64le_with_amdgpu.patch 'https://forums.raptorcs.com/index.php?action=dlattach;topic=332.0;attach=388'
patch -p1 < fpu_exeption_fix_for_ppc64le_with_amdgpu.patch
cp ../config.ppc64el_none_powerpc64le .config
echo CONFIG_PPC_4K_PAGES=y >> .config
make olddefconfig
make -j32 bindeb-pkg
sudo dpkg -i ../linux-headers-5.17.4_5.17.4-1_ppc64el.deb ../linux-image-5.17.4_5.17.4-1_ppc64el.deb
sudo reboot
« Last Edit: April 22, 2022, 04:05:16 pm by matgraf »