Topic: New Kernel 5.16 and new problem

sharkcz

Re: New Kernel 5.16 and new problem
« Reply #15 on: March 23, 2022, 12:46:56 pm »
Seeing if @sharkcz knows anything about it.
Hmm, that doesn't ring a bell, but is it reproducible? I believe a standard kernel shouldn't have problems executing VMX instructions ...

MauryG5

Re: New Kernel 5.16 and new problem
« Reply #16 on: March 23, 2022, 04:18:40 pm »
It does seem strange to me too that there is suddenly a conflict between AltiVec and the new kernels, because it would mean the problem is specific to us on Power ... I don't know what is happening on x86 or Arm in this respect ...

matgraf

Re: New Kernel 5.16 and new problem
« Reply #17 on: March 23, 2022, 04:28:13 pm »
Stock Debian sid kernel:

Code: [Select]
[    2.714570] Unrecoverable VMX/Altivec Unavailable Exception f20 at c008000002944d5c
[    2.714596] Oops: Unrecoverable VMX/Altivec Unavailable Exception, sig: 6 [#1]
[    2.714612] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
[    2.714637] Modules linked in: amdgpu(E+) gpu_sched(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) xhci_pci(E) sd_mod(E) drm_kms_helper(E) xhci_hcd(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) tg3(E) nvme(E) libphy(E) usbcore(E) nvme_core(E) ptp(E) drm(E) t10_pi(E) ahci(E) pps_core(E) crc_t10dif(E) crct10dif_generic(E) usb_common(E) crct10dif_common(E) libahci(E) drm_panel_orientation_quirks(E)
[    2.714777] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G            E     5.16.0-5-powerpc64le #1  Debian 5.16.14-1
[    2.714821] Workqueue: events work_for_cpu_fn
[    2.714846] NIP:  c008000002944d5c LR: c008000002945c6c CTR: 0000000000000000
[    2.714871] REGS: c000000002f53290 TRAP: 0f20   Tainted: G            E      (5.16.0-5-powerpc64le Debian 5.16.14-1)
[    2.714919] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84002220  XER: 00000004
[    2.714953] CFAR: c008000002945c68 IRQMASK: 0
[    2.714953] GPR00: c008000002945c6c c000000002f53530 c008000002c78000 0000000000000000
[    2.714953] GPR04: c000000034450000 c000000022eda000 c000000022eda800 c0a4ed2222ed30c0
[    2.714953] GPR08: c0000003fc2a2328 0000000000000000 0000000000000000 c000000022eda400
[    2.714953] GPR12: c00000000048a380 c0000000028b0000 c000000000176ac8 c00000000cfe5e80
[    2.714953] GPR16: c00000000cfe5e88 c00000000cfe0000 c00000000cff6820 c00000000cfe5e98
[    2.714953] GPR20: c00000000cfe5ea0 c00000000cfe5e90 0000000000000100 0000000000000001
[    2.714953] GPR24: 0000000000000001 0000000000000001 c008000002c86a60 c00000000cff0000
[    2.714953] GPR28: c000000034460000 c000000002f53790 c000000034450000 c000000022eda000
[    2.715175] NIP [c008000002944d5c] dcn20_resource_construct+0x44/0xef0 [amdgpu]
[    2.715438] LR [c008000002945c6c] dcn20_create_resource_pool+0x64/0x100 [amdgpu]
[    2.715724] Call Trace:
[    2.715741] [c000000002f53530] [c008000002945c50] dcn20_create_resource_pool+0x48/0x100 [amdgpu] (unreliable)
[    2.716014] [c000000002f535b0] [c008000002a6f540] dc_create_resource_pool+0x2f8/0x3a0 [amdgpu]
[    2.716274] [c000000002f535e0] [c008000002a60364] dc_create+0x1cc/0x650 [amdgpu]
[    2.716515] [c000000002f53690] [c0080000028ca584] amdgpu_dm_init.isra.0+0x1ec/0x1df0 [amdgpu]
[    2.716780] [c000000002f538f0] [c0080000028cc1b0] dm_hw_init+0x28/0x60 [amdgpu]
[    2.717045] [c000000002f53920] [c008000002619c78] amdgpu_device_init+0x1c00/0x2190 [amdgpu]
[    2.717270] [c000000002f53a70] [c00800000261b4f0] amdgpu_driver_load_kms+0x48/0x370 [amdgpu]
[    2.717513] [c000000002f53af0] [c008000002610ef4] amdgpu_pci_probe+0x2dc/0x4c0 [amdgpu]
[    2.717723] [c000000002f53bc0] [c00000000081c598] local_pci_probe+0x68/0x110
[    2.717750] [c000000002f53c40] [c0000000001644f8] work_for_cpu_fn+0x38/0x60
[    2.717776] [c000000002f53c70] [c00000000016a2ec] process_one_work+0x2ac/0x5a0
[    2.717804] [c000000002f53d10] [c00000000016aed0] worker_thread+0x2a0/0x610
[    2.717840] [c000000002f53da0] [c000000000176c78] kthread+0x1b8/0x1c0
[    2.717865] [c000000002f53e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
[    2.717902] Instruction dump:
[    2.717921] fb61ffd8 fbc1fff0 fbe1fff8 fa21ff88 fa81ffa0 faa1ffa8 fac1ffb0 fae1ffb8
[    2.717964] fb01ffc0 fb21ffc8 fb81ffe0 fba1ffe8 <100004c4> 3920ff90 3940ff80 7c9f2378
[    2.718007] ---[ end trace 30cf29bfebd0290d ]---

Will happily provide more info, if needed.
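
For anyone reading the oops: the word in angle brackets in the instruction dump, <100004c4>, is the instruction at NIP. Below is a minimal sketch of decoding it by hand. It assumes the usual Power ISA layout, with the primary opcode in the top 6 bits and, for VX-form vector instructions, the extended opcode in the low 11 bits. The variable names are only for illustration.

```shell
#!/bin/sh
# Faulting instruction word, copied from the <...> entry in the dump above.
insn=0x100004c4

# Primary opcode: top 6 bits of the 32-bit instruction word.
primary_opcode=$(( $insn >> 26 ))

# VX-form extended opcode: low 11 bits.
extended_opcode=$(( $insn & 0x7ff ))

# Register fields, 5 bits each: target, source A, source B.
vrt=$(( ($insn >> 21) & 0x1f ))
vra=$(( ($insn >> 16) & 0x1f ))
vrb=$(( ($insn >> 11) & 0x1f ))

echo "primary=$primary_opcode xo=$extended_opcode vrt=$vrt vra=$vra vrb=$vrb"
```

This prints primary=4 xo=1220 vrt=0 vra=0 vrb=0. Primary opcode 4 is the vector (VMX/AltiVec) group, and if I read the ISA tables correctly extended opcode 1220 is vxor, so the instruction would be vxor v0,v0,v0 at the start of dcn20_resource_construct. That fits the MSR line in the oops, which lists no VEC bit.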

SiteAdmin

Re: New Kernel 5.16 and new problem
« Reply #18 on: March 23, 2022, 05:25:00 pm »

MauryG5

Re: New Kernel 5.16 and new problem
« Reply #19 on: March 23, 2022, 05:35:47 pm »
So it's confirmed, apparently: unfortunately the problem is specific to us on Power ...

matgraf

Re: New Kernel 5.16 and new problem
« Reply #20 on: March 24, 2022, 01:51:26 am »
Thanks a lot for the bug report!
Looking forward to testing a patch that will hopefully resolve the regression.

MPC7500

Re: New Kernel 5.16 and new problem
« Reply #21 on: March 24, 2022, 12:55:36 pm »
This is exactly the error message I remember and this has nothing to do with amdgpu.
Code: [Select]
Unrecoverable VMX/Altivec Unavailable Exception

I have to correct myself: AST is working on 5.16.x.

MauryG5

Re: New Kernel 5.16 and new problem
« Reply #22 on: March 24, 2022, 03:18:31 pm »
Sorry guys, speaking of kernels: on Debian I was trying a different procedure to compile the kernel. Until now I have always configured with make menuconfig and then run make -j32, make modules, make modules_install and make install, which compiles and installs the modules and the kernel and automatically updates GRUB. The only flaw of this procedure is that if you ask the system for all the installed kernels, it never lists one installed this way, so you cannot remove it with the usual purge command and have to delete it manually, file by file. The system never recognizes it as an officially installed kernel, even though it works fine in the end.

If you instead use the command sudo make -j 32 KDEB_PKGVERSION=1.-maury.ppc64le deb-pkg, it creates all the .deb packages, you install them with dpkg -i, and the system recognizes the kernel as well. Except that while I managed to get this to work on Ubuntu, on Debian it always gives me this error:

make[2]: *** [debian/rules:7: build-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:77: deb-pkg] Error 2
make: *** [Makefile:1576: deb-pkg] Error 2

I did some research on Google and saw that people had installed some libraries, but nothing had any effect. They also say to configure the CONFIG TRUSTED KEY option, but I have always handled that by deleting all its text, and that has always worked with the classic make -j32. Here, though, nothing helps and it keeps giving me that damn error ... I don't understand why it doesn't want to work on Debian ... Can you tell me what's missing or where the error is? Thank you
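
The four "Error 2" lines are only make's closing summary. The message that explains the failure is normally further up in the build output. Below is a rough sketch for capturing the full log and pulling out the first suspicious line, assuming it is run from the top of the kernel source tree. Both function names are made up for illustration, and treating SYSTEM_TRUSTED_KEYS as the option meant above is an assumption. scripts/config ships with the kernel source.

```shell
#!/bin/sh
# Hypothetical helper: print the first line of a build log that looks like
# the real failure, rather than make's trailing "Error 2" summary lines.
first_build_error() {
    grep -n -m 1 -E 'error:|No such file|not found' "$1"
}

# Hypothetical helper: build .deb packages from the current kernel tree and
# keep the full output in build.log. $1 is the package version string.
build_kernel_debs() {
    # Clear the trusted-keys setting inherited from a distro config
    # (assumption: this is the option the advice found online refers to).
    scripts/config --set-str SYSTEM_TRUSTED_KEYS ""
    make olddefconfig
    # Note: no spaces around '=' in a make variable assignment.
    make -j"$(nproc)" KDEB_PKGVERSION="$1" bindeb-pkg 2>&1 | tee build.log
}
```

Usage would be something like build_kernel_debs 1.-maury.ppc64le followed by first_build_error build.log. If the build succeeds, the .deb files should end up in the parent directory and can be installed with dpkg -i, after which dpkg knows about the kernel and it can be purged normally.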

SiteAdmin

Re: New Kernel 5.16 and new problem
« Reply #23 on: March 24, 2022, 05:36:59 pm »
Thanks a lot for the bug report!
Looking forward to testing a patch that will hopefully resolve the regression.

They'd like a bisect on that bug report, any chance you could do that since you have a reproducible issue on your specific hardware?

matgraf

Re: New Kernel 5.16 and new problem
« Reply #24 on: March 25, 2022, 05:36:45 pm »
They'd like a bisect on that bug report, any chance you could do that since you have a reproducible issue on your specific hardware?

I will try. Never bisected before. Well, then git bisect start ...
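
For anyone else who has never bisected: a minimal sketch of the workflow, to be run inside a git clone of the kernel tree. The two wrapper functions are hypothetical conveniences around the real git bisect subcommands. Which tags count as good and bad is an assumption you have to verify on your own machine first.

```shell
#!/bin/sh
# Hypothetical wrapper: begin a bisect between a known-good and a
# known-bad revision (tags, branches or commit hashes).
bisect_begin() {
    good=$1
    bad=$2
    git bisect start
    git bisect bad "$bad"
    git bisect good "$good"
}

# Hypothetical wrapper: record the verdict for the commit git has checked
# out, after building and booting it. Argument: good, bad or skip.
bisect_mark() {
    case $1 in
        good|bad|skip) git bisect "$1" ;;
        *) echo "usage: bisect_mark good|bad|skip" >&2; return 1 ;;
    esac
}
```

For example, bisect_begin v5.15 v5.16 (assuming v5.15 boots fine), then build and boot the commit git checks out, run bisect_mark bad if the oops shows up or bisect_mark good if it does not, and repeat until git prints "<hash> is the first bad commit". git bisect log shows the steps taken so far and git bisect reset returns to the original checkout.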

matgraf

Re: New Kernel 5.16 and new problem
« Reply #25 on: March 26, 2022, 01:18:43 pm »
Bisect of the error:

Code: [Select]
9fa0fb77132fe9e83f2b357fd5a2b16293a5b9ee is the first bad commit
commit 9fa0fb77132fe9e83f2b357fd5a2b16293a5b9ee
Author: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
Date:   Tue Jan 26 15:15:33 2021 -0500

    drm/amd/display: USB4 DPIA enumeration and AUX Tunneling
   
    [WHY]
    To enable dc links for USB4 DPIA ports and AUX command tunneling
    for YELLOW_CARP_B0.
   
    [HOW]
    1) Created dc links for all USB4 DPIA ports in create_links().
       dc_link_construct() implementation is split for legacy DDC and DPIAs.
       As usb4 has no ddc, ddc->ddc_pin will be set to NULL for its dc link
       and this parameter will be used to identify the dc links as DPIA. The
       dc link for DPIA is further to be enhanced with implementation for link
       encoder and link initialization.
    2) usb4_dpia_count in struct resource_pool will be initialized to 4 in
       dcn31_resource_construct() if the DCN is YELLOW_CARP_B0.
    3) Enabled DMUB AUX via outbox for YELLOW_CARP_B0.
   
    Reviewed-by: Jimmy Kizito <Jimmy.Kizito@amd.com>
    Acked-by: Wayne Lin <Wayne.Lin@amd.com>
    Acked-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
    Acked-by: Harry Wentland <harry.wentland@amd.com>
    Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

 drivers/gpu/drm/amd/display/dc/core/dc.c           | 32 +++++++++-
 drivers/gpu/drm/amd/display/dc/core/dc_link.c      | 71 +++++++++++++++++++++-
 drivers/gpu/drm/amd/display/dc/core/dc_link_ddc.c  |  3 +-
 drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hwseq.c |  6 ++
 .../gpu/drm/amd/display/dc/dcn31/dcn31_resource.c  |  6 ++
 drivers/gpu/drm/amd/display/dc/inc/core_types.h    |  1 +
 drivers/gpu/drm/amd/display/dc/inc/dc_link_ddc.h   |  1 +
 drivers/gpu/drm/amd/display/dc/irq_types.h         |  5 +-
 8 files changed, 120 insertions(+), 5 deletions(-)

I'm happy to do more.

MPC7500

Re: New Kernel 5.16 and new problem
« Reply #26 on: March 26, 2022, 01:59:42 pm »
Does the Kernel work then?

SiteAdmin

Re: New Kernel 5.16 and new problem
« Reply #27 on: March 26, 2022, 02:04:36 pm »
@matgraf Awesome, thank you for that!  Let's see if upstream can work out how to fix it from there.

MauryG5

Re: New Kernel 5.16 and new problem
« Reply #28 on: March 28, 2022, 03:37:24 pm »
Hey guys, where are we with the kernel fixes? I'm ready to test as soon as one is available ...

MauryG5

Re: New Kernel 5.16 and new problem
« Reply #29 on: April 13, 2022, 01:07:25 pm »
Guys, apparently things have stalled on the kernel: still no news on a fix for the Power problem in the new 5.16 and 5.17 kernels, and 5.18 is coming as well ...