

Messages - pocock

1
General Discussion / Re: HP printers and the hplip plugin
« on: July 05, 2024, 02:50:40 pm »
As I mentioned earlier, the M479fdn worked partially (single-sided ADF scanning only) under Debian bullseye, but it stopped working under Debian bookworm. Maybe it is a regression in hplip, or HP shifted more logic from hplip into their closed-source plugin.

I have now tried another printer, the HP M227sdn, and had the same problem with the ADF: it did not work at all.

2
Andreas Tille - Can it be inferred that his intentions with his time as project leader are malicious? What kind of negative changes, impacting Debian security, would one anticipate being made during this time?

I do not know how much comes from his own personal intentions and how much he is manipulated into behaving rudely to my family and me.

When people see the attacks on my family, they do not want to join Debian or associate with Debian people.

Some people have already decided to leave. You can read some of the historic resignations here. There have been more.

The FSFE also had many resignations; nobody was ever expelled. Some of them wrote about their reasons publicly.

Therefore, the composition of the Debian organization after 12 months under the current leader will be a reflection of his leadership behavior.

Imagine you go on holiday and, when you arrive at your destination, you find the people there are eating somebody at a cannibal feast. If you are a cannibal too, you might stay in that village for your holiday. If you are not a cannibal, you will probably change your plans and go somewhere else. If they eat somebody every weekend in that village, then sooner or later anybody who is not a cannibal has been eaten or has moved elsewhere, and the only people left are cannibals.

Having fewer skilled people in Debian will mean fewer eyes on security problems, slower responses to security alerts and various other consequences.

(Unrelated: we don't know whether the German cannibal case is connected or not.)

3
General Discussion / Re: HP printers and the hplip plugin
« on: July 04, 2024, 10:16:02 am »

It depends on the version of hplip and the printer model.

For example, with Debian bullseye / hplip 3.22.6+dfsg0-1, my printer (an M479fdn) worked with the ADF for one-sided scanning. Duplex scanning did not work.

After upgrading to Debian bookworm / hplip 3.22.10+dfsg0-2, the ADF does not work at all.

I have an x86 system running Debian bookworm connected to the same printer. If the HP plugin is installed on the x86 system, the ADF works, including full duplex scanning of both sides in a single pass.

It seems that I can print without the hplip plugin; the problems only arise when scanning with the ADF.

I have two other HP printers here that I can try.

4
General Discussion / HP printers and the hplip plugin
« on: July 04, 2024, 04:32:20 am »
Some HP printers need a binary plugin.

The hplip tool (an open source package) can download the plugin directly from HP.

The plugin is an x86 binary.

At some point HP added support for ARM.

In some cases, the printer and flatbed scanner work fine without the plugin, but more advanced features, such as ADF duplex mode, require the plugin.

Does anybody have any experience with this issue, for example:

- is there any printer/scanner/ADF setup that gives a completely free experience, or one that is as free as possible?

- can the plugin be executed with an emulator?

- did anybody already raise a support request with HP for a ppc64el binary of the plugin?
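Before trying an emulator, it may help to confirm which architecture each plugin file was actually built for. The following is a minimal Python sketch, not part of hplip, that reads the `e_machine` field from a file's ELF header. The file paths you pass in are wherever the plugin installer placed the files on your system (not specified here), the script name `elf_arch.py` is hypothetical, and only a few common architecture codes are mapped.

```python
import struct
import sys

# A few ELF e_machine values from the ELF specification.
EM_NAMES = {
    3: "x86 (i386)",
    21: "64-bit PowerPC (ppc64 / ppc64el)",
    40: "32-bit ARM",
    62: "x86-64",
    183: "AArch64 (64-bit ARM)",
}


def elf_machine(header: bytes) -> str:
    """Return a readable architecture name from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both 32-bit and 64-bit ELF headers.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown (e_machine={machine})")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            try:
                print(f"{path}: {elf_machine(f.read(20))}")
            except ValueError as err:
                print(f"{path}: {err}")
```

Usage would be `python3 elf_arch.py FILE...`. A result of x86-64 on a ppc64el host means the file could only run under user-mode emulation, and whether that actually works with hplip is an open question.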

5

The Talos II and Blackbird have been marketed as platforms for security-minded users, and many people have purchased them with that in mind.

Security is only as good as the weakest link in the chain.  It is no good having the most secure hardware if there are regular defects in the OS or web browser or some other level in the stack.

I've recently started blogging about Debian's handling of security issues.

This is not a new concern: in 2008 it was the OpenSSL random number generator, and some people still have vulnerable keys in use today, 16 years later.

The new revelation is that in March 2000, Edward Brocklesby took over the SSH2 package and uploaded new binaries into Debian.

Six weeks later, in April 2000, Brocklesby was secretly expelled for hacking.

Point 3 of the Debian Social Contract tells us "we won't hide problems". I felt the social contract compelled me to bring this SSH2 affair into the public domain at the beginning of June 2024. Andreas Tille has made four more "Statement on Daniel Pocock" insult responses in barely four weeks, two of them on web sites and two by spam emails. Somebody commented that Debian had never had such a big hissy fit.

Nonetheless, these hissy fits reveal a lot about the culture. I made a chronological review of the culture so people can see it is not about me: the series of suicides and other deaths, with evidence, suggests it is about the mindset of the group. For people who have to answer everything with a new "Statement on Daniel Pocock", being stubborn is evidently more important than being secure.

The Brocklesby affair may have been 24 years ago, but it reveals a continuity. We can measure subsequent security incidents against the Brocklesby affair and see that each time Debian is tested, the response is lackluster.

6
Operating Systems and Porting / Re: Debian 12 status?
« on: July 03, 2024, 11:16:52 am »

I took the 6.7.12-1 backport for bookworm and applied the 4k page size patch.

It has the same amdgpu problem that I saw with the 6.1 kernel and the 4k patch: errors about "Not enough memory for command submission!", and eventually Firefox and the whole desktop freeze.

I have pushed each of my branches for anybody else who wants to try them. Maybe it works fine with other GPUs.

This is the 4k patch against the 6.7.12-1 kernel on the branch for unstable and testing

The same patch backported for the 6.7.12-1 kernel on bookworm-backports

The 6.1 standard kernel branch for bookworm with the 4k page size patch

There are various other web sites discussing the error message.  Some of them suggest that the amdgpu firmware needs to be updated.

One of them points to a new version of the amdgpu firmware that appeared in the upstream kernel tree in June 2024.

At present, the Debian firmware package still ships older amdgpu firmware from 2023; you can see the current versions here. Even unstable hasn't been updated since June 2023.

Here is the packaging repository on Debian's Git service (Salsa), although it doesn't clearly show the actual versions of the amdgpu firmware files.


7
Operating Systems and Porting / Re: Debian 12 status?
« on: July 02, 2024, 05:02:25 pm »

Using the Debian 12 (bookworm) 6.1 kernel package with a 4k page size, I started getting errors in the log, and eventually Firefox would freeze, followed by the whole GUI (GNOME desktop). The system is still accessible over SSH.

After the first crash, I started radeontop and noticed up to 80% GPU VRAM utilisation with Firefox running.

I rebooted into the bullseye kernel (5.10.46-4.1, built from my 4k page size branch) with the Debian 12 filesystem, and it is running fine with no crashes. I have had that kernel running for months on end on this platform with the bullseye filesystem.

I'm going to build the 6.7.12 kernel backport for bookworm with the 4k page size and try that as well.

journalctl captures a lot of errors like this; sometimes they appear for hours before the eventual crash. I was able to reproduce the crash a couple of times.

I see it as a good thing that the platform is still responding over SSH even when the GUI has crashed.

kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
...
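When the same message repeats for hours, a tally makes it easier to compare one kernel build with another. The sketch below is a hypothetical helper, assuming the kernel log is piped in as plain text (for example from `journalctl -k`); it counts DRM `*ERROR*` lines grouped by the reporting function and the message text.

```python
import re
import sys
from collections import Counter

# Matches DRM error lines such as:
#   kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
DRM_ERROR = re.compile(r"\[drm:(?P<func>\w+) \[(?P<module>\w+)\]\] \*ERROR\* (?P<msg>.*)$")


def count_drm_errors(lines):
    """Count DRM *ERROR* messages, keyed by (function, message)."""
    counts = Counter()
    for line in lines:
        match = DRM_ERROR.search(line.rstrip("\n"))
        if match:
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    # Example: journalctl -k | python3 count_drm_errors.py
    for (func, msg), n in count_drm_errors(sys.stdin).most_common():
        print(f"{n:8d}  {func}: {msg}")
```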

The crash is captured too:

kernel: ------------[ cut here ]------------
 kernel: WARNING: CPU: 5 PID: 6577 at drivers/gpu/drm/ttm/ttm_bo.c:357 ttm_bo_release+0x538/0x5b0 [ttm]
 kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge snd_seq_dummy snd_hrtimer snd_seq rfkill qrtr 8021q garp stp mrp llc overlay sunrpc binfmt_misc ext4 crc16 mbcache jbd2 uvcvideo videobuf2_vmalloc videobuf2_memops snd_usb_audio snd_hda_codec_hdmi videobuf2_v4l2 videobuf2_common snd_usbmidi_lib snd_hda_intel snd_rawmidi snd_intel_dspcfg videodev snd_seq_device evdev joydev snd_hda_codec mc snd_hda_core snd_hwdep snd_pcm sg snd_timer snd ofpart soundcore ipmi_powernv powernv_flash ctr ipmi_devintf at24 vmx_crypto mtd regmap_i2c ipmi_msghandler gf128mul opal_prd parport_pc lp parport fuse configfs loop ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress xts ecb uas usb_storage dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic
 kernel:  raid1 raid0 multipath linear md_mod hid_generic usbhid hid sd_mod amdgpu gpu_sched drm_buddy i2c_algo_bit drm_display_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci fb_sys_fops xhci_hcd nvme nvme_core t10_pi tg3 drm mpt3sas crc64_rocksoft_generic usbcore crc64_rocksoft crc_t10dif crct10dif_generic crc64 crct10dif_common drm_panel_orientation_quirks raid_class libphy usb_common scsi_transport_sas
 kernel: CPU: 5 PID: 6577 Comm: Renderer Not tainted 6.1.0-21-powerpc64le-4k #1  Debian 6.1.90-1.1
 kernel: Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1203 opal:skiboot-9858186 PowerNV
 kernel: NIP:  c00800000f4d2120 LR: c008000012fe7270 CTR: c00800000f4d2198
 kernel: REGS: c00020001bc571d0 TRAP: 0700   Not tainted  (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84824244  XER: 20040036
 kernel: CFAR: c00800000f4d1c34 IRQMASK: 0
         GPR00: c008000012fe7270 c00020001bc57470 c00800000f508800 c0000003b100c5c0
         GPR04: 0000000000000000 0000000ffcac0000 0000000000041a8b 0000000000000000
         GPR08: 0000000000041a8a c00020001af87738 0000000000000000 c008000013564c90
         GPR12: c00800000f4d2198 c000000ffffdbf00 c00020001b054b80 c0002000ef821798
         GPR16: 00000000002c6800 0000000000000001 0000000000000000 0000000000000000
         GPR20: 00000000002c6880 c000200015080000 0000000000000071 00000000002c6800
         GPR24: 0000000000000003 c00020001af87010 0000000000000000 000000003ee00000
         GPR28: c0000003b100c458 c000200015080000 c000200015085508 c0000003b100c5c0
 kernel: NIP [c00800000f4d2120] ttm_bo_release+0x538/0x5b0 [ttm]
 kernel: LR [c008000012fe7270] amdgpu_bo_unref+0x38/0x60 [amdgpu]
 kernel: Call Trace:
 kernel: [c00020001bc57470] [c00800000f4d1f18] ttm_bo_release+0x330/0x5b0 [ttm] (unreliable)
 kernel: [c00020001bc57500] [c008000012fe7270] amdgpu_bo_unref+0x38/0x60 [amdgpu]
 kernel: [c00020001bc57530] [c0080000130105fc] amdgpu_vm_ptes_update+0xc24/0xc60 [amdgpu]
 kernel: [c00020001bc576a0] [c00800001300935c] amdgpu_vm_update_range+0x304/0x880 [amdgpu]
 kernel: [c00020001bc577c0] [c008000013009f04] amdgpu_vm_bo_update+0x2ec/0x630 [amdgpu]
 kernel: [c00020001bc578e0] [c008000012ff0bcc] amdgpu_gem_va_ioctl+0x674/0x6b0 [amdgpu]
 kernel: [c00020001bc57a20] [c008000012f07040] drm_ioctl_kernel+0x118/0x230 [drm]
 kernel: [c00020001bc57a80] [c008000012f073b0] drm_ioctl+0x258/0x560 [drm]
 kernel: [c00020001bc57bf0] [c008000012fc00b8] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
 kernel: [c00020001bc57c40] [c0000000005493f4] sys_ioctl+0x744/0x1460
 kernel: [c00020001bc57d40] [c00000000002afd8] system_call_exception+0x138/0x260
 kernel: [c00020001bc57e10] [c00000000000c0f0] system_call_vectored_common+0xf0/0x280
 kernel: --- interrupt: 3000 at 0x7fff8eb4433c
 kernel: NIP:  00007fff8eb4433c LR: 00007fff8eb4433c CTR: 0000000000000000
 kernel: REGS: c00020001bc57e80 TRAP: 3000   Not tainted  (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44224840  XER: 00000000
 kernel: IRQMASK: 0
         GPR00: 0000000000000036 00007fff75b9bb20 00007fff8ec56f00 0000000000000053
         GPR04: 00000000c0286448 00007fff75b9bc00 0000000000280000 00000002c5a00000
         GPR08: 0000000000100000 0000000000000000 0000000000000000 0000000000000000
         GPR12: 0000000000000000 00007fff75ba68c0 00007fff75b9c028 00007fff75b9c178
         GPR16: 00007fff75b9c508 0000000000000001 0000000000020000 00007fff75b9cd18
         GPR20: 0000000000000001 0000000000020000 00007fff75b9be80 0000000000ea0000
         GPR24: 0000000000000000 0000000000200000 0000000000000004 0000000000200000
         GPR28: 0000000000000053 00000000c0286448 00007fff75b9bc00 00007ffefa5ca0a0
 kernel: NIP [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: LR [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: --- interrupt: 3000
 kernel: Instruction dump:
 kernel: 4bfffe30 60000000 60000000 60420000 0fe00000 7c0802a6 fb610068 fba10078
 kernel: f80100a0 60000000 60000000 60420000 <0fe00000> 4bffffe0 60000000 60420000
 kernel: ---[ end trace 0000000000000000 ]---
 kernel: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-12)
 kernel: Kernel attempted to read user page (0) - exploit attempt? (uid: 1000)
 kernel: BUG: Kernel NULL pointer dereference on read at 0x00000000
 kernel: Faulting instruction address: 0xc008000012fe8f30
 kernel: Oops: Kernel access of bad area, sig: 11 [#1]
 kernel: LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
 kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge snd_seq_dummy snd_hrtimer snd_seq rfkill qrtr 8021q garp stp mrp llc overlay sunrpc binfmt_misc ext4 crc16 mbcache jbd2 uvcvideo videobuf2_vmalloc videobuf2_memops snd_usb_audio snd_hda_codec_hdmi videobuf2_v4l2 videobuf2_common snd_usbmidi_lib snd_hda_intel snd_rawmidi snd_intel_dspcfg videodev snd_seq_device evdev joydev snd_hda_codec mc snd_hda_core snd_hwdep snd_pcm sg snd_timer snd ofpart soundcore ipmi_powernv powernv_flash ctr ipmi_devintf at24 vmx_crypto mtd regmap_i2c ipmi_msghandler gf128mul opal_prd parport_pc lp parport fuse configfs loop ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress xts ecb uas usb_storage dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic
 kernel:  raid1 raid0 multipath linear md_mod hid_generic usbhid hid sd_mod amdgpu gpu_sched drm_buddy i2c_algo_bit drm_display_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci fb_sys_fops xhci_hcd nvme nvme_core t10_pi tg3 drm mpt3sas crc64_rocksoft_generic usbcore crc64_rocksoft crc_t10dif crct10dif_generic crc64 crct10dif_common drm_panel_orientation_quirks raid_class libphy usb_common scsi_transport_sas
 kernel: CPU: 4 PID: 6567 Comm: firefox-es:cs0 Tainted: G        W          6.1.0-21-powerpc64le-4k #1  Debian 6.1.90-1.1
 kernel: Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1203 opal:skiboot-9858186 PowerNV
 kernel: NIP:  c008000012fe8f30 LR: c008000013030f54 CTR: c00000000098c350
 kernel: REGS: c000200033e2ae10 TRAP: 0300   Tainted: G        W           (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24224244  XER: 200400dd
 kernel: CFAR: c008000013030f50 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0
         GPR00: c008000013030f54 c000200033e2b0b0 c00800001377c600 c000200015085508
         GPR04: c0000003b100c400 0000000000000000 000000003ee00000 0000000000000080
         GPR08: 0000000000001000 0000000000000000 c0000003cce4e800 c008000013565488
         GPR12: c00000000098c350 c000000ffffdcc00 00000000002c6880 00000000002c6880
         GPR16: 0000000000080000 c0000003b100c400 0000000000001000 c000200033e2b408
         GPR20: 0000000000000000 c000200015080000 c0000003b100c400 0000000000000000
         GPR24: 000000003ee00000 c0000003cce4e800 00000000000003f1 0000000000001000
         GPR28: 000000003ee00000 c000200033e2b3e8 0000000000000080 0000000000000000
 kernel: NIP [c008000012fe8f30] amdgpu_bo_gpu_offset_no_check+0x28/0x78 [amdgpu]
 kernel: LR [c008000013030f54] amdgpu_vm_sdma_set_ptes+0x5c/0x1b0 [amdgpu]
 kernel: Call Trace:
 kernel: [c000200033e2b0b0] [c000200033e2b110] 0xc000200033e2b110 (unreliable)
 kernel: [c000200033e2b0e0] [c008000013030f54] amdgpu_vm_sdma_set_ptes+0x5c/0x1b0 [amdgpu]
 kernel: [c000200033e2b150] [c00800001303195c] amdgpu_vm_sdma_update+0x3b4/0x440 [amdgpu]
 kernel: [c000200033e2b220] [c00800001300fd40] amdgpu_vm_ptes_update+0x368/0xc60 [amdgpu]
 kernel: [c000200033e2b390] [c00800001300935c] amdgpu_vm_update_range+0x304/0x880 [amdgpu]
 kernel: [c000200033e2b4b0] [c008000013009f04] amdgpu_vm_bo_update+0x2ec/0x630 [amdgpu]
 kernel: [c000200033e2b5d0] [c008000012ff5a28] amdgpu_cs_ioctl+0x1610/0x22c0 [amdgpu]
 kernel: [c000200033e2b880] [c008000012f07040] drm_ioctl_kernel+0x118/0x230 [drm]
 kernel: [c000200033e2b8e0] [c008000012f073b0] drm_ioctl+0x258/0x560 [drm]
 kernel: [c000200033e2ba50] [c008000012fc00b8] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
 kernel: [c000200033e2baa0] [c0000000005493f4] sys_ioctl+0x744/0x1460
 kernel: [c000200033e2bba0] [c00000000002afd8] system_call_exception+0x138/0x260
 kernel: [c000200033e2be10] [c00000000000c0f0] system_call_vectored_common+0xf0/0x280
 kernel: --- interrupt: 3000 at 0x7fff8eb4433c
 kernel: NIP:  00007fff8eb4433c LR: 00007fff8eb4433c CTR: 0000000000000000
 kernel: REGS: c000200033e2be80 TRAP: 3000   Tainted: G        W           (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48884842  XER: 00000000
 kernel: IRQMASK: 0
         GPR00: 0000000000000036 00007fff62bfe1f0 00007fff8ec56f00 0000000000000053
         GPR04: 00000000c0186444 00007fff62bfe2f0 0000000000180000 00007fff62bfe478
         GPR08: 0000000000100000 0000000000000000 0000000000000000 0000000000000000
         GPR12: 0000000000000000 00007fff62c068c0 00007fff7dc86600 0000000000000005
         GPR16: 00007fff623f0000 0000000000000001 0000000000000031 0000000000000001
         GPR20: fffffffffffffffd 0000000000000000 00007fff59870000 00007fff59860000
         GPR24: 00007fff8b4e1000 00007fff62bfe428 00007fff62bfe448 0000000000000000
         GPR28: 0000000000000053 00000000c0186444 00007fff62bfe2f0 00007fff62bfe2d0
 kernel: NIP [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: LR [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: --- interrupt: 3000
 kernel: Instruction dump:
 kernel: 007936f8 00000000 3c4c0079 384236f8 7c0802a6 60000000 7c0802a6 fbe1fff8
 kernel: f8010010 f821ffd1 e92301c8 e86301a8 <ebe90000> 80890010 3863aaf8 7bff83e4
 kernel: ---[ end trace 0000000000000000 ]---
 kernel:
 kernel: note: firefox-es:cs0[6567] exited with irqs disabled
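To compare crashes across kernel builds, it can help to reduce each Oops to its symbolized frames. This is a rough Python sketch, assuming lines in the format shown above; it extracts `function+offset/size [module]` from the NIP, LR and call-trace lines and skips unsymbolized user-space addresses. The regular expression is an assumption based on this particular log, not a complete parser for every Oops format.

```python
import re
import sys

# Symbolized frame, e.g. "] amdgpu_vm_sdma_set_ptes+0x5c/0x1b0 [amdgpu]".
# Unsymbolized addresses such as "NIP [00007fff8eb4433c] 0x7fff8eb4433c" do not match.
FRAME = re.compile(
    r"\] (?P<func>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def parse_frames(lines):
    """Yield (kind, function, offset, size, module) for each symbolized frame.

    kind is "NIP" (faulting instruction), "LR" (link register) or "trace".
    module is None for built-in kernel functions.
    """
    for line in lines:
        match = FRAME.search(line)
        if not match:
            continue
        text = line.split("kernel:", 1)[-1].lstrip()
        if text.startswith("NIP "):
            kind = "NIP"
        elif text.startswith("LR "):
            kind = "LR"
        else:
            kind = "trace"
        yield (kind, match.group("func"), int(match.group("off"), 16),
               int(match.group("size"), 16), match.group("module"))


if __name__ == "__main__":
    for kind, func, off, size, module in parse_frames(sys.stdin):
        where = f" [{module}]" if module else ""
        print(f"{kind:5s} {func}+{off:#x}/{size:#x}{where}")
```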

8
Operating Systems and Porting / Re: Debian 12 status?
« on: July 01, 2024, 10:42:28 am »
Did anybody ever ask other Debian kernel people about integrating my patch into the official kernel packages?

I've updated the 4k kernel patch to work with Debian 12.

It is on this branch, in a repository forked from the official Debian kernel packaging repository. I've kept the change set as small as possible, so it is easy to diff it against the official kernel and see that only the page size config option changes.

To make it compile, I had to include one extra patch, the patch from Debian bug #996170.

Any feedback is welcome, whether you are running the 64k or the 4k kernel.

This is where I posted the full set of commands for compiling a Debian kernel with the patch.
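If you are unsure which kernel you are actually running, you can compare the runtime page size with the build configuration. A minimal sketch, assuming a Debian system where the kernel config is installed as `/boot/config-<release>`; on powerpc the page size is chosen by a Kconfig option such as `CONFIG_PPC_4K_PAGES=y` or `CONFIG_PPC_64K_PAGES=y`.

```python
import os
import platform
import re

# On powerpc the build-time page size is one Kconfig choice set to "y",
# e.g. CONFIG_PPC_4K_PAGES=y or CONFIG_PPC_64K_PAGES=y.
PAGE_OPTION = re.compile(r"^CONFIG_PPC_(\d+)K_PAGES=y$", re.MULTILINE)


def configured_page_size_kib(config_text):
    """Return the page size in KiB selected in a kernel config, or None if absent."""
    match = PAGE_OPTION.search(config_text)
    return int(match.group(1)) if match else None


def runtime_page_size_bytes():
    """Page size reported to user space by the running kernel."""
    return os.sysconf("SC_PAGE_SIZE")


if __name__ == "__main__":
    release = platform.release()  # same string as `uname -r`
    print(f"running {release}: page size {runtime_page_size_bytes()} bytes")
    path = f"/boot/config-{release}"  # where Debian kernel packages install the config
    try:
        with open(path) as f:
            kib = configured_page_size_kib(f.read())
    except FileNotFoundError:
        print(f"{path}: not found")
    else:
        print(f"{path}: " + (f"{kib}K pages" if kib else "no CONFIG_PPC_*K_PAGES=y line"))
```

On a 4k build the runtime value should be 4096 bytes; on the stock 64k kernel it should be 65536.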



9

I have already started another thread here about all the suspend modes, including suspend-to-RAM and the special modes of the POWER9 architecture. I'm starting this new thread to focus on hibernation, in other words suspend-to-disk (STD).

There is a section about hibernation in the official Linux kernel guide. It states:

Quote
Hibernation

This state (also referred to as Suspend-to-Disk or STD) offers the greatest energy savings and can be used even in the absence of low-level platform support for system suspend.

This means it doesn't matter whether POWER9 or the motherboard (Talos II, Blackbird, Condor) has any hardware support for hibernation: the suspend and resume work is all done by the operating system.

The kernel guide goes on to state:

Quote
However, it requires some low-level code for resuming the system to be present for the underlying CPU architecture.

Is that code already available in any newer kernels?

Is that code available in any alternative operating system like FreeBSD or OpenBSD?

I checked one of my own systems and it doesn't appear to support hibernation; disk is missing from the output:


$ cat /sys/power/state
freeze mem


Here is the same command on an Intel host; notice that disk is listed there:


$ cat /sys/power/state
freeze mem disk
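The same check can be scripted, for example to run it across several machines. A minimal Python sketch, assuming only the `/sys/power/state` format shown above (a space-separated list of state names):

```python
from pathlib import Path


def supported_sleep_states(state_text):
    """Split the contents of /sys/power/state into a set of state names."""
    return set(state_text.split())


def hibernation_supported(state_text):
    # "disk" (suspend-to-disk) is listed only when the kernel was built with
    # hibernation support and hibernation is currently available.
    return "disk" in supported_sleep_states(state_text)


if __name__ == "__main__":
    text = Path("/sys/power/state").read_text()
    print("states:", " ".join(sorted(supported_sleep_states(text))))
    print("hibernation:", "yes" if hibernation_supported(text) else "no")
```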



10
"dispute with Debian" is not correct.

If you look at the vote about Dr Richard Stallman, the majority of Debian Developers do not want these disputes at all and they voted not to comment on the dispute.

There is a hardcore group of people who run these disputes. In many cases, they hide what they are doing from the rest of us. Many of the victims are afraid to speak up. For example, if somebody receives an email about the CoC, they usually just quit and go to work on something else. Quitting like that is not an admission of wrongdoing; the victims just don't want to waste their time on these games.

11
If you can only access a particular application by running it from a web site in Chrome / Chromium, then that is still very bad news. It implies there is no equivalent application that you can run natively.

For example, people had been trying to promote free, open source webcam and chat solutions for many years, but when the pandemic came along, many users were willing to download the Zoom client or run the WebAssembly client in their browser, and the free software community was left out in the cold.

12
Two things come to mind:

Debian involvement can be a burden for any developer, taking up a little more time and energy that could otherwise go to other parts of the platform.

Tim's presence as a maintainer may be a hint that there is a shortage of other volunteers willing to do POWER-related tasks in the Debian world. In fact, all the big distributions have had problems that discourage volunteers; it is not only an issue for POWER.

13
Unfortunately, I cannot shut it down right now to test them in it.

14

I put them both into an HP Z6 G4 workstation and they worked immediately, so I don't think they are faulty; it looks more like a compatibility issue.

Is there any other data I can collect from the Talos II to help troubleshoot?

I can try to purchase an alternative PCIe card for these SSDs; I don't mind swapping the cards and testing other models.

15

For my first workstation, I purchased two Samsung PM9A1 (1TB) SSDs and two IcyBox IB-PCI208-HS cards to put them into PCIe slots. They worked immediately, and I've had them for 11 months without any problems.

Now I have purchased two PM9A1 (2TB) SSDs and two more identical IB-PCI208-HS cards. These only show up intermittently, or not at all, when the machine boots.

In petitboot there is a lot of kernel error logging like this:

[  937.395593] EEH: Recovering PHB#1-PE#fd
[  937.395607] EEH: PE location: UOPWR.A100029-Node0-CPU1 Slot1 (8x), PHB location: N/A
[  937.395609] EEH: This PCI device has failed 1 times in the last hour and will  be permanently disabled after 5 failures.
[  937.395611] EEH: Notify device drivers to shutdown
[  937.395614] EEH: Beginning: 'error_detected(IO frozen)'
[  937.395620] PCI 0001:01:00.0#00fd: EEH: Invoking nvme->error_detected(IO frozen)
[  937.395627] nvme nvme0: frozen state error detected, reset controller
[  937.578659] PCI 0001:01:00.0#00fd: EEH: nvme driver reports: 'need reset'
[  937.578662] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[  937.578667] EEH: Collect temporary log
[  937.578698] EEH: of node=0001:01:00.0
[  937.578702] EEH: PCI device/vendor: ffffffff
[  937.578706] EEH: PCI cmd/status register: ffffffff
[  937.578707] EEH: PCI-E capabilities and status follow:
[  937.578722] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
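Since the cards only fail intermittently, it may help to pull the key EEH (Enhanced Error Handling) fields out of each boot's log and compare them. A minimal Python sketch, assuming the message wording shown above (other kernel versions may word it differently); the helper name `summarize_eeh` is hypothetical. It reports the failure count and limit, and flags a device/vendor read of `ffffffff`, which usually means the device stopped responding on the bus.

```python
import re
import sys

# Wording taken from the petitboot log above; other kernel versions may differ.
FAILURES = re.compile(
    r"failed (?P<count>\d+) times in the last hour and will\s+be permanently "
    r"disabled after (?P<limit>\d+) failures"
)
DEVICE_VENDOR = re.compile(r"PCI device/vendor: (?P<value>[0-9a-fA-F]{8})")


def summarize_eeh(lines):
    """Return (failure count, failure limit, unreachable) from an EEH log excerpt.

    count and limit come from the last matching line; unreachable is True if any
    device/vendor read returned all ones (ffffffff).
    """
    count = limit = None
    unreachable = False
    for line in lines:
        match = FAILURES.search(line)
        if match:
            count, limit = int(match.group("count")), int(match.group("limit"))
        match = DEVICE_VENDOR.search(line)
        if match and match.group("value").lower() == "ffffffff":
            # All-ones config-space reads usually mean the device is not responding.
            unreachable = True
    return count, limit, unreachable


if __name__ == "__main__":
    # Example: dmesg | python3 eeh_summary.py
    count, limit, unreachable = summarize_eeh(sys.stdin)
    print(f"EEH failures: {count} (limit {limit}); device unreachable: {unreachable}")
```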

I tried the following, with no luck:

- removing the cards and re-inserting them in different slots

- removing the SSDs and putting them back into the cards

- upgrading the FPGA on the second workstation from 0xa to 0xc (v1.08) so that it matches the first workstation

- upgrading the PNOR on the second workstation to the v2.01 beta so that it matches the first workstation

- putting a new and bigger PSU on the second workstation (I needed to do this anyway for bigger GPU and more RAM)

- removing the GPU and everything else from the system and trying one SSD at a time

