Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Messages - pocock

Pages: 1 [2] 3 4 ... 20

Operating Systems and Porting / Debian security standards, deflecting questions about SSH2 issue

« on: July 03, 2024, 11:38:27 am »

The Talos II and Blackbird have been marketed as a platform for security-minded users and many people have purchased the platform with that in mind.

Security is only as good as the weakest link in the chain. It is no good having the most secure hardware if there are regular defects in the OS or web browser or some other level in the stack.

I've recently started blogging about Debian's handling of security issues.

This is not a new concern: in 2008, it was the OpenSSL random number generator and some people still have vulnerable keys in use today, 16 years later.

The new revelation is that in March 2000 Edward Brocklesby took over the SSH2 package and uploaded new binaries into Debian.

Six weeks later, in April 2000, Brocklesby was secretly expelled for hacking.

The Debian Social Contract, point 3 tells us "we won't hide problems". I felt the social contract compelled me to bring this SSH2 affair into the public domain at the beginning of June 2024. Andreas Tille has made four more "Statement on Daniel Pocock" insult responses in barely four weeks, two of them on web sites and two by spam emails. Somebody commented that Debian never had such a big hissy fit.

Nonetheless, these hissy fits reveal a lot about the culture. I made a chronological review of the culture so people can see it is not about me. The series of suicides and other deaths, with evidence, suggests it is about the mindset of the group. For people who have to answer everything with a new "Statement on Daniel Pocock", what we see is that being stubborn is more important than being secure.

The Brocklesby affair may be 24 years ago but it actually reveals a continuity. We can measure subsequent security incidents against the Brocklesby affair and see that each time Debian is tested, the responses are lackluster.

Operating Systems and Porting / Re: Debian 12 status?

« on: July 03, 2024, 11:16:52 am »

I took the 6.7.12-1 backport for bookworm and applied the 4k page size patch.

It has the same problem with amdgpu that I saw using the 6.1 kernel with the 4k patch: errors about "Not enough memory for command submission!" appear in the log and eventually Firefox and the whole desktop freeze.

I pushed each of my branches for anybody else who wants to try it. Maybe it is fine with other GPUs.

This is the 4k patch against the 6.7.12-1 kernel on the branch for unstable and testing

The same patch backported for the 6.7.12-1 kernel on bookworm-backports

The 6.1 standard kernel branch for bookworm with the 4k page size patch

There are various other web sites discussing the error message. Some of them suggest that the amdgpu firmware needs to be updated.

One of them points to a new version of the amdgpu firmware that appeared in the upstream kernel tree in June 2024

At present, the Debian firmware package still has older amdgpu firmware from 2023. You can see the current versions here; even unstable hasn't been updated since June 2023.

Here is the packaging repository on Debian Git (Salsa); from it we can't clearly see the actual versions of the amdgpu firmware files.

Operating Systems and Porting / Re: Debian 12 status?

« on: July 02, 2024, 05:02:25 pm »

Using the Debian 12 (bookworm) kernel 6.1 package with 4k page size, I started getting errors in the log and eventually Firefox and the whole GUI (GNOME desktop) would freeze. The system was still accessible over SSH.

After the first crash, I started radeontop and noticed up to 80% GPU VRAM utilisation with Firefox running.

I rebooted into the kernel from bullseye (5.10.46-4.1 built from my 4k page size branch), running with the Debian 12 filesystem and it is running fine, no crashes. I have had that kernel running for months on end on this platform using the bullseye filesystem.

I'm going to build the 6.7.12 kernel backport for bookworm with the 4k page size and try that as well.

journalctl captures a lot of errors like this. Sometimes they appear for hours before it eventually crashes. I was able to repeat the crash a couple of times.

I see it as a good thing that the platform is still responding over SSH even when the GUI has crashed.

Code: [Select]

kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory for command submission!
...
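Since these messages can repeat for hours before the freeze, it may help to count them rather than scroll through the journal. The following is a minimal sketch, assuming the kernel lines have the same shape as the ones quoted above; the function name `count_amdgpu_errors` and the regular expression are my own illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Not enough memory ...
# The pattern is derived only from the log lines quoted in this post.
ERROR_RE = re.compile(
    r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.+)$"
)


def count_amdgpu_errors(lines):
    """Return a Counter mapping (function, message) to occurrence count."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line.rstrip("\n"))
        if match:
            counts[(match.group("func"), match.group("msg"))] += 1
    return counts


# Small demonstration using the message from the log above.
sample = [
    "kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* "
    "Not enough memory for command submission!",
] * 3
for (func, msg), n in count_amdgpu_errors(sample).most_common():
    print(f"{n:6d}  {func}: {msg}")
```

To use it on a real system, the lines could be taken from the output of journalctl -k saved to a file and passed to the function as a list of strings.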

The crash is captured too:

Code: [Select]

 kernel: ------------[ cut here ]------------
 kernel: WARNING: CPU: 5 PID: 6577 at drivers/gpu/drm/ttm/ttm_bo.c:357 ttm_bo_release+0x538/0x5b0 [ttm]
 kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge snd_seq_dummy snd_hrtimer snd_seq rfkill qrtr 8021q garp stp mrp llc overlay sunrpc binfmt_misc ext4 crc16 mbcache jbd2 uvcvideo videobuf2_vmalloc videobuf2_memops snd_usb_audio snd_hda_codec_hdmi videobuf2_v4l2 videobuf2_common snd_usbmidi_lib snd_hda_intel snd_rawmidi snd_intel_dspcfg videodev snd_seq_device evdev joydev snd_hda_codec mc snd_hda_core snd_hwdep snd_pcm sg snd_timer snd ofpart soundcore ipmi_powernv powernv_flash ctr ipmi_devintf at24 vmx_crypto mtd regmap_i2c ipmi_msghandler gf128mul opal_prd parport_pc lp parport fuse configfs loop ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress xts ecb uas usb_storage dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic
 kernel:  raid1 raid0 multipath linear md_mod hid_generic usbhid hid sd_mod amdgpu gpu_sched drm_buddy i2c_algo_bit drm_display_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci fb_sys_fops xhci_hcd nvme nvme_core t10_pi tg3 drm mpt3sas crc64_rocksoft_generic usbcore crc64_rocksoft crc_t10dif crct10dif_generic crc64 crct10dif_common drm_panel_orientation_quirks raid_class libphy usb_common scsi_transport_sas
 kernel: CPU: 5 PID: 6577 Comm: Renderer Not tainted 6.1.0-21-powerpc64le-4k #1  Debian 6.1.90-1.1
 kernel: Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1203 opal:skiboot-9858186 PowerNV
 kernel: NIP:  c00800000f4d2120 LR: c008000012fe7270 CTR: c00800000f4d2198
 kernel: REGS: c00020001bc571d0 TRAP: 0700   Not tainted  (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 84824244  XER: 20040036
 kernel: CFAR: c00800000f4d1c34 IRQMASK: 0 
         GPR00: c008000012fe7270 c00020001bc57470 c00800000f508800 c0000003b100c5c0 
         GPR04: 0000000000000000 0000000ffcac0000 0000000000041a8b 0000000000000000 
         GPR08: 0000000000041a8a c00020001af87738 0000000000000000 c008000013564c90 
         GPR12: c00800000f4d2198 c000000ffffdbf00 c00020001b054b80 c0002000ef821798 
         GPR16: 00000000002c6800 0000000000000001 0000000000000000 0000000000000000 
         GPR20: 00000000002c6880 c000200015080000 0000000000000071 00000000002c6800 
         GPR24: 0000000000000003 c00020001af87010 0000000000000000 000000003ee00000 
         GPR28: c0000003b100c458 c000200015080000 c000200015085508 c0000003b100c5c0 
 kernel: NIP [c00800000f4d2120] ttm_bo_release+0x538/0x5b0 [ttm]
 kernel: LR [c008000012fe7270] amdgpu_bo_unref+0x38/0x60 [amdgpu]
 kernel: Call Trace:
 kernel: [c00020001bc57470] [c00800000f4d1f18] ttm_bo_release+0x330/0x5b0 [ttm] (unreliable)
 kernel: [c00020001bc57500] [c008000012fe7270] amdgpu_bo_unref+0x38/0x60 [amdgpu]
 kernel: [c00020001bc57530] [c0080000130105fc] amdgpu_vm_ptes_update+0xc24/0xc60 [amdgpu]
 kernel: [c00020001bc576a0] [c00800001300935c] amdgpu_vm_update_range+0x304/0x880 [amdgpu]
 kernel: [c00020001bc577c0] [c008000013009f04] amdgpu_vm_bo_update+0x2ec/0x630 [amdgpu]
 kernel: [c00020001bc578e0] [c008000012ff0bcc] amdgpu_gem_va_ioctl+0x674/0x6b0 [amdgpu]
 kernel: [c00020001bc57a20] [c008000012f07040] drm_ioctl_kernel+0x118/0x230 [drm]
 kernel: [c00020001bc57a80] [c008000012f073b0] drm_ioctl+0x258/0x560 [drm]
 kernel: [c00020001bc57bf0] [c008000012fc00b8] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
 kernel: [c00020001bc57c40] [c0000000005493f4] sys_ioctl+0x744/0x1460
 kernel: [c00020001bc57d40] [c00000000002afd8] system_call_exception+0x138/0x260
 kernel: [c00020001bc57e10] [c00000000000c0f0] system_call_vectored_common+0xf0/0x280
 kernel: --- interrupt: 3000 at 0x7fff8eb4433c
 kernel: NIP:  00007fff8eb4433c LR: 00007fff8eb4433c CTR: 0000000000000000
 kernel: REGS: c00020001bc57e80 TRAP: 3000   Not tainted  (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44224840  XER: 00000000
 kernel: IRQMASK: 0 
         GPR00: 0000000000000036 00007fff75b9bb20 00007fff8ec56f00 0000000000000053 
         GPR04: 00000000c0286448 00007fff75b9bc00 0000000000280000 00000002c5a00000 
         GPR08: 0000000000100000 0000000000000000 0000000000000000 0000000000000000 
         GPR12: 0000000000000000 00007fff75ba68c0 00007fff75b9c028 00007fff75b9c178 
         GPR16: 00007fff75b9c508 0000000000000001 0000000000020000 00007fff75b9cd18 
         GPR20: 0000000000000001 0000000000020000 00007fff75b9be80 0000000000ea0000 
         GPR24: 0000000000000000 0000000000200000 0000000000000004 0000000000200000 
         GPR28: 0000000000000053 00000000c0286448 00007fff75b9bc00 00007ffefa5ca0a0 
 kernel: NIP [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: LR [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: --- interrupt: 3000
 kernel: Instruction dump:
 kernel: 4bfffe30 60000000 60000000 60420000 0fe00000 7c0802a6 fb610068 fba10078 
 kernel: f80100a0 60000000 60000000 60420000 <0fe00000> 4bffffe0 60000000 60420000 
 kernel: ---[ end trace 0000000000000000 ]---
 kernel: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-12)
 kernel: Kernel attempted to read user page (0) - exploit attempt? (uid: 1000)
 kernel: BUG: Kernel NULL pointer dereference on read at 0x00000000
 kernel: Faulting instruction address: 0xc008000012fe8f30
 kernel: Oops: Kernel access of bad area, sig: 11 [#1]
 kernel: LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
 kernel: Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge snd_seq_dummy snd_hrtimer snd_seq rfkill qrtr 8021q garp stp mrp llc overlay sunrpc binfmt_misc ext4 crc16 mbcache jbd2 uvcvideo videobuf2_vmalloc videobuf2_memops snd_usb_audio snd_hda_codec_hdmi videobuf2_v4l2 videobuf2_common snd_usbmidi_lib snd_hda_intel snd_rawmidi snd_intel_dspcfg videodev snd_seq_device evdev joydev snd_hda_codec mc snd_hda_core snd_hwdep snd_pcm sg snd_timer snd ofpart soundcore ipmi_powernv powernv_flash ctr ipmi_devintf at24 vmx_crypto mtd regmap_i2c ipmi_msghandler gf128mul opal_prd parport_pc lp parport fuse configfs loop ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress xts ecb uas usb_storage dm_crypt dm_mod raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic
 kernel:  raid1 raid0 multipath linear md_mod hid_generic usbhid hid sd_mod amdgpu gpu_sched drm_buddy i2c_algo_bit drm_display_helper drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt xhci_pci fb_sys_fops xhci_hcd nvme nvme_core t10_pi tg3 drm mpt3sas crc64_rocksoft_generic usbcore crc64_rocksoft crc_t10dif crct10dif_generic crc64 crct10dif_common drm_panel_orientation_quirks raid_class libphy usb_common scsi_transport_sas
 kernel: CPU: 4 PID: 6567 Comm: firefox-es:cs0 Tainted: G        W          6.1.0-21-powerpc64le-4k #1  Debian 6.1.90-1.1
 kernel: Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1203 opal:skiboot-9858186 PowerNV
 kernel: NIP:  c008000012fe8f30 LR: c008000013030f54 CTR: c00000000098c350
 kernel: REGS: c000200033e2ae10 TRAP: 0300   Tainted: G        W           (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24224244  XER: 200400dd
 kernel: CFAR: c008000013030f50 DAR: 0000000000000000 DSISR: 40000000 IRQMASK: 0 
         GPR00: c008000013030f54 c000200033e2b0b0 c00800001377c600 c000200015085508 
         GPR04: c0000003b100c400 0000000000000000 000000003ee00000 0000000000000080 
         GPR08: 0000000000001000 0000000000000000 c0000003cce4e800 c008000013565488 
         GPR12: c00000000098c350 c000000ffffdcc00 00000000002c6880 00000000002c6880 
         GPR16: 0000000000080000 c0000003b100c400 0000000000001000 c000200033e2b408 
         GPR20: 0000000000000000 c000200015080000 c0000003b100c400 0000000000000000 
         GPR24: 000000003ee00000 c0000003cce4e800 00000000000003f1 0000000000001000 
         GPR28: 000000003ee00000 c000200033e2b3e8 0000000000000080 0000000000000000 
 kernel: NIP [c008000012fe8f30] amdgpu_bo_gpu_offset_no_check+0x28/0x78 [amdgpu]
 kernel: LR [c008000013030f54] amdgpu_vm_sdma_set_ptes+0x5c/0x1b0 [amdgpu]
 kernel: Call Trace:
 kernel: [c000200033e2b0b0] [c000200033e2b110] 0xc000200033e2b110 (unreliable)
 kernel: [c000200033e2b0e0] [c008000013030f54] amdgpu_vm_sdma_set_ptes+0x5c/0x1b0 [amdgpu]
 kernel: [c000200033e2b150] [c00800001303195c] amdgpu_vm_sdma_update+0x3b4/0x440 [amdgpu]
 kernel: [c000200033e2b220] [c00800001300fd40] amdgpu_vm_ptes_update+0x368/0xc60 [amdgpu]
 kernel: [c000200033e2b390] [c00800001300935c] amdgpu_vm_update_range+0x304/0x880 [amdgpu]
 kernel: [c000200033e2b4b0] [c008000013009f04] amdgpu_vm_bo_update+0x2ec/0x630 [amdgpu]
 kernel: [c000200033e2b5d0] [c008000012ff5a28] amdgpu_cs_ioctl+0x1610/0x22c0 [amdgpu]
 kernel: [c000200033e2b880] [c008000012f07040] drm_ioctl_kernel+0x118/0x230 [drm]
 kernel: [c000200033e2b8e0] [c008000012f073b0] drm_ioctl+0x258/0x560 [drm]
 kernel: [c000200033e2ba50] [c008000012fc00b8] amdgpu_drm_ioctl+0x70/0xd0 [amdgpu]
 kernel: [c000200033e2baa0] [c0000000005493f4] sys_ioctl+0x744/0x1460
 kernel: [c000200033e2bba0] [c00000000002afd8] system_call_exception+0x138/0x260
 kernel: [c000200033e2be10] [c00000000000c0f0] system_call_vectored_common+0xf0/0x280
 kernel: --- interrupt: 3000 at 0x7fff8eb4433c
 kernel: NIP:  00007fff8eb4433c LR: 00007fff8eb4433c CTR: 0000000000000000
 kernel: REGS: c000200033e2be80 TRAP: 3000   Tainted: G        W           (6.1.0-21-powerpc64le-4k Debian 6.1.90-1.1)
 kernel: MSR:  900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48884842  XER: 00000000
 kernel: IRQMASK: 0 
         GPR00: 0000000000000036 00007fff62bfe1f0 00007fff8ec56f00 0000000000000053 
         GPR04: 00000000c0186444 00007fff62bfe2f0 0000000000180000 00007fff62bfe478 
         GPR08: 0000000000100000 0000000000000000 0000000000000000 0000000000000000 
         GPR12: 0000000000000000 00007fff62c068c0 00007fff7dc86600 0000000000000005 
         GPR16: 00007fff623f0000 0000000000000001 0000000000000031 0000000000000001 
         GPR20: fffffffffffffffd 0000000000000000 00007fff59870000 00007fff59860000 
         GPR24: 00007fff8b4e1000 00007fff62bfe428 00007fff62bfe448 0000000000000000 
         GPR28: 0000000000000053 00000000c0186444 00007fff62bfe2f0 00007fff62bfe2d0 
 kernel: NIP [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: LR [00007fff8eb4433c] 0x7fff8eb4433c
 kernel: --- interrupt: 3000
 kernel: Instruction dump:
 kernel: 007936f8 00000000 3c4c0079 384236f8 7c0802a6 60000000 7c0802a6 fbe1fff8 
 kernel: f8010010 f821ffd1 e92301c8 e86301a8 <ebe90000> 80890010 3863aaf8 7bff83e4 
 kernel: ---[ end trace 0000000000000000 ]---
 kernel: 
 kernel: note: firefox-es:cs0[6567] exited with irqs disabled

Operating Systems and Porting / Re: Debian 12 status?

« on: July 01, 2024, 10:42:28 am »

Did anybody ever ask other Debian kernel people about integrating my patch in the official kernel packages?

I've updated the 4k kernel patch to work with Debian 12

It is on this branch in a repository that is forked from the official Debian kernel packaging repository. I've kept the change set as minimal as possible so it is very easy to diff it against the official kernel and see that it is just the page size config option changing.

To make it compile, I had to include one extra patch, the patch from Debian bug #996170.

Any feedback is welcome whether you are running the 64k or the 4k kernel
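When reporting feedback, it is worth confirming which page size the running kernel actually uses rather than relying on the package name. This is a minimal sketch, assuming a Linux system with Python 3; the helper names `page_size_bytes` and `describe_page_size` are hypothetical and only for illustration.

```python
import os


def page_size_bytes():
    """Return the memory page size the running kernel reports to userspace."""
    return os.sysconf("SC_PAGE_SIZE")


def describe_page_size(size_bytes):
    """Map a page size in bytes to the short label used in this thread."""
    labels = {4096: "4k", 65536: "64k"}
    return labels.get(size_bytes, f"{size_bytes} bytes")


print("Running kernel page size:", describe_page_size(page_size_bytes()))
```

The same value is printed by the getconf PAGESIZE command, so either can be quoted in a bug report.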

This is where I posted the full set of commands for compiling a Debian kernel with the patch

Operating Systems and Porting / hibernate / resume / suspend-to-disk (STD) doesn't need hardware support?

« on: June 05, 2023, 06:01:33 am »

I already started another thread here about all the suspend modes, including suspend to RAM and the special modes of the POWER9 architecture. I'm starting this new thread to focus on hibernate, in other words, suspend-to-disk (STD)

There is a section about hibernation in the official Linux kernel guide. They state:

Quote

Hibernation

This state (also referred to as Suspend-to-Disk or STD) offers the greatest energy savings and can be used even in the absence of low-level platform support for system suspend.

This means it doesn't matter if POWER9 or the motherboard (Talos II, Blackbird, Condor) have any hardware support for hibernation. The suspend and resume work is all done by the operating system.

The kernel guide goes on to state:

Quote

However, it requires some low-level code for resuming the system to be present for the underlying CPU architecture.

Is that code already available in any newer kernels?

Is that code available in any alternative operating system like FreeBSD or OpenBSD?

I did a check on one of my own systems and it doesn't appear to support it: disk is missing from the output.


$ cat /sys/power/state
freeze mem

Here is the same command on an Intel host. Notice that disk support is listed there:


$ cat /sys/power/state
freeze mem disk
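The comparison above can be scripted so that it is easy to repeat on several machines. This is only a sketch: it assumes the file contains a whitespace-separated list of state names, as in the two outputs shown, and the function names are my own.

```python
from pathlib import Path


def supported_sleep_states(text):
    """Split the contents of /sys/power/state into a list of state names."""
    return text.split()


def supports_hibernate(text):
    """True if 'disk' (suspend-to-disk) is among the listed states."""
    return "disk" in supported_sleep_states(text)


state_file = Path("/sys/power/state")
if state_file.exists():
    contents = state_file.read_text()
    print("States:", supported_sleep_states(contents))
    print("Hibernate available:", supports_hibernate(contents))
```

With the two outputs quoted above, `supports_hibernate("freeze mem")` is false for the POWER9 system and `supports_hibernate("freeze mem disk")` is true for the Intel host.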

Operating Systems and Porting / Re: Tim Pearson added as Debian maintainer

« on: March 31, 2023, 10:04:26 am »

"dispute with Debian" is not correct.

If you look at the vote about Dr Richard Stallman, the majority of Debian Developers do not want these disputes at all and they voted not to comment on the dispute.

There is a hardcore group of people who run these disputes. In many cases, they hide what they are doing from the rest of us. Many of the victims are afraid to speak up. For example, if somebody receives some email about the CoC, they usually just quit and go to work on something else. Quitting like that is not an admission of wrongdoing; the victims just don't want to lose their time with these games.

Operating Systems and Porting / Re: Tim Pearson added as Debian maintainer

« on: March 28, 2023, 03:46:09 am »

If you can only access a particular application by running it from a web site in Chrome / Chromium then that is still very bad news. It implies there is no equivalent application that you can run natively.

By way of example, people have been trying to promote free, open source webcam and chat solutions for many years but when the pandemic came along, a lot of users were willing to download the Zoom client or run the WebAssembly client in their browser and the free software community was completely left out in the cold.

Operating Systems and Porting / Re: Tim Pearson added as Debian maintainer

« on: March 27, 2023, 04:04:15 pm »

Two things come to mind:

Debian involvement can be a burden for any developer, taking up a little bit more time and energy that could be used for other parts of the platform.

Tim's presence as a maintainer may be a hint that there is a shortage of other volunteers willing to do POWER related tasks in the Debian world. In fact, all the big distributions have had problems that are discouraging volunteers; it is not only an issue for POWER.

General Discussion / Re: Samsung PM9A1 IcyBox IB-PCI208-HS not working in one Talos II, OK in the other

« on: September 14, 2022, 04:35:05 pm »

Unfortunately, I cannot shut it down right now to test them in it.

General Discussion / Re: Samsung PM9A1 IcyBox IB-PCI208-HS not working in one Talos II, OK in the other

« on: September 14, 2022, 02:07:22 pm »

I put them both into an HP Z6 G4 workstation and they worked immediately in there, so I don't think they are faulty. It looks more like a compatibility issue.

Is there any other data I can collect from the Talos II to help troubleshoot?

I can try to purchase an alternative PCIe card for these SSDs, I don't mind swapping the cards and testing other models

General Discussion / Samsung PM9A1 IcyBox IB-PCI208-HS not working in one Talos II, OK in the other

« on: September 14, 2022, 01:29:19 pm »

For my first workstation, I purchased two Samsung PM9A1 (1TB) SSDs and I purchased two of the IcyBox IB-PCI208-HS cards to put them into PCIe slots. They worked immediately and I've had them for 11 months without any problem

Now I have purchased two of the PM9A1 (2TB) SSDs and two more identical IB-PCI208-HS cards for the second workstation. These are only showing up intermittently or not at all when the machine boots.

In petitboot there is a lot of kernel error logging like this:

Code: [Select]

[  937.395593] EEH: Recovering PHB#1-PE#fd
[  937.395607] EEH: PE location: UOPWR.A100029-Node0-CPU1 Slot1 (8x), PHB location: N/A
[  937.395609] EEH: This PCI device has failed 1 times in the last hour and will  be permanently disabled after 5 failures.
[  937.395611] EEH: Notify device drivers to shutdown
[  937.395614] EEH: Beginning: 'error_detected(IO frozen)'
[  937.395620] PCI 0001:01:00.0#00fd: EEH: Invoking nvme->error_detected(IO frozen)
[  937.395627] nvme nvme0: frozen state error detected, reset controller
[  937.578659] PCI 0001:01:00.0#00fd: EEH: nvme driver reports: 'need reset'
[  937.578662] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[  937.578667] EEH: Collect temporary log
[  937.578698] EEH: of node=0001:01:00.0
[  937.578702] EEH: PCI device/vendor: ffffffff
[  937.578706] EEH: PCI cmd/status register: ffffffff
[  937.578707] EEH: PCI-E capabilities and status follow:
[  937.578722] EEH: PCI-E 00: ffffffff ffffffff ffffffff ffffffff
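Because EEH permanently disables a device after a fixed number of failures within an hour, it can be useful to pull the failure count for each PE out of a saved dmesg log. This is a rough sketch, assuming the messages have exactly the wording shown above; the regular expressions and the function name `eeh_failures` are illustrative and may need adjusting for other kernel versions.

```python
import re

# Both patterns are derived only from the log lines quoted in this post.
RECOVER_RE = re.compile(
    r"EEH: Recovering PHB#(?P<phb>[0-9a-fA-F]+)-PE#(?P<pe>[0-9a-fA-F]+)"
)
FAIL_RE = re.compile(
    r"EEH: This PCI device has failed (?P<failed>\d+) times in the last "
    r"hour\s+and will\s+be permanently disabled after (?P<limit>\d+) failures"
)


def eeh_failures(lines):
    """Return a list of (pe_name, failures_so_far, failure_limit) tuples."""
    events = []
    current_pe = None
    for line in lines:
        recovering = RECOVER_RE.search(line)
        if recovering:
            # Remember which PE the following messages refer to.
            current_pe = "PHB#{}-PE#{}".format(
                recovering.group("phb"), recovering.group("pe")
            )
            continue
        failure = FAIL_RE.search(line)
        if failure:
            events.append(
                (
                    current_pe,
                    int(failure.group("failed")),
                    int(failure.group("limit")),
                )
            )
    return events
```

Running it over the excerpt above would report one event for PHB#1-PE#fd with 1 failure out of a limit of 5.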

I tried some of the following with no luck:

- removing the cards and re-inserting them in different slots

- removing the SSDs and putting them back into the cards

- upgrading the FPGA on the second workstation from 0xa to 0xc (v1.08) so that it matches the first workstation

- upgrading the PNOR on the second workstation to the v2.01 beta so that it matches the first workstation

- putting a new and bigger PSU on the second workstation (I needed to do this anyway for bigger GPU and more RAM)

- removing the GPU and everything else from the system and trying one SSD at a time

General OpenPOWER Discussion / external oscillator in v1.01 boards - which problems does it aim to resolve?

« on: September 14, 2022, 11:34:52 am »

The wiki mentions that newer boards (v1.01) have an external oscillator for the FPGA

The oscillator is mentioned in this commit

If somebody has an older FPGA, e.g. 1.06 and they have a v1.01 motherboard, is there a pressing reason to upgrade the FPGA to 1.08?

Are there any particular problems that might occur if the user does not upgrade?

General OpenPOWER Discussion / Re: USB keyboards not working in petitboot bootloader environment

« on: September 14, 2022, 07:33:49 am »

It does appear to be related to the jumper

With the jumper installed:
the view of petitboot on the VGA output and in the SSH session are identical.
When I type something in the SSH session it appears on both the VGA and in the SSH console

Without the jumper:
the VGA and USB keyboard act as a terminal together.
The SSH / BMC terminal operates independently: what I type there doesn't appear on the VGA and, vice versa, what I type on the USB keyboard doesn't appear in the SSH view of petitboot.

It would be useful to have a note about this in the motherboard manual, to the effect that the jumper impacts the keyboard and not only the VGA

General OpenPOWER Discussion / Re: USB keyboards not working in petitboot bootloader environment

« on: September 14, 2022, 02:18:28 am »

I have the jumper installed for disabling internal VGA, does that also serve to mask the keyboard in some way?

General OpenPOWER Discussion / Re: USB keyboards not working in petitboot bootloader environment

« on: September 14, 2022, 02:07:35 am »

I connect to the BMC with SSH and access the petitboot console with obmc-console-client.

In petitboot, I choose the shell option

At the shell, I run dmesg

The dmesg output shows that usbhid is loaded and it shows the keyboard

Code: [Select]

[    6.004628] input:   USB Keyboard Consumer Control as /devices/pci0003:00/0003:00:00.0/0003:01:00.0/usb1/1-4/1-4.2/1-4.2:1.1/0003:04D9:1702.0005/input/input11
[    6.004683] hid-generic 0003:04D9:1702.0005: input: USB HID v1.10 Device [  USB Keyboard] on usb-0003:01:00.0-4.2/input1

The numlock light works on the keyboard too, so the keyboard appears to have power.


Raptor Computing Systems Community Forums (BETA)
