Author Topic: Blackbird: Booting from NVMe SSD sometimes disables SATA HDD and BluRay drive  (Read 11136 times)

FlyingBlackbird

Has anyone managed to use an NVMe SSD together with a SATA HDD (booting Linux from the SSD)?

The combination of
  • Samsung EVO Plus 970 TB NVMe SSD in the PCIe x8 slot
  • Seagate IronWolf Pro 8 TB (ST8000NE0004) SATA III HDD in SATA-2
  • Asus BW-16D1HT Retail BluRay Writer in SATA-1
  • Fedora Server 31 installed on the SSD
caused frequent drop-outs of the SATA devices (the HDD and the BluRay drive get disabled) during the Linux boot phase.

Without the SSD, I can always boot Linux successfully from the same HDD.

I have also tried different PCIe-to-M.2 NVMe adapter hardware, with the same problems,
  • RaidSonic ICY BOX IB-PCI214M2-HSL M.2 to PCIe adapter and
  • Delock M.2 PCI Express x4 card
to rule out an adapter incompatibility issue.

petitboot always recognized the SATA devices until booting Linux caused them to be disabled.
After a reboot (without powering off), petitboot no longer showed the SATA devices either, until I powered off and restarted.
"Rescan devices" in petitboot does not help...

Fedora Server 31 log with the SATA drop-outs shown in dmesg:

Code:
[    0.990585] ata3: SATA max UDMA/133 abar m2048@0x600c100000000 port 0x600c100000200 irq 30
[    1.487812] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.507342] ata3.00: ATA-10: ST8000NE0004-1ZF11G, EN01, max UDMA/133
[    1.507345] ata3.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    6.557731] ata3.00: qc timeout (cmd 0xec)
[    6.557737] ata3.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[    6.557738] ata3.00: revalidation failed (errno=-5)
[   16.557020] ata3: softreset failed (1st FIS failed)
[   26.557020] ata3: softreset failed (1st FIS failed)
[   61.556654] ata3: softreset failed (1st FIS failed)
[   61.556656] ata3: limiting SATA link speed to 3.0 Gbps
[   66.556654] ata3: softreset failed (1st FIS failed)
[   66.556656] ata3: reset failed, giving up
[   66.556658] ata3.00: disabled

[    0.990587] ata4: SATA max UDMA/133 abar m2048@0x600c100000000 port 0x600c100000280 irq 30
[    1.487797] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    1.491980] ata4.00: ATAPI: ASUS    BW-16D1HT, 3.10, max UDMA/133
[    1.499775] ata4.00: configured for UDMA/133
[   97.577022] ata4: softreset failed (1st FIS failed)
[  107.577021] ata4: softreset failed (1st FIS failed)
[  142.576654] ata4: softreset failed (1st FIS failed)
[  147.576654] ata4: softreset failed (1st FIS failed)
[  147.576656] ata4: reset failed, giving up
[  147.576658] ata4.00: disabled
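
For anyone reading the "SATA link up" lines above: the SStatus value is printed in hex and packs three fields (DET in bits 3:0, SPD in bits 7:4, IPM in bits 11:8). A minimal Python sketch for decoding it is below. The helper names (decode_sstatus, decode_line) are made up for illustration, and the regex assumes the line format seen in this log.

```python
import re

# Field meanings of the SATA SStatus register.
# DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8.
DET = {
    0x0: "no device detected",
    0x1: "device present, no PHY communication",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline",
}
SPD = {0x0: "no speed negotiated", 0x1: "1.5 Gbps", 0x2: "3.0 Gbps", 0x3: "6.0 Gbps"}
IPM = {0x0: "not present", 0x1: "active", 0x2: "partial", 0x6: "slumber"}

# Matches e.g. "ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)".
SSTATUS_RE = re.compile(r"(ata\d+): SATA link \w+ .*?\(SStatus ([0-9A-Fa-f]+) SControl")


def decode_sstatus(value):
    """Split a numeric SStatus value into its three named fields."""
    return {
        "det": DET.get(value & 0xF, "reserved"),
        "spd": SPD.get((value >> 4) & 0xF, "reserved"),
        "ipm": IPM.get((value >> 8) & 0xF, "reserved"),
    }


def decode_line(line):
    """Return (port, decoded fields) for a link-status line, else None."""
    match = SSTATUS_RE.search(line)
    if match is None:
        return None
    # The kernel prints SStatus in hexadecimal without a 0x prefix.
    return match.group(1), decode_sstatus(int(match.group(2), 16))


if __name__ == "__main__":
    print(decode_line("[    1.487812] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)"))
```

So 133 means the link came up at 6.0 Gbps and 113 means 1.5 Gbps, i.e. both links negotiated normally before the softreset failures started.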

mx08

I have not yet experienced the problem myself (though I also run NVMe SSDs in my system), but I can tell you that zdykstra and q66 in the IRC channel (#talos-workstation) have experienced similar issues.

FlyingBlackbird

Quote
I have not yet experienced the problem myself (though I also run NVMe SSDs in my system)

Thanks for your feedback :-)

Do you use the NVMe SSD together with any other SATA device? That is the combination that seems to cause problems for me (in my case I suspect the optical drive).

kth5

Reviving this thread since I recently attempted to add a DVD-ROM drive I had lying around. So pardon me...

As I said, I attempted to add a SATA LG DVD-ROM (yes, ROM) and I could boot most of the time, but it wouldn't mount any CD or DVD. When it did not boot, it just started to fail with those softreset FIS messages whenever I tried doing anything with it from the initramfs, sometimes going so far as to disable other drives connected to the 4 SATA ports on my Blackbird. Curiously, the eject program did seem to do what it should.

So I thought to myself, this drive's probably trash after years in storage. So I just got a fresh LG DVD-RW drive and it does the same.

Worse, it almost always halts the boot process completely after kexec has launched the OS's kernel and the kernel tries to detect all drives. Once it reaches the port with the LG DVD-RW connected, it halts and will not expose any of the drives.

When the system DOES boot, which happens randomly, I see this in dmesg until it eventually downgrades to UDMA/33 and locks up SATA entirely.

Code:
[Wed Dec 16 21:44:55 2020] ahci 0002:01:00.0: AHCI 0001.0000 32 slots 4 ports 6 Gbps 0xf impl SATA mode
[Wed Dec 16 21:44:55 2020] ahci 0002:01:00.0: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs
[Wed Dec 16 21:44:55 2020] scsi host0: ahci
[Wed Dec 16 21:44:55 2020] scsi host1: ahci
[Wed Dec 16 21:44:55 2020] scsi host2: ahci
[Wed Dec 16 21:44:55 2020] scsi host3: ahci
[Wed Dec 16 21:44:55 2020] ata1: SATA max UDMA/133 abar m2048@0x600c100010000 port 0x600c100010100 irq 30
[Wed Dec 16 21:44:55 2020] ata2: SATA max UDMA/133 abar m2048@0x600c100010000 port 0x600c100010180 irq 30
[Wed Dec 16 21:44:55 2020] ata3: SATA max UDMA/133 abar m2048@0x600c100010000 port 0x600c100010200 irq 30
[Wed Dec 16 21:44:55 2020] ata4: SATA max UDMA/133 abar m2048@0x600c100010000 port 0x600c100010280 irq 30
[Wed Dec 16 21:44:55 2020] nvme nvme0: 15/0/0 default/read/poll queues
[Wed Dec 16 21:44:55 2020]  nvme0n1: p1 p2
[Wed Dec 16 21:44:56 2020] random: fast init done
[Wed Dec 16 21:44:56 2020] ata1: SATA link down (SStatus 0 SControl 300)
[Wed Dec 16 21:44:56 2020] usb 1-1: new high-speed USB device number 2 using xhci_hcd
[Wed Dec 16 21:44:56 2020] usb 1-1: New USB device found, idVendor=1a40, idProduct=0101, bcdDevice= 1.11
[Wed Dec 16 21:44:56 2020] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[Wed Dec 16 21:44:56 2020] usb 1-1: Product: USB 2.0 Hub
[Wed Dec 16 21:44:56 2020] hub 1-1:1.0: USB hub found
[Wed Dec 16 21:44:56 2020] hub 1-1:1.0: 4 ports detected
[Wed Dec 16 21:44:56 2020] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Wed Dec 16 21:44:56 2020] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Wed Dec 16 21:44:56 2020] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Wed Dec 16 21:44:56 2020] ata3.00: supports DRM functions and may not be fully accessible
[Wed Dec 16 21:44:56 2020] ata2.00: ATAPI: HL-DT-ST DVDRAM GH24NSD5, LV00, max UDMA/133
[Wed Dec 16 21:44:56 2020] ata3.00: ATA-11: Samsung SSD 860 EVO 500GB, RVT04B6Q, max UDMA/133
[Wed Dec 16 21:44:56 2020] ata3.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 32), AA
[Wed Dec 16 21:44:56 2020] ata2.00: configured for UDMA/133
[Wed Dec 16 21:44:56 2020] ata3.00: supports DRM functions and may not be fully accessible
[Wed Dec 16 21:44:56 2020] scsi 1:0:0:0: CD-ROM            HL-DT-ST DVDRAM GH24NSD5  LV00 PQ: 0 ANSI: 5
[Wed Dec 16 21:44:56 2020] ata3.00: configured for UDMA/133
[Wed Dec 16 21:44:56 2020] scsi 2:0:0:0: Direct-Access     ATA      Samsung SSD 860  4B6Q PQ: 0 ANSI: 5
[Wed Dec 16 21:44:56 2020] ata3.00: Enabling discard_zeroes_data
[Wed Dec 16 21:44:56 2020] sd 2:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[Wed Dec 16 21:44:56 2020] sd 2:0:0:0: [sda] Write Protect is off
[Wed Dec 16 21:44:56 2020] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[Wed Dec 16 21:44:56 2020] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[Wed Dec 16 21:44:56 2020] ata4.00: ATA-10: ST4000DM004-2CV104, 0001, max UDMA/133
[Wed Dec 16 21:44:56 2020] ata4.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 32), AA
[Wed Dec 16 21:44:56 2020] sr 1:0:0:0: [sr0] scsi3-mmc drive: 10x/48x writer dvd-ram cd/rw xa/form2 cdda tray
[Wed Dec 16 21:44:56 2020] cdrom: Uniform CD-ROM driver Revision: 3.20
[Wed Dec 16 21:44:56 2020]  sda: sda1
[Wed Dec 16 21:44:56 2020] ata3.00: Enabling discard_zeroes_data
[Wed Dec 16 21:44:56 2020] sd 2:0:0:0: [sda] supports TCG Opal

<snip>

[Wed Dec 16 21:45:12 2020] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Wed Dec 16 21:45:12 2020] ata2.00: configured for UDMA/133

<snip>

[Wed Dec 16 21:48:55 2020] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Wed Dec 16 21:48:55 2020] ata2.00: cmd a0/00:00:00:02:00/00:00:00:00:00/a0 tag 12 pio 16388 in
                                    Mode Sense(10) 5a 00 2a 00 00 00 00 00 02 00res 40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[Wed Dec 16 21:48:55 2020] ata2.00: status: { DRDY }
[Wed Dec 16 21:48:55 2020] ata2: hard resetting link
[Wed Dec 16 21:48:56 2020] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Wed Dec 16 21:48:56 2020] ata2.00: configured for UDMA/133
[Wed Dec 16 21:48:56 2020] ata2: EH complete
[Wed Dec 16 21:49:36 2020] ata2.00: limiting speed to UDMA/100:PIO4
[Wed Dec 16 21:49:36 2020] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Wed Dec 16 21:49:36 2020] ata2.00: cmd a0/00:00:00:02:00/00:00:00:00:00/a0 tag 3 pio 16388 in
                                    Mode Sense(10) 5a 00 2a 00 00 00 00 00 02 00res 40/00:02:00:00:02/00:00:00:00:00/00 Emask 0x4 (timeout)
[Wed Dec 16 21:49:36 2020] ata2.00: status: { DRDY }
[Wed Dec 16 21:49:36 2020] ata2: hard resetting link
[Wed Dec 16 21:49:37 2020] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Wed Dec 16 21:49:37 2020] ata2.00: configured for UDMA/100
[Wed Dec 16 21:49:37 2020] ata2: EH complete
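
To compare logs like the ones in this thread, it can help to count the libata error-handling events per port instead of eyeballing the whole dmesg. Below is a rough Python sketch that does this. The event names and the summarize helper are made up for illustration, and the patterns only cover the message wording that actually appears in the logs posted here.

```python
import re
from collections import Counter, defaultdict

# Message fragments taken from the dmesg excerpts in this thread.
PATTERNS = {
    "qc_timeout": re.compile(r"qc timeout"),
    "exception": re.compile(r"exception Emask"),
    "softreset_failed": re.compile(r"softreset failed"),
    "hard_reset": re.compile(r"hard resetting link"),
    "speed_limited": re.compile(r"limiting (?:SATA link )?speed to"),
    "disabled": re.compile(r"\bdisabled\b"),
}

# Matches both port lines ("ata3:") and device lines ("ata3.00:").
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def summarize(log_text):
    """Count known error-handling events per ATA port in dmesg text."""
    counts = defaultdict(Counter)
    for line in log_text.splitlines():
        port = PORT_RE.search(line)
        if port is None:
            continue  # not a libata line, or a continuation line
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[port.group(1)][name] += 1
    return {port: dict(events) for port, events in counts.items()}


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for port, events in sorted(summarize(sys.stdin.read()).items()):
        print(port, events)
```

Ports that never log one of these messages simply do not appear in the result, which makes it easy to see whether only the optical drive's port is affected or whether other ports get dragged down with it.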


I also happen to have an NVMe drive in the 4x slot, but I don't think this has anything to do with it.

What I tried so far:
* the drive works in an x86 box (so did the old LG DVD-ROM I tried first)
* I removed any other drives from SATA and left just the drive, no dice
* swapped SATA cables thrice. no dice
* swapped the PSU connector for another branch, no dice
* upgraded to Linux 5.10.1, no dice

I have another 5.10.1 kernel building with some legacy Marvell and OF ATA/PATA/SATA stuff enabled, but honestly, I think I'm missing something to do with the controller.

Only sr_mod-based devices seem affected; all other drives (good ol' spinning, SSHD & SSD) work.

EDIT:
Both drives work via USB2SATA on the same box with the same OS constellation.
« Last Edit: December 16, 2020, 04:52:48 pm by kth5 »

ClassicHasClass

I think it might be the controller as well. Do you have a PCI one you can try?

witsu

I experience this issue from time to time too. I am using a Marvell PCI-E SATA controller.
Rebooting a few times seems to resolve the issue most of the time.
Removing the optical drive also stops this from happening and allows the hard drive to work fine.
For now I've just left it disconnected since I don't use it frequently.

kth5

Quote
I think it might be the controller as well. Do you have a PCI one you can try?

I do have one somewhere at the office. I don't want it to be a permanent thing since I quite like my accelerated dual-screen graphics with an NVMe drive as a build partition. :D
I'll see if I can get it over tomorrow to try.

The big question though, I have had my blackbird since I think August 2019 and never did try connecting any kind of optical drive other than via USB. Could it be that it's just my board, its production run or a design flaw in the controller itself?
I'd be fine without any warranty process even if it were a possibility; it's just too bad that I can't give up the 8x slot to another SATA controller without losing the NVMe drive.

FlyingBlackbird

Quote
The big question though, I have had my blackbird since I think August 2019 and never did try connecting any kind of optical drive other than via USB. Could it be that it's just my board, its production run or a design flaw in the controller itself?

I opened this thread. My Blackbird was RMA'ed and is still waiting to be reassembled (shame on me, but I simply have had no time so far).

I had quite similar problems, but in the end my system would not even boot with no drive attached.

I will try to reassemble my system in the next three weeks with the same configuration as in my first post in this thread and see if I can boot with an optical drive attached via SATA (Asus).

Hopefully this is not a design flaw. I will report my results here...

BTW: Have you already tried enabling the "cold restart" setting? (I don't remember the exact name, but it re-initializes the complete hardware when restarting, like a real power-on boot.) There is some documentation somewhere here and in the wiki.

mx08

Sorry FlyingBlackbird for the late reply.

Quote
Do you use the NVMe SSD together with any other SATA device? That is the combination that seems to cause problems for me (in my case I suspect the optical drive).

I'm not sure. I have a SATA DVD drive installed now, but I haven't used my BB for some time and don't remember whether I used it with the DVD drive or installed the drive after I stopped using it.

Quote
BTW: Have you already tried enabling the "cold restart" setting? (I don't remember the exact name, but it re-initializes the complete hardware when restarting, like a real power-on boot.) There is some documentation somewhere here and in the wiki.

It's called "fast reboot". Personally, I have that enabled because otherwise my GPU (WX3200) sometimes does not reset properly and then disappears from the system (it does not even show up in lspci).

How to disable fast reboot: https://shenki.github.io/skiboot-disable-fast-reboot/
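
For reference, here is a small Python sketch of what disabling fast reboot might look like from a running Linux system. It assumes the setting is the "fast-reset" key in the "ibm,skiboot" NVRAM partition and that it is changed with the nvram tool from powerpc-utils; please check the linked page for the exact command before relying on this. The function names are made up for illustration, and by default nothing is executed.

```python
import subprocess


def disable_fast_reboot_command():
    """Build the nvram invocation assumed to turn skiboot fast reboot off.

    Assumption: the key is "fast-reset" in the "ibm,skiboot" partition.
    """
    return ["nvram", "-p", "ibm,skiboot", "--update-config", "fast-reset=0"]


def disable_fast_reboot(dry_run=True):
    """Return the command as a string; run it only when dry_run is False."""
    cmd = disable_fast_reboot_command()
    if not dry_run:
        # Needs root and the nvram tool; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


if __name__ == "__main__":
    # Dry run by default: only print what would be executed.
    print(disable_fast_reboot())
```

With fast reboot off, a reboot should go through a full re-initialization of the hardware, which is the behaviour asked about earlier in the thread.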