ata4: softreset failed. ata4.00 disabled (SATA devices not working sometimes)

MPC7500:
Have you tried to disconnect the BD-drive, whether the problem disappears or persists?

FlyingBlackbird:

--- Quote from: MPC7500 on February 01, 2020, 06:24:31 pm ---Have you tried to disconnect the BD-drive, whether the problem disappears or persists?

--- End quote ---

Not yet. I am trying to diagnose the problem by changing only one small thing at a time. But I am fairly sure it works if no NVMe SSD is attached, so it looks like a hardware incompatibility.

Currently I have applied the libata.force=norst kernel parameter to suppress hard and soft resets, and the optical drive is working for now (but the problem is non-deterministic, so I have to wait and see whether the error occurs again).
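
To check that the parameter actually reached the running kernel, a minimal Python sketch could look like this. The helper name `kernel_param` is hypothetical (it is not part of any library), and it only splits the command line on whitespace, so quoted values containing spaces are not handled.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of a kernel command-line parameter, or None if absent.

    If the parameter appears more than once, the last occurrence is returned.
    A bare flag without '=' yields an empty string.
    """
    found = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            found = value if sep else ""
    return found


# Example with a made-up command line (the root= value is illustrative only).
example = "root=/dev/nvme0n1p2 ro quiet libata.force=norst"
print(kernel_param(example, "libata.force"))  # -> norst
```

On a running Linux system the string to pass in would be the content of /proc/cmdline.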

I have already unplugged the attached Seagate SATA HDD, so I can rule that device out as a suspect.

Next step would be to change the cable, then disconnect (no more SATA devices)...

BTW The kernel.org doc for kernel parameters explains the options for libata.force quite well:

https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html


--- Quote ---libata.force=   [LIBATA] Force configurations.  The format is comma separated list of "[ID:]VAL" where ID is
                        PORT[.DEVICE].  PORT and DEVICE are decimal numbers matching port, link or device.  Basically, it matches
                        the ATA ID string printed on console by libata.  If the whole ID part is omitted, the last PORT and DEVICE
                        values are used.  If ID hasn't been specified yet, the configuration applies to all ports, links and devices.

                        If only DEVICE is omitted, the parameter applies to the port and all links and devices behind it.  DEVICE
                        number of 0 either selects the first device or the first fan-out link behind PMP device.  It does not
                        select the host link.  DEVICE number of 15 selects the host link and device attached to it.

                        The VAL specifies the configuration to force.  As long as there's no ambiguity shortcut notation is allowed.
                        For example, both 1.5 and 1.5G would work for 1.5Gbps. The following configurations can be forced.

                        * Cable type: 40c, 80c, short40c, unk, ign or sata.  Any ID with matching PORT is used.

                        * SATA link speed limit: 1.5Gbps or 3.0Gbps.

                        * Transfer mode: pio[0-7], mwdma[0-4] and udma[0-7]. udma[/][16,25,33,44,66,100,133] notation is also allowed.

                        * [no]ncq: Turn on or off NCQ.

                        * [no]ncqtrim: Turn off queued DSM TRIM.

                        * nohrst, nosrst, norst: suppress hard, soft and both resets.

                        * rstonce: only attempt one reset during hot-unplug link recovery

                        * dump_id: dump IDENTIFY data.

                        * atapi_dmadir: Enable ATAPI DMADIR bridge support

                        * disable: Disable this device.

                        If there are multiple matching configurations changing the same attribute, the last one is used.

--- End quote ---
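
The ID rules quoted above can be sketched in Python as follows. This only illustrates the documented "[ID:]VAL" format and is not the kernel's parser; `ForceEntry` and `parse_libata_force` are hypothetical names, and `None` is used here to stand for "all ports" or "all devices behind the port".

```python
from typing import List, NamedTuple, Optional


class ForceEntry(NamedTuple):
    port: Optional[int]    # None: no ID given yet, applies to all ports
    device: Optional[int]  # None: applies to everything behind the port
    value: str             # the VAL part, e.g. "norst" or "1.5G"


def parse_libata_force(spec: str) -> List[ForceEntry]:
    """Split a libata.force value into (port, device, value) entries."""
    entries = []
    port = device = None  # the "last PORT and DEVICE" carried between items
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        ident, sep, value = item.rpartition(":")
        if sep:
            # An explicit ID replaces the remembered PORT[.DEVICE].
            port_str, _, dev_str = ident.partition(".")
            port = int(port_str)
            device = int(dev_str) if dev_str else None
        # Without ':' the whole item is the VAL and the last ID is reused.
        entries.append(ForceEntry(port, device, value))
    return entries


for entry in parse_libata_force("4.00:norst,1.5G,2:noncq"):
    print(entry)
```

In the example string, `1.5G` has no ID of its own, so it inherits port 4, device 0 from the item before it, while a lone `norst` (as used in this thread) has no ID at all and therefore applies to every port.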

FlyingBlackbird:

--- Quote from: FlyingBlackbird on February 02, 2020, 04:48:46 am ---Next step would be to change the cable, then disconnect (no more SATA devices)...

--- End quote ---

OK, short update: libata.force=norst does not solve the problem; it only changes the error message:


--- Code: ---[6.397240] ata4.00 failed to IDENTIFY (I/O error, err_mask=0x4)
--- End code ---

err_mask 0x4 should mean "timeout", AFAIR.
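
As a rough cross-check, here is a small Python sketch that pulls the err_mask out of such a log line and names the bits that are set. The bit names are the AC_ERR_* flags as I recall them from include/linux/libata.h, so treat the table as an assumption and verify it against your kernel source; `err_mask_from_log` and `decode_err_mask` are hypothetical helper names.

```python
import re
from typing import List, Optional

# Assumed bit layout of libata's err_mask (AC_ERR_* in include/linux/libata.h).
AC_ERR_FLAGS = {
    1 << 0: "AC_ERR_DEV",         # device reported error
    1 << 1: "AC_ERR_HSM",         # host state machine violation
    1 << 2: "AC_ERR_TIMEOUT",     # command timed out
    1 << 3: "AC_ERR_MEDIA",       # media error
    1 << 4: "AC_ERR_ATA_BUS",     # ATA bus error
    1 << 5: "AC_ERR_HOST_BUS",    # host bus error
    1 << 6: "AC_ERR_SYSTEM",      # system error
    1 << 7: "AC_ERR_INVALID",     # invalid argument
    1 << 8: "AC_ERR_OTHER",       # unknown error
    1 << 9: "AC_ERR_NODEV_HINT",  # polling hint: no device present
    1 << 10: "AC_ERR_NCQ",        # marker for an offending NCQ command
}


def err_mask_from_log(line: str) -> Optional[int]:
    """Extract the numeric err_mask from a kernel log line, if present."""
    match = re.search(r"err_mask=0x([0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


def decode_err_mask(mask: int) -> List[str]:
    """Return the names of all flags set in mask, lowest bit first."""
    return [name for bit, name in AC_ERR_FLAGS.items() if mask & bit]


line = "[6.397240] ata4.00 failed to IDENTIFY (I/O error, err_mask=0x4)"
print(decode_err_mask(err_mask_from_log(line)))  # -> ['AC_ERR_TIMEOUT']
```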

madscientist159:
Interesting thread and issue!

I'd suspect one of the following:

* Linux driver problem with Marvell controller (would probably be a regression since these used to work quite reliably)
* Hardware problem -- e.g. PSU sagging slightly under extra load from NVMe drive, causing controller or drive to malfunction
Have you tried (carefully) removing power to the SATA drive when it's in timeout status, and reapplying it, to see if the link comes back up?  Or moving the cable to a different port to see if it's the entire chip that's locked up or just the one port?

FlyingBlackbird:

--- Quote from: madscientist159 on February 02, 2020, 05:51:42 pm ---I'd suspect one of the following:

--- End quote ---

> Hardware problem -- e.g. PSU sagging slightly under extra load from NVMe drive, causing controller or drive to malfunction

My power supply unit is quite oversized (650 W) and from a well-known brand.
With a SATA HDD alone it seems to work. I will try the optical drive in another x86 Linux computer with a similar kernel (but without an NVMe SSD available) to see what happens.
Changing the PSU would be my "last resort".

> Have you tried (carefully) removing power to the SATA drive when it's in timeout status, and reapplying it, to see if the link comes back up?

How safe is this while the system is powered on and running (is there a risk of damaging the mainboard)?
And how do I force a new PCI bus scan (does reapplying power to the optical SATA drive trigger a new scan)?

>  Or moving the cable to a different port to see if it's the entire chip that's locked up or just the one port?

That will be my next try. Currently I am running with a SATA HDD only and have observed no problems so far.

BTW: If you happen to have hints or a link on how to debug libata/Marvell during Linux boot, I would try that (I know gdb quite well), but I am sure your time is scarce, so it is OK to ignore my wish ;-)
