Re: ata error EH in SWNCQ mode, with nVidia MCP55 sata controller and SAMSUNG HD103UJ

From: Robert Hancock
Date: Tue Jan 05 2010 - 19:50:49 EST


On 01/05/2010 11:56 AM, Marco Bisetto wrote:
Hi,

I have a problem with an "IDE interface: nVidia Corporation MCP55 SATA
Controller (rev a3)" and two SAMSUNG HD103UJ SATA hard disk drives.
Disabling the write cache on a disk produces this error:

kernel: [ 45.584445] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
kernel: [ 45.584445] ata1: SWNCQ:qc_active 0x1 defer_bits 0x0 last_issue_tag 0x0
kernel: [ 45.584445] dhfis 0x1 dmafis 0x1 sdbfis 0x0
kernel: [ 45.584445] ata1: ATA_REG 0x40 ERR_REG 0x0
kernel: [ 45.584445] ata1: tag : dhfis dmafis sdbfis sacitve
kernel: [ 45.584445] ata1: tag 0x0: 1 1 0 1
kernel: [ 45.584445] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
kernel: [ 45.584445] ata1.00: cmd 61/08:00:3f:f4:e8/00:00:02:00:00/40 tag 0 ncq 4096 out
kernel: [ 45.584445] res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
kernel: [ 45.584445] ata1.00: status: { DRDY }
kernel: [ 45.584445] ata1: hard resetting link
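
The "cmd" line in the log above can be unpacked to see which request timed out. This is a minimal sketch, not kernel code. It assumes the field order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah, then the five "hob" high-order bytes, then device) and 512-byte logical sectors. The function name decode_ncq_taskfile and the sector_size parameter are made up for illustration.

```python
import re

# Assumed layout of the libata "cmd" dump (an assumption, not taken from the post):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

# ATA opcodes for the two basic NCQ data commands.
NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_ncq_taskfile(text, sector_size=512):
    """Decode an NCQ taskfile dump; only meaningful for opcodes 0x60/0x61."""
    m = TASKFILE_RE.search(text)
    if m is None:
        raise ValueError("no taskfile found in: %r" % text)
    command = int(m.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    # For NCQ commands the sector count travels in the feature registers
    # and the queue tag sits in bits 7:3 of the sector-count register.
    sectors = (hob_feature << 8) | feature
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "command": NCQ_COMMANDS.get(command, "0x%02x" % command),
        "tag": nsect >> 3,
        "sectors": sectors,
        "bytes": sectors * sector_size,
        "lba": lba,
    }


info = decode_ncq_taskfile("cmd 61/08:00:3f:f4:e8/00:00:02:00:00/40 tag 0 ncq 4096 out")
print(info)
```

Under these assumptions the dump decodes to a single 8-sector (4096-byte) queued write with tag 0, which matches the "tag 0 ncq 4096 out" suffix that the kernel prints on the same line.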

The error appears four times for each disk at startup, and only when a disk
has its write cache disabled. For example, with the write cache disabled on
both disks:

45.595788] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
45.595800] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
76.491877] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
76.511331] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
107.391075] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
107.423701] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
138.287465] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
138.332093] ata1: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1

With the write cache disabled on the disk attached to ata4 and enabled on
the disk attached to ata1:

45.583489] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
76.479940] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
107.375643] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
138.272023] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
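
The spacing between the repeated ata4 errors can be checked with a few lines of Python. This is only an illustrative sketch: the helper name eh_intervals is hypothetical, and the timestamps are copied from the four lines above.

```python
import re
from collections import defaultdict

LOG = """\
45.583489] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
76.479940] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
107.375643] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
138.272023] ata4: EH in SWNCQ mode,QC:qc_active 0x1 sactive 0x1
"""


def eh_intervals(log_text):
    """Return, per port, the seconds elapsed between consecutive EH entries."""
    times = defaultdict(list)
    for m in re.finditer(r"(\d+\.\d+)\]\s+(ata\d+): EH in SWNCQ mode", log_text):
        times[m.group(2)].append(float(m.group(1)))
    return {
        port: [later - earlier for earlier, later in zip(ts, ts[1:])]
        for port, ts in times.items()
    }


intervals = eh_intervals(LOG)
print({port: [round(x, 3) for x in gaps] for port, gaps in intervals.items()})
```

Each gap comes out at roughly 30.9 seconds. One possible reading, which the logs themselves do not state, is a fixed command timeout of about 30 seconds followed by a link reset and a retry.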

Enabling write cache on both disks = no errors.

I don't think the problem can be associated with bad cables or the power
supply, as it happens on each channel, is the same for each disk, and
happens at the same time.

Does anybody have an idea what it could be and whether there is a solution?

From what I can see, that debug output from sata_nv means that the drive hasn't reported that it has completed the command (no SDB FIS) by the time the timeout expires (usually 30 seconds). That's an awfully long time. It could be that those drives have issues with NCQ when the write cache is disabled, such that some of the commands in the queue are starved for overly long periods.
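
As a rough illustration of that reading, the per-tag line from the sata_nv dump ("tag 0x0: 1 1 0 1", with columns dhfis, dmafis, sdbfis, sactive) can be classified as below. This is a sketch under assumptions: the interpretation of each flag (D2H Register FIS seen, DMA Setup FIS seen, Set Device Bits FIS seen, tag still active) is my understanding of the driver's debug output, and the names parse_tag_line and describe are hypothetical.

```python
import re

# Matches the per-tag data line, not the column-header line ("tag : dhfis ...").
TAG_LINE = re.compile(r"tag (0x[0-9a-fA-F]+):\s+(\d)\s+(\d)\s+(\d)\s+(\d)")


def parse_tag_line(line):
    """Extract the per-tag FIS flags from one SWNCQ debug line, or None."""
    m = TAG_LINE.search(line)
    if m is None:
        return None
    dhfis, dmafis, sdbfis, sactive = (bool(int(m.group(i))) for i in range(2, 6))
    return {
        "tag": int(m.group(1), 16),
        "dhfis": dhfis,      # assumed: D2H Register FIS received after issue
        "dmafis": dmafis,    # assumed: DMA Setup FIS received
        "sdbfis": sdbfis,    # assumed: Set Device Bits FIS (completion) received
        "sactive": sactive,  # tag still marked outstanding
    }


def describe(state):
    """Summarize how far the command got before error handling kicked in."""
    if state["sactive"] and not state["sdbfis"]:
        if state["dmafis"]:
            return "data phase started, completion (SDB FIS) never arrived"
        if state["dhfis"]:
            return "command accepted, no DMA setup and no completion"
        return "command issued, no response from the drive"
    return "completed"


state = parse_tag_line("kernel: [ 45.584445] ata1: tag 0x0: 1 1 0 1")
print(state["tag"], describe(state))
```

For the line in the report this gives tag 0 with the data phase started but no completion, which is the "no SDB FIS" situation described above.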

CCing linux-ide.