Re: libata error handling
From: Tejun Heo
Date: Fri Aug 19 2005 - 00:41:03 EST
Hi, Jeff.
Jeff Garzik wrote:
Tejun,
In an email I cannot find anymore, you asked why I was interested in
converting libata to use the fine-grained EH hooks in the SCSI layer,
rather than continued with the current ->eh_strategy_handler() method.
Several reasons:
1) The fine-grained hooks of the SCSI layer are somewhat standard for
block devices. The events they signify -- timeout, abort cmd, dev
reset, bus reset, and host reset -- map precisely to the events that we
must deal with at the ATA level.
I genearally agree that the events are somewhat standard for block
devices but IMHO SCSI EH also has fair amount SCSI-specific assumptions
and ATA is a bit too different from SCSI to fit cleanly into it. For
example, when handling NCQ errors, the whole task set is aborted and the
status is retrieved with read log page. This can be worked around in
one of the hooks and emulate SCSI behavior, but it just doesn't really
fit well. And I think that recovering via translation layer is a bit
too much translation.
So, my thought is that SCSI EH assumptions are a bit too specific to
be used as standard for block devices.
But be warned of false sharing, as I talk about in #2...
2) When libata SAT translation layer becomes optional, and libata drives
a "true" block device, use of ->eh_strategy_handler() will actually be
an obstacle due to false sharing of code paths. ->eh_strategy_handler()
is indeed a single "do it all" EH entrypoint, but within that entrypoint
you must perform several SCSI-specific tasks.
It's true that we must do SCSI specific tasks inside libata if we use
eh_strategy_handler but I don't think switching to fine-grained EH will
reduce the amount of SCSI-specific things inside libata. I think as
long as we can insulate LLDD's from SCSI layer, either way should be
okay later.
3) ->eh_strategy_handler() has continually proven to be a method of
error handling poorly supported by the SCSI layer. There are many
assumption coded into the SCSI layer that this is -not- the path taken
by LLD EH code, and libata must constantly work around these assumptions.
4) libata is the -only- user of ->eh_strategy_handler(), and oddballs
must be stomped out. It creates a maintenance burden on the SCSI layer
that should be eliminated.
I agree that being the only user does incur difficulties, but my very
subjective feeling is that the original libata EH implementation was
just a bit too fragile to start with. eg. not grabbing host lock on EH
entrance causing command completion vs. EH handling race and handling
errors in several different ways.
Heh... Maybe I'm just reluctant to let go of my patches. Anyways,
I'll now stand down and see how things go and try to help.
Thanks, always.
--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/