Re: [PATCH 11/12] libata: use IRQ expecting

From: Tejun Heo
Date: Sat Jun 26 2010 - 05:46:00 EST

Hello, Jeff.

On 06/26/2010 11:16 AM, Jeff Garzik wrote:
> On 06/26/2010 04:31 AM, Tejun Heo wrote:
>> Well, it can indicate the start of cluster of completions, which is the
>> necessary information anyway. From the second call on, it's a simple
>> flag test and return. I doubt it will affect anything even w/ high
>> performance SSDs but please read on.
> Yes, and your patch calls unexpect_irq() at the _start_ of a cluster of
> completions. That is nonsensical, because it reflects the /opposite/ of
> the present ATA bus state, when multiple commands are in flight.

That's actually what we wanna know. I'll talk about it below.

>> ata_qc_complete_multiple() call [un]expect_irq() only once by
>> introducing an internal completion function w/o irq expect handling,
>> say ata_qc_complete_raw() and making both ata_qc_complete() and
>> ata_qc_complete_multiple() simple wrapper around it w/ irq expect
>> handling.
> Yes, this fixes problem, but it is better to create a wrapper path for
> the legacy PATA/SATA1 that uses irq-expecting, and a fast path for
> modern controllers that do not use it.
>> On 06/26/2010 05:45 AM, Jeff Garzik wrote:
>>> We don't want to burden modern SATA drivers with the overhead of
>>> dealing with silly PATA/SATA1 legacy irq nastiness, particularly the
>>> ugliness of calling
>> I think we're much better off applying it to all the drivers. IRQ
>> expecting is very cheap and scalable and there definitely are plenty
>> of IRQ delivery problems with modern controllers although their
>> patterns tend to be different from legacy ones. Plus, it will also be
>> useful for power state predictions.
> Modern SATA/SAS controllers, and their drivers, already have well
> defined methods of acknowledging interrupts, even unexpected ones, in
> ways that do not need this core manipulation. This is over-engineering,
> punishing all modern chipsets moving forward regardless of their design,
> by unconditionally requiring this behavior of all libata drivers.

Unacked irqs are primarily handled by spurious IRQ handling. IRQ
expecting is more about lost interrupts and we have enough lost
interrupt cases even on new controllers w/ native interface, both
transient and non-transient.

One of the goals of this whole IRQ exception handling was to make it
dumb easy for drivers to use, which also included making things cheap
enough so that they can be called from hot paths. Both expect and
unexpect_irq() are very cheap once the IRQ delivery is verified. If
the processor is taking an interrupt in the first place, this amount
of logic shouldn't matter at all. There really isn't punishment to
avoid and IMHO not doing it for native controllers is an over
optimization. It gains almost nothing while losing meaningful

> Just like the rest of libata's layered driver architecture, it should be
> straightforward to apply this only to SFF/BMDMA chipsets, then tackle
> odd cases as needs arise.
> Modern controllers acknowledge interrupts sanely, and always "expect" an
> interrupt when you include interrupt events like hotplug, even if the
> ATA bus itself is idle. There is no need to burden the millions of ahci
> users with irq-expecting, for example.

I'm not saying applying it to only SFF/BMDMA is difficult, just that
it's better to apply it to all drivers in this case. IRQ expecting is
to protect against misdelivered / lost IRQs and we do have them for
ahci, sil24 or whatever too. It would of course be silly to pay
significant performance overhead for such protection but as I stated
above, it's _really_ cheap. If the driver is taking an interrupt and
accessing hardware and even if compared only against the general
complexity of generic IRQ and libata code, the cost of IRQ [un]expect
is negligible and designed precisely that way to allow use cases like

> With regards to power state predictions, it is only useful if you are
> accurately reflecting the ATA bus state (idle or not) at all times. As
> mentioned above, this patch clearly creates a condition where
> unexpect_irq() is called when commands remain in flight, and libata is
> expecting further command completions.
> IOW, patch #11 says "we are not expecting irq" when we are.
> At least a halfway sane approach would be to track bus-idle status, and
> trigger useful code when that status changes (idle->active or
> active->idle). Perhaps LED, power state, and irq-expecting could all
> use such a triggering mechanism.

Continuing from the response to the first paragraph. The IRQ
expecting code isn't interested in the bus state, it's interested only
in the IRQ events and that's what it's expecting. The same applies to
power state prediction too, so please consider the following NCQ
command execution sequence.

1. issue tags 0, 1, 2, 3
2. IRQ triggers, tags 0, 2 complete
3. IRQ triggers, tags 1, 3 complete

For IRQ expecting, both 1-2 and 2-3 are segments to expect for, and
the same goes for power state transition, as it's the IRQ itself which
forces the cpu to come out of sleep state. The reason why I said unexpect in
ata_qc_complete() is okay is that it can still delimit each segment as
long as we have proper irq_expect() call at the beginning of each
segment (all other unexpect calls are ignored). But that's kind of a
moot point as we can easily do a single pair.
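
To make the segment argument concrete, here is a rough userspace toy
model of the sequence above. It is only a sketch: every toy_* name is
made up for illustration and none of this is the actual patch code. It
assumes the IRQ path re-arms the expectation whenever tags remain in
flight, which is what opens the 2-3 segment. It compares unexpecting
from every per-command completion with the single pair done via a raw
completion function.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-IRQ expect state; NOT the kernel's structure. */
struct toy_irq_expect {
	bool expecting;		/* true while a segment is open */
	int segments;		/* segments closed so far */
	int ignored;		/* unexpect calls that had nothing to close */
};

/* Open a segment (hypothetical counterpart of expect_irq()). */
static void toy_expect_irq(struct toy_irq_expect *exp)
{
	exp->expecting = true;
}

/* Close the open segment; extra calls are a flag test and return. */
static void toy_unexpect_irq(struct toy_irq_expect *exp)
{
	if (!exp->expecting) {
		exp->ignored++;
		return;
	}
	exp->expecting = false;
	exp->segments++;
}

/* Completion without any IRQ expect handling (the "raw" function). */
static void toy_qc_complete_raw(unsigned *active, int tag)
{
	assert(*active & (1u << tag));	/* tag must be in flight */
	*active &= ~(1u << tag);
}

/* Wrapper: unexpect on every single completion. */
static void toy_qc_complete(struct toy_irq_expect *exp, unsigned *active,
			    int tag)
{
	toy_unexpect_irq(exp);
	toy_qc_complete_raw(active, tag);
}

/* Wrapper: unexpect once, then complete every tag in done_mask. */
static void toy_qc_complete_multiple(struct toy_irq_expect *exp,
				     unsigned *active, unsigned done_mask)
{
	toy_unexpect_irq(exp);
	for (int tag = 0; tag < 32; tag++)
		if (done_mask & (1u << tag))
			toy_qc_complete_raw(active, tag);
}

/* IRQ path A: per-command completion, each one calling unexpect. */
static void toy_irq_per_command(struct toy_irq_expect *exp,
				unsigned *active, unsigned done_mask)
{
	for (int tag = 0; tag < 32; tag++)
		if (done_mask & (1u << tag))
			toy_qc_complete(exp, active, tag);
	if (*active)			/* assumption: re-arm if work remains */
		toy_expect_irq(exp);
}

/* IRQ path B: single expect/unexpect pair per IRQ. */
static void toy_irq_single_pair(struct toy_irq_expect *exp,
				unsigned *active, unsigned done_mask)
{
	toy_qc_complete_multiple(exp, active, done_mask);
	if (*active)			/* assumption: re-arm if work remains */
		toy_expect_irq(exp);
}
```

Feeding both paths the sequence above (issue tags 0-3, then masks 0x5
and 0xA) closes the same two segments either way. The only difference
is that the per-command path makes one ignored unexpect call per IRQ,
while the single-pair path makes none.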

