Re: ahci_start_engine compliance with AHCI spec

From: Tejun Heo
Date: Fri Jul 22 2011 - 05:03:27 EST


Hello, Brian.

On Thu, Jul 21, 2011 at 10:13:16AM -0700, Brian Norris wrote:
> On Thu, Jul 21, 2011 at 1:49 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> > On Mon, Jul 18, 2011 at 11:40:17AM -0700, Brian Norris wrote:
> >> On Wed, Jul 13, 2011 at 6:14 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> >> > Hmmm... what happens if you don't comment out ahci_start_engine() call
> >> > from ahci_start_port()?
> >>
> >> I wasn't commenting out the ahci_start_engine() from
> >> ahci_start_port(). Can you clarify what you mean?
> >
> > Oh, I meant "what if you comment out..."  I wrote that sentence in
> > negative and then switched but forgot removing "don't".
>
> OK, well I tried simply commenting out that ahci_start_engine() on
> both my special controller and on the Dell E6410 laptop and it worked
> just fine (solved my issues and didn't cause any issues on the Dell).
> Is this safe? It seems like we end up calling ahci_start_engine() at
> the end of the error handling process anyway, so maybe this call is
> not really necessary in the first place?

Yes, I believe so.

> Anyway, I also tried my own fix for this: adding a small delay to wait
> for some link recognition at the end of ahci_power_up(). I'm not sure
> if this is the greatest, but it also works for both systems I'm
> testing. I included the test patch here (based on linux-2.6). BTW, I'm
> not sure my mail will be formatted perfectly here. I can resend with
> my other mailer if needed.

The problem is that neither my approach nor yours is ultimately safe
on this particular IP block. I don't think it's possible to make things
completely safe for it. There's no mutual exclusion between PHY
events - be it flaky signal, power surge or actual hotplug - and
driver operation. No matter how carefully the driver behaves, if a PHY
event happens after the last check before starting the DMA engine, DRQ
may be set by the time the driver gets to it.
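
To make the window concrete, here is a user-space toy model, not driver
code. The struct, the hook and the function names are all made up for
illustration; only the bit values (BSY 0x80, DRQ 0x08, ST as bit 0 of
PxCMD) are meant to match the real ones. The point is just that nothing
sits between the check and the write to keep a PHY event out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values follow ATA/AHCI; everything else here is hypothetical. */
#define TFD_BSY 0x80u   /* task file status: busy */
#define TFD_DRQ 0x08u   /* task file status: data request */
#define CMD_ST  0x01u   /* PxCMD.ST: start the DMA engine */

struct fake_port {
    uint32_t tfd;   /* stands in for PxTFD */
    uint32_t cmd;   /* stands in for PxCMD */
};

/* Hook that models a PHY event landing inside the race window. */
typedef void (*phy_event_fn)(struct fake_port *p);

static void phy_glitch_sets_drq(struct fake_port *p)
{
    p->tfd |= TFD_DRQ;
}

/*
 * Check-then-start. Returns true only if ST was set while BSY and DRQ
 * were still clear. Returns false if the check refused to start, or if
 * the state changed between the check and the write.
 */
static bool start_engine_checked(struct fake_port *p, phy_event_fn in_window)
{
    assert(p);

    if (p->tfd & (TFD_BSY | TFD_DRQ))
        return false;       /* the last check the driver can make */

    if (in_window)
        in_window(p);       /* nothing excludes a PHY event here */

    p->cmd |= CMD_ST;       /* engine gets enabled regardless */
    return !(p->tfd & (TFD_BSY | TFD_DRQ));
}
```

Calling start_engine_checked() with phy_glitch_sets_drq as the hook
leaves ST set with DRQ also set, even though the check passed. Moving
the check or adding a delay only shrinks the window; it doesn't close it.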

The IP block you're dealing with is inherently buggy. What the spec
means, I think, is that the DMA engine might not start or behave
properly if enabled while DRQ is set, which is fine. The driver will
notice that, reset stuff and retry. That is *completely* different from
"the controller becomes a brick until power cycled if that happens".
So, we can work around it all we want but that is one buggy controller.
If possible, please tell the manufacturer or licensor to fix it.
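
As a rough sketch of the difference, again a toy model with invented
names and not anything from libata: a sane controller that fails to
start with DRQ set recovers after a reset and a retry, while one that
wedges itself never comes back no matter how often the driver retries.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical controller model, not a real libata structure. */
struct fake_ctrl {
    bool drq_set;   /* DRQ pending when the engine is enabled */
    bool bricks;    /* true: models the buggy behaviour */
    bool wedged;    /* only a power cycle would clear this */
    bool running;   /* DMA engine started successfully */
};

static void fake_start(struct fake_ctrl *c)
{
    if (c->wedged)
        return;                 /* dead until power cycled */
    if (c->drq_set) {
        if (c->bricks)
            c->wedged = true;   /* the buggy case */
        return;                 /* engine fails to start */
    }
    c->running = true;
}

static void fake_reset(struct fake_ctrl *c)
{
    if (!c->wedged)
        c->drq_set = false;     /* a reset clears the stale DRQ */
}

/* Returns the attempt on which the engine started, or -1 if it never did. */
static int start_with_retry(struct fake_ctrl *c, int max_tries)
{
    assert(c && max_tries > 0);

    for (int i = 1; i <= max_tries; i++) {
        fake_start(c);
        if (c->running)
            return i;
        fake_reset(c);          /* notice the failure, reset, retry */
    }
    return -1;
}
```

With bricks == false and DRQ initially set, start_with_retry() succeeds
on the second attempt. With bricks == true it returns -1 for any retry
count, which is the case no driver-side workaround can cover.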

For now, let's first try removing the ahci_start_engine() call from
port_start and see how that goes.

Thanks.

--
tejun