RE: questions regarding possible violation of AHCI spec in AHCI driver
From: Jian Peng
Date: Wed Dec 08 2010 - 13:14:50 EST
Hi, Tejun,
The problem happens as follows:
After power-up, ahci_init_one() calls ahci_power_up(), which first toggles the PxCMD.SUD bit. The HBA then sends COMRESET to the device, and the device sends its first D2H FIS back. At this point the driver calls ahci_start_engine() to set PxCMD.ST so that commands can be processed. This can race: the transaction triggered by toggling PxCMD.SUD may not have completed yet. That is why the spec requires the extra check, to guarantee that the HBA has already received the FIS and is in a sane state.
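To make the precondition from section 10.1.2 concrete, here is a minimal sketch of the check as a predicate over raw register values. It is only an illustration, not libahci code: `struct port_regs`, `port_device_ready()` and the macro names are hypothetical. The bit positions follow the AHCI register layout (BSY is bit 7 and DRQ is bit 3 of PxTFD.STS, DET is bits 3:0 of PxSSTS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the two port registers the spec check reads. */
struct port_regs {
    uint32_t tfd;   /* PxTFD: task file data, STS in bits 7:0 */
    uint32_t ssts;  /* PxSSTS: SATA status, DET in bits 3:0 */
};

#define TFD_STS_BSY      0x80u  /* PxTFD.STS.BSY */
#define TFD_STS_DRQ      0x08u  /* PxTFD.STS.DRQ */
#define SSTS_DET_MASK    0x0fu  /* PxSSTS.DET field */
#define SSTS_DET_PRESENT 0x03u  /* device present, Phy communication established */

/*
 * Returns true only when all three conditions quoted from the spec hold:
 * BSY = 0, DRQ = 0 and DET = 3h. Only then may software set PxCMD.ST.
 */
static bool port_device_ready(const struct port_regs *regs)
{
    assert(regs != NULL);

    if (regs->tfd & (TFD_STS_BSY | TFD_STS_DRQ))
        return false;   /* device still busy or mid-transfer */

    return (regs->ssts & SSTS_DET_MASK) == SSTS_DET_PRESENT;
}
```

In the race described above, the first D2H FIS has not arrived yet, so BSY would still be set and this predicate would return false.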
On most HBAs, either the staggered spin-up feature is not supported, or the transaction takes less time than elapses between the two function calls, so the current code happens to work. IMHO, this is a clear violation of the spec, and it is not robust across all HBA designs.
My major concern is that ahci_start_engine() is used widely in EH and does not return a result indicating whether the ST bit was actually set, which may cause trouble in some cases. I am now verifying those cases with different HBAs.
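As a rough sketch of what a start routine that reports its result could look like, the code below polls the spec conditions a bounded number of times and returns an error if the port never becomes ready. Everything here is an assumption for illustration: `struct sim_port`, `sim_read_tfd()` and `start_engine_checked()` are hypothetical names, and the port is simulated in memory rather than read from MMIO. It is not a proposed patch to ahci_start_engine().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PORT_CMD_START   0x01u  /* PxCMD.ST */
#define TFD_STS_BSY      0x80u  /* PxTFD.STS.BSY */
#define TFD_STS_DRQ      0x08u  /* PxTFD.STS.DRQ */
#define SSTS_DET_MASK    0x0fu  /* PxSSTS.DET field */
#define SSTS_DET_PRESENT 0x03u  /* device present, Phy communication established */

/* Hypothetical simulated port: BSY stays set for the next busy_reads reads. */
struct sim_port {
    uint32_t cmd;        /* PxCMD */
    uint32_t ssts;       /* PxSSTS */
    int      busy_reads; /* remaining PxTFD reads that still report BSY */
};

/* Simulated PxTFD read; stands in for an MMIO register read. */
static uint32_t sim_read_tfd(struct sim_port *p)
{
    if (p->busy_reads > 0) {
        p->busy_reads--;
        return TFD_STS_BSY;
    }
    return 0x50u;  /* DRDY | DSC, with BSY and DRQ clear */
}

/*
 * Set PxCMD.ST only once BSY = 0, DRQ = 0 and DET = 3h.
 * Returns 0 if ST was set, -EBUSY if the port was not ready
 * within max_polls attempts (ST is left untouched in that case).
 */
static int start_engine_checked(struct sim_port *p, int max_polls)
{
    assert(p != NULL);

    for (int i = 0; i < max_polls; i++) {
        uint32_t tfd = sim_read_tfd(p);

        if (!(tfd & (TFD_STS_BSY | TFD_STS_DRQ)) &&
            (p->ssts & SSTS_DET_MASK) == SSTS_DET_PRESENT) {
            p->cmd |= PORT_CMD_START;
            return 0;
        }
        /* A real driver would sleep or delay between polls here. */
    }
    return -EBUSY;
}
```

With a return value like this, callers in EH could tell whether the engine was really started instead of assuming that ST was set.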
Thanks,
Jian
-----Original Message-----
From: Tejun Heo [mailto:tj@xxxxxxxxxx]
Sent: Wednesday, December 08, 2010 2:07 AM
To: Robert Hancock
Cc: Jian Peng; linux-kernel@xxxxxxxxxxxxxxx; jgarzik@xxxxxxxxx; ide
Subject: Re: questions regarding possible violation of AHCI spec in AHCI driver
Hello,
On 12/08/2010 02:54 AM, Robert Hancock wrote:
> On 12/07/2010 01:43 AM, Jian Peng wrote:
>> Recently, while bringing up a new AHCI host controller, I found out
>> that current AHCI driver (in 2.6.37-rc3) may violate AHCI spec in
>> function libahci.c: ahci_start_engine().
>>
>> From end of section 10.1.2 in AHCI 1.3 spec, it claims
>>
>> Software shall not set PxCMD.ST to '1' until it is determined that
>> a functional device is present on the port as determined by
>> PxTFD.STS.BSY = '0', PxTFD.STS.DRQ = '0', and PxSSTS.DET = 3h.
>>
>> It seems to work well on most controllers without this extra
>> check, but it does cause a problem in our new core. Toggling
>> PxCMD.SUD has already initiated the reset process earlier, so by the
>> time ahci_start_engine() is called, the BSY bit may not be cleared
>> yet, and forcing the PxCMD.ST bit to 1 causes a problem for the HW in
>> this case.
Hmmm... interesting. Yeah, we have never had any problem in that area
and would like to avoid changing unless necessary but then again if
it's broken, well, we should. What kind of problem is the controller
showing?
Thanks.
--
tejun