Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR
From: Andreas John
Date: Wed Aug 22 2007 - 18:01:50 EST
Hm,
I should add that on 2.6.22-amd64 (Ubuntu Gutsy) the log entry is as
follows:
----8<------
ata2.00 exception Emask 0x40 SAct 0x0 SErr 0x800 action 0x2 frozen
ata2.00 tag 0 cmd 0xea Emask 0x44 stat 0x40 err 0x0 timeout
1st FIS failed
----8<------
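
For what it's worth, the SErr and irq_stat values in these logs can be decoded with a few lines of C. This is only a sketch: the bit positions are how I read SERR_INTERNAL in include/linux/libata.h and PORT_IRQ_TF_ERR / PORT_IRQ_SDB_FIS in drivers/ata/ahci.c, so please treat them as assumptions and check them against your tree. The two helper functions are made up for illustration and are not part of the driver.

```c
#include <stdint.h>

/* Assumed bit positions, copied by hand from libata.h / ahci.c. */
#define SERR_INTERNAL    (1u << 11)  /* SError: host internal error ("E") */
#define PORT_IRQ_TF_ERR  (1u << 30)  /* port IRQ status: task file error */
#define PORT_IRQ_SDB_FIS (1u << 3)   /* port IRQ status: SDB FIS received */

/* Hypothetical helper: nonzero if SError reports a host internal error. */
static int serr_has_internal(uint32_t serr)
{
    return (serr & SERR_INTERNAL) != 0;
}

/* Hypothetical helper: nonzero if the port IRQ status reports a TF error. */
static int irq_has_tf_err(uint32_t irq_stat)
{
    return (irq_stat & PORT_IRQ_TF_ERR) != 0;
}
```

If those bit positions are right, SErr 0x800 from the amd64 log is exactly the internal error bit, and irq_stat 0x40000008 from the i386 log is a TF error plus an SDB FIS.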
rgds,
Andreas
Andreas John wrote:
> Hi SB600-folks,
>
> we bought some AMD690/SB600-based mobos and are trying to get them working. I
> followed the patches on LKML and switched from the Debian Etch 2.6.18-x
> kernel to 2.6.22, just to ensure that all patches are already applied.
> But we still have strange errors/lockups, and we found a way to reproduce
> them: simply run checkarray --all and do some dd if=/dev/sda ....
> in parallel. We notice the load avg going up and then boom ... lockup,
> softraid broken:
>
> ---<8----
> ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x0
> ata2.00: (irq_stat 0x40000008)
> ata2.00: cmd 60/00:00:00:69:71/01:00:06:00:00/40 tag 0 cdb 0x0 data
> 131072 in
> ---<8----
>
> This appears with ahci. If I switch to atiixp I only see the CD-ROM and
> one hard disk; the second disk does not appear at all and, depending on
> the setting in BIOS setup (ahci->sata, native ide, legacy ide), only the
> CD-ROM appears.
>
> I might note that I first ran into this trouble on amd64 with 4 GB RAM.
> Then I switched back to 2 GB and then to i386 / 2 GB. The error message
> above is from the i386 / 2 GB variant, but all of them suffer from this
> strange SATA pain. I am not 100% sure whether the log entries read the same
> or only similar. I also tried pci=nomsi a few times, but I was still able to
> trigger the bug. I might also note that I noticed the problem on the amd64
> arch and it was simple to trigger there, but with the checkarray --all
> trick I was also able to trigger it on i386.
>
> Is there anything further I can test? If you provide a patch, I will
> gladly test it.
>
> best regards,
> Andreas
>
>
> Conke Hu wrote:
>> On 3/15/07, Tejun Heo <htejun@xxxxxxxxx> wrote:
>>> Conke Hu wrote:
>>>>> E Internal error: The host bus adapter experienced an internal error
>>>>> that caused the operation to fail and may have put the host bus adapter
>>>>> into an error state. Host software should reset the interface before
>>>>> re-trying the operation. If the condition persists, the host bus adapter
>>>>> may suffer from a design issue rendering it incompatible with the
>>>>> attached device.
>>>>>
>>>> Yes, I saw this too :) and I am contacting the hardware engineers to
>>>> check if there is any hardware bug.
>>>> But even if this were a hardware bug and could be fixed, we would
>>>> still need this patch, since many SB600 boards have already come onto
>>>> the market and those ASICs can never be fixed :(
>>> Yeap, we certainly need the workaround. I was just having a little fun.
>>> :-)
>>>
>>>>> 4381 isn't affected while 4380 is?
>>>> I have never seen such an ID, and plan to remove 0x4381.
>>>> The patch which added the PCI IDs was not sent out by me. I
>>>> checked all SB600 boards and did not find any 0x4381 controller, only
>>>> 0x4380. In fact, SB600 RAID and non-RAID share the same PCI
>>>> device ID; only the class code differs.
>>> I see.
>>>
>>>>> Anyways, Conke Hu, can you please take a look at my patch from a month
>>>>> ago? It's almost identical but SERR_INTERNAL is always ignored on both
>>>>> SB600 PCI IDs, which I think is safer. Does this fix what you're seeing?
>>>> I just read your patch. Another difference is that my patch ignores
>>>> SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
>>>> other cases, I think, we'd better not ignore the SERR_INTERNAL. Right?
>>> Yeah, I noticed the difference. I don't really care but I was thinking
>>> that SERR_INTERNAL might be set in other similar situations too. e.g.
>>> TF error from ATA device or what not, so I thought it would be safer to
>>> ignore the bit altogether. You probably need to consult your hardware
>>> people about when exactly the bit misbehaves but unless proven
>>> otherwise, I'd prefer to always ignore the bit. Also, please rename the
>>> enum constant and flag name.
>>>
>> Thank you, Tejun!
>> I have been discussing this topic with our HW designers. It is a HW
>> design issue and will be fixed in SB700, the next generation of the
>> AMD/ATI southbridge.
>>
>> The correct workaround/solution for SB600 SATA is:
>> 1. ignore SERR_INTERNAL for both ATA and ATAPI devices (as you suggested
>> :p ).
>> 2. ignore SERR_INTERNAL only on IRQ_TF_ERR.
>>
>> I'll re-create the patch.
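
Inline, just to check that I read the two rules above correctly: as I understand it, the internal error bit would be masked out of SError only on an SB600 controller and only when the interrupt status carries a task file error, regardless of whether the device is ATA or ATAPI. A rough sketch of that logic follows. The function name and the is_sb600 flag are invented by me for illustration, the bit positions are how I read libata.h and ahci.c, and none of this is the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, copied by hand from libata.h / ahci.c. */
#define SERR_INTERNAL   (1u << 11)  /* SError: host internal error */
#define PORT_IRQ_TF_ERR (1u << 30)  /* port IRQ status: task file error */

/*
 * Hypothetical helper, not the real driver code: return the SError value
 * that error handling should look at. On SB600, the internal error bit is
 * dropped when (and only when) a task file error was signalled. The device
 * type (ATA or ATAPI) deliberately plays no role.
 */
static uint32_t sb600_filter_serror(uint32_t serror, uint32_t irq_stat,
                                    bool is_sb600)
{
    if (is_sb600 && (irq_stat & PORT_IRQ_TF_ERR))
        serror &= ~SERR_INTERNAL;
    return serror;
}
```

With this reading, SErr 0x800 together with a TF error would come out as 0x0 on SB600, while the same SErr value without a TF error, or on any other controller, would be left untouched.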
>>
>> Conke
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/