Re: [Bug #12263] Sata soft reset filling log

From: Sergei Shtylyov
Date: Tue Feb 17 2009 - 20:03:26 EST


Hello.

Justin Madru wrote:

If 12609 is truly a post-2.6.28 regression and 12263 is a post-2.6.27
regression, this just cannot be.
Maybe the reporter of #12609 didn't notice/test kernels 28-rc1 to 28. Or maybe
the difference in hardware is the issue, but the bug is still the same. Don't know.

Sorry Justin, you must be confused: as Sergei says,
#12609 and #12263 can only be different.

I was one of the reporters of #12609, and I do know it's a post-2.6.28
regression (and Larry said so too), and one fix (not the preferred fix)
is to revert the ata_bmdma32_port_ops from 2.6.29-rc, and the preferred
fix is to improve the ata_sff_data_xfer32() introduced in 2.6.29-rc1.

2.6.28 does not contain any ata_bmdma32_port_ops, nor ata_sff_data_xfer32(),
nor did 2.6.28-rc1 contain them. So it is impossible for the reversion of
the patch that introduced them to fix any problem on 2.6.28.

I'm quite prepared to believe that your #12263 manifests similarly to
#12609, and that a tip tree which contains a fix for #12609 contains
a fix for #12263; but please, those bugs are not the same, and they
don't have the same fix.

Hugh

Well, like I said: "[I] Don't know". I'm not a kernel developer (or even any developer... yet).
I'm just someone who tests the -rc kernels to see if there are any problems with my hardware.
I try to report any regressions to lkml, and hopefully help the developers.

To me, who has no knowledge of all these low level issues, the following error messages
look strikingly similar at a quick glance.

# bug 12609
# http://marc.info/?l=linux-kernel&m=123254501314058&w=4
#
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
ata2.00: status: { DRDY ERR }
ata2: soft resetting link
ata2.00: configured for UDMA/33
ata2: EH complete

# bug 12263
# http://marc.info/?l=linux-kernel&m=122913412608533&w=4
#
ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata2.00: ST_FIRST: !(DRQ|ERR|DF)
ata2.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
cdb 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res 50/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)

Note the different values of the status, error, and interrupt reason registers: 51/20:03 vs 50/00:01. The former means an (unexpected?) status phase interrupt with an error indication and the sense key NOT READY; the latter means an (unexpected?) command phase interrupt with no error. IIUC, the former happens when the 'sr' driver first sends the TEST UNIT READY command while probing the CD/DVD drive; the latter seems to be the result of some polling process (originating from userland) -- I'm not seeing ALLOW_MEDIUM_REMOVAL anywhere in this driver. So they only look similar, I think...
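
For anyone who wants to check the register reading for themselves, here is a minimal Python sketch that decodes the first three bytes of the "res" line (status, error, interrupt reason) and the first byte of the "cdb" line. It is only an illustration, not anything taken from libata: the function names decode_res() and decode_cdb() are made up for this example, the tables are deliberately incomplete, and the bit layout assumed is the usual ATAPI one (sense key in the upper nibble of the error register, CoD in bit 0 and IO in bit 1 of the interrupt reason).

```python
import re

# ATA status register bits (assumed standard layout; obsolete bits 1-2 omitted).
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"),
               (0x10, "DSC"), (0x08, "DRQ"), (0x01, "ERR")]

# For ATAPI devices the upper nibble of the error register is the sense key.
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}

# Interrupt reason, keyed by (IO bit, CoD bit).
PHASES = {(0, 1): "command", (1, 1): "status",
          (1, 0): "data to host", (0, 0): "data to device"}

# Only the two packet opcodes that appear in the logs above.
OPCODES = {0x00: "TEST UNIT READY", 0x1E: "PREVENT ALLOW MEDIUM REMOVAL"}

RES_RE = re.compile(r"res ([0-9a-f]{2})/([0-9a-f]{2}):([0-9a-f]{2})")


def decode_res(line):
    """Decode status/error/interrupt reason from a libata 'res' log line."""
    m = RES_RE.search(line)
    if m is None:
        raise ValueError("no 'res' taskfile found in line")
    status, error, ireason = (int(g, 16) for g in m.groups())
    flags = [name for bit, name in STATUS_BITS if status & bit]
    # The sense key is only meaningful when the ERR (CHK) bit is set.
    sense = SENSE_KEYS.get(error >> 4, "unknown") if status & 0x01 else None
    io, cod = (ireason >> 1) & 1, ireason & 1
    return {"status": flags, "sense_key": sense, "phase": PHASES[(io, cod)]}


def decode_cdb(line):
    """Name the packet command from the first byte of a 'cdb' log line."""
    opcode = int(line.split()[1], 16)
    return OPCODES.get(opcode, "opcode 0x%02x" % opcode)


if __name__ == "__main__":
    for res in ("res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)",
                "res 50/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)"):
        print(decode_res(res))
```

Note that bit 4 of the status register (DSC) is decoded here as well, although the "status: { DRDY ERR }" line in the first log does not show it.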


So, will the patch for 12609 fix my issue also, or does there need to be another patch?

Most probably it'll need another patch.

Justin Madru

MBR, Sergei

