Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen

From: Tom Mortensen
Date: Tue Sep 30 2008 - 16:47:45 EST


I don't know if this is the original poster's problem, but if the drive
is spun down, enabling SMART or reading SMART attributes causes the
drive to spin up, and the command is delayed until spin-up has
completed.

The fix is to increase the timeout given to scsi_execute() in
drivers/ata/libata-scsi.c.

i.e., the current code (2.6.26.5) is:

	/* Good values for timeout and retries?  Values below
	   from scsi_ioctl_send_command() for default case... */
	cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize,
				  sensebuf, (10*HZ), 5, 0);

Should be changed to:

	/* Good values for timeout and retries?  Values below
	   from scsi_ioctl_send_command() for default case... */
	cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize,
				  sensebuf, (30*HZ), 5, 0);

With a 1TB Hitachi hard drive, this command times out because the
drive takes about 15 seconds to spin up, longer than the current
10-second timeout. Virtually all hard drives spin up in less than 30
seconds, but perhaps the value should be higher in case there are
slower drives out there?
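
To make the arithmetic concrete, here is a rough user-space sketch (not
kernel code). The helper names are hypothetical, and the tick rate is
passed in as a parameter because the real HZ is a kernel build-time
constant that the figures above do not depend on. It also ignores the
time the command itself takes once the drive is spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: express a timeout in jiffies, the same way the
 * (10*HZ) and (30*HZ) arguments to scsi_execute() do. */
static unsigned long timeout_in_jiffies(unsigned long secs, unsigned long hz)
{
	assert(hz > 0);
	return secs * hz;
}

/* Hypothetical helper: true if a command that must first wait
 * spinup_secs for the drive to spin up would still finish before a
 * timeout of timeout_secs expires. */
static bool completes_before_timeout(unsigned long spinup_secs,
				     unsigned long timeout_secs,
				     unsigned long hz)
{
	unsigned long spinup = timeout_in_jiffies(spinup_secs, hz);
	unsigned long limit = timeout_in_jiffies(timeout_secs, hz);

	return spinup < limit;
}
```

For a 15 second spin-up, completes_before_timeout(15, 10, hz) is false
and completes_before_timeout(15, 30, hz) is true for any nonzero hz,
which matches what I see with the old and new values.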

Cheers,
Tom

On Mon, Sep 29, 2008 at 2:13 AM, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Bill Davidsen wrote:
>> Gwendal Grignou wrote:
>>> About ata1:0 problem, as reported in the bugzilla bug: I would try to
>>> disable NCQ to see if it helps. Your disks' firmware might not fully
>>> support it.
>>>
>>> You can either add the parameter "libata.force=noncq" when loading
>>> your kernel, or set queue_depth to 1 for all the Seagate drives behind
>>> the Marvell MV88SX6081 controller.
>>>
>>> About ata5:0 , someone - in user space probably - is trying to do a
>>> SMART ENABLE operation, but the device ignores it. I don't know which
>>> device you are using, but I assume it does not support ATA SMART
>>> feature set. Timeout is an acceptable but not a nice way to answer, a
>>> cancel would have been better; check if there is a firmware upgrade
>>> for your device.
>>>
>>
>> You certainly called the SMART issue, I was wondering why a new
>> distribution install on some older hardware was getting all the errors,
>> clearly the Fedora "smartd" doesn't check SMART capability before trying
>> to enable the feature. Oddly the drive on which I see this does reply to
>> SMART requests, so the firmware must be "semi-functional." Not a
>> problem, in my case the drive is just used for testing handling of hot
>> swap, and has no data of any value.
>
> Can you post full kernel log including the boot messages and the error
> messages? Also, please attach the output of hdparm -I on the drive
> which fails the smart command.
>
> (cc'ing Bruce, hi!) Bruce, this is the second report I see about drive
> timing out SMART ENABLE OPERATIONS. Does anything ring a bell?
>
> Thanks.
>
> --
> tejun
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>