Re: [PATCH 2/4] libata: Implement disk shock protection support

From: Elias Oltmanns
Date: Wed Sep 10 2008 - 15:30:35 EST


Tejun Heo <htejun@xxxxxxxxx> wrote:
> Hello, Elias.
>
> Elias Oltmanns wrote:
[...]
>> +static unsigned long ata_eh_park_devs(struct ata_port *ap)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +	unsigned int err_mask;
>> +	unsigned long deadline = jiffies;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			struct ata_eh_context *ehc = &link->eh_context;
>> +			struct ata_eh_info *ehi = &link->eh_info;
>> +
>> +			if (dev->class != ATA_DEV_ATA ||
>> +			    dev->flags & ATA_DFLAG_NO_UNLOAD)
>> +				continue;
>> +
>> +			if (ehc->i.dev_action[dev->devno] & ATA_EH_PARK ||
>> +			    ehi->dev_action[dev->devno] & ATA_EH_PARK) {
>> +				unsigned long tmp = dev->unpark_deadline;
>
> The correct way to do this is ata_eh_about_to_do(). After that, you
> can just look at ehc->i.dev_action[]. Also, you'll need to call
> ata_eh_done() later.

We have a problem here, I'm afraid, because we may keep looping in EH
context and still want to pick up ATA_EH_PARK requests. Imagine that
ATA_EH_PARK has been scheduled for device A and the EH thread has
reached the call to schedule_timeout_uninterruptible(). Now, ATA_EH_PARK
is scheduled for device B on the same port. This will wake up the EH
thread, but ATA_EH_PARK is only recorded in link->eh_info, not in
link->eh_context.i. ata_eh_about_to_do() will unconditionally clear the
flag in eh_info, but checking ehc->i.dev_action afterwards will only
tell us whether this flag was set when we entered EH, not whether it has
been set since.

Should I change ata_eh_about_to_do() so that it will record the action
in link->eh_context before clearing it in link->eh_info?
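
To make the question concrete, here is a rough userspace model of the
two behaviours. It is only a sketch, not kernel code: the structs are
cut-down stand-ins for ata_eh_info and ata_eh_context, the bit value of
ATA_EH_PARK is made up, the helper names are hypothetical, and the
locking that the real ata_eh_about_to_do() needs is left out.

```c
#include <assert.h>

#define ATA_EH_PARK	(1u << 5)	/* illustrative value, not the kernel's */
#define MAX_DEVICES	2

/* Cut-down stand-ins for struct ata_eh_info / struct ata_eh_context. */
struct eh_info {
	unsigned int dev_action[MAX_DEVICES];
};

struct eh_context {
	struct eh_info i;		/* snapshot taken when EH was entered */
};

struct link_model {
	struct eh_info eh_info;		/* written by whoever schedules EH */
	struct eh_context eh_context;	/* owned by the EH thread */
};

/* Behaviour described above: the pending request is cleared and lost. */
static void about_to_do_clear_only(struct link_model *link,
				   unsigned int devno, unsigned int action)
{
	assert(devno < MAX_DEVICES);
	link->eh_info.dev_action[devno] &= ~action;
}

/* Proposed change: record pending bits in eh_context, then clear them. */
static void about_to_do_record(struct link_model *link,
			       unsigned int devno, unsigned int action)
{
	assert(devno < MAX_DEVICES);
	link->eh_context.i.dev_action[devno] |=
		link->eh_info.dev_action[devno] & action;
	link->eh_info.dev_action[devno] &= ~action;
}

/* What the park loop would check after calling about_to_do. */
static int park_requested(const struct link_model *link, unsigned int devno)
{
	assert(devno < MAX_DEVICES);
	return (link->eh_context.i.dev_action[devno] & ATA_EH_PARK) != 0;
}
```

With a request for device B arriving only in eh_info while EH sleeps,
the first variant leaves park_requested() false for B, whereas the
second one makes it visible in eh_context.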

>
>> +				if (time_before(deadline, tmp))
>> +					deadline = tmp;
>> +				else if (time_before_eq(tmp, jiffies))
>> +					continue;
>> +			}
>> +
>> +			if (ehc->did_unload_mask & (1 << dev->devno))
>> +				continue;
>> +
>> +			ata_tf_init(dev, &tf);
>> +			tf.command = ATA_CMD_IDLEIMMEDIATE;
>> +			tf.feature = 0x44;
>> +			tf.lbal = 0x4c;
>> +			tf.lbam = 0x4e;
>> +			tf.lbah = 0x55;
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE,
>> +						     NULL, 0, 0);
>> +			if (err_mask || tf.lbal != 0xc4)
>> +				ata_dev_printk(dev, KERN_ERR,
>> +					       "head unload failed!\n");
>> +			else
>> +				ehc->did_unload_mask |= 1 << dev->devno;
> ...
>> +static void ata_eh_unpark_devs(struct ata_port *ap)
>> +{
>> +	struct ata_link *link;
>> +	struct ata_device *dev;
>> +	struct ata_taskfile tf;
>> +
>> +	ata_port_for_each_link(link, ap) {
>> +		ata_link_for_each_dev(dev, link) {
>> +			struct ata_eh_context *ehc = &link->eh_context;
>> +
>> +			if (!(ehc->did_unload_mask & (1 << dev->devno)))
>> +				continue;
>> +
>> +			ata_tf_init(dev, &tf);
>> +			tf.command = ATA_CMD_CHK_POWER;
>> +			tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
>> +			tf.protocol |= ATA_PROT_NODATA;
>> +			ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
>
> And it's probably better to have ehc->unloaded_mask instead of
> ehc->did_unload_mask and clear it here so that if unload is scheduled
> after this point but before EH completes, it does unloading again.
> ie. Something like the following.
>
> ata_eh_done(ATA_EH_UNLOAD);
> ehc->i.unloaded_mask &= ~(1 << dev->devno);

No need for that because link->eh_context is cleared in
ata_scsi_error().

>
>> @@ -2830,6 +2904,19 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
>> }
>> }
>>
>> +	do {
>> +		unsigned long now;
>> +
>> +		deadline = ata_eh_park_devs(ap);
>> +		now = jiffies;
>> +		if (time_before_eq(deadline, now))
>> +			break;
>> +		prepare_to_wait(&ata_scsi_park_wq, &wait, TASK_UNINTERRUPTIBLE);
>> +		deadline = schedule_timeout_uninterruptible(deadline - now);
>> +	} while (deadline);
>> +	finish_wait(&ata_scsi_park_wq, &wait);
>> +	ata_eh_unpark_devs(ap);
>
> I think it would be better to put timeout computation and handling out
> here instead of inside ata_eh_park_devs(). ata_eh_park_devs() just
> parks the heads if ATA_DEV_UNLOAD and the outer loop controls when it
> can continue.

Right.
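
Just to check that I understand the split: the deadline computation
could move into a small helper used by the outer loop, roughly as in
the userspace sketch below. Everything here is an assumption for
illustration only: latest_deadline() and model_time_before() are
hypothetical names, the latter being a plain-C model of the wrap-safe
comparison that time_before() performs on jiffies, and the array stands
in for the per-device unpark_deadline values.

```c
#include <assert.h>
#include <stddef.h>

/* Wrap-safe "a is earlier than b", modelled on the jiffies helpers. */
static int model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Return the latest pending unpark deadline, or 'now' if none of the
 * deadlines lies in the future. The caller decides whether to sleep.
 */
static unsigned long latest_deadline(unsigned long now,
				     const unsigned long *unpark, size_t n)
{
	unsigned long deadline = now;
	size_t i;

	assert(unpark || n == 0);
	for (i = 0; i < n; i++)
		if (model_time_before(deadline, unpark[i]))
			deadline = unpark[i];
	return deadline;
}
```

The outer loop would then stop waiting as soon as the returned value is
no later than the current time, and the parking function itself would
no longer need to look at jiffies.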

>
>> +static ssize_t ata_scsi_park_store(struct device *device,
>> +				   struct device_attribute *attr,
>> +				   const char *buf, size_t len)
>> +{
> ...
>
>> +	switch (input) {
>> +	case -1:
>> +		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
>> +		break;
>> +	case -2:
>> +		dev->flags |= ATA_DFLAG_NO_UNLOAD;
>> +		break;
>
> Can't we just drop ATA_DFLAG_NO_UNLOAD? It doesn't provide any real
> functionality anymore.

I was afraid you'd say something like that in the end ;-). Well, we
can't. We really should only issue the unload command if we know that
it's safe, i.e., that the device supports the feature. We assume it to
be safe if ata_id_has_unload() returns true or if the user has told us
that the device does support the command. ATA_DFLAG_NO_UNLOAD is
initialised during device setup from the result of ata_id_has_unload().
For pre-ATA-7 devices (like mine), the user can manually clear that flag
afterwards.
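
In other words, the flag has the life cycle sketched below. This is a
simplified userspace model and not the patch itself: dev_model,
dev_setup(), park_store_override() and may_unload() are hypothetical
names, the flag's bit value is made up, and the boolean argument stands
in for whatever ata_id_has_unload() reports for the device.

```c
#include <assert.h>
#include <stdbool.h>

#define ATA_DFLAG_NO_UNLOAD	(1u << 0)	/* illustrative value only */

struct dev_model {
	unsigned int flags;
};

/* Device setup: forbid unloading unless the IDENTIFY data advertises it. */
static void dev_setup(struct dev_model *dev, bool id_reports_unload)
{
	assert(dev);
	dev->flags = id_reports_unload ? 0 : ATA_DFLAG_NO_UNLOAD;
}

/*
 * Mirrors the -1 / -2 cases of the quoted switch statement.
 * Returns 0 if the input was one of the two overrides, -1 otherwise.
 */
static int park_store_override(struct dev_model *dev, long input)
{
	assert(dev);
	switch (input) {
	case -1:	/* user asserts that the device supports unload */
		dev->flags &= ~ATA_DFLAG_NO_UNLOAD;
		return 0;
	case -2:	/* user forbids unload on this device */
		dev->flags |= ATA_DFLAG_NO_UNLOAD;
		return 0;
	default:
		return -1;
	}
}

/* The check made before issuing the unload command. */
static bool may_unload(const struct dev_model *dev)
{
	return !(dev->flags & ATA_DFLAG_NO_UNLOAD);
}
```

Without the flag there would be no place to remember the user's
override for a device whose IDENTIFY data does not advertise the
feature.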

Regards,

Elias