Re: [PATCH RESEND^2] sd: fix infinite kernel/udev loop onnon-removable Medium Not Present

From: Steven J. Magnani
Date: Mon Apr 29 2013 - 16:45:43 EST

On Fri, 2013-04-26 at 10:30 -0700, James Bottomley wrote:
On Fri, 2013-04-26 at 11:39 -0500, Steven J. Magnani wrote:
> > Commit eface65c336eff420d70beb0fb6787a732e05ffb (2.6.38) altered
> > set_media_not_present() in a way that prevents the sd driver from
> > remembering that a non-removable device has reported "Medium Not
> > This condition can occur on hotplug of a (i.e.) USB Mass Storage
> > whose medium is offline due to an unrecoverable controller error,
> > but which is otherwise capable of SCSI communication (to download
> > microcode, etc.).
> This actually sounds to be a bug somewhere in the device. Only
> removable devices are supposed to signal medium not present.

I don't see that spelled out in the -2 family of standards (SAM-2 /
SPC-2 / SBC-2). We're only claiming SPC-2 compliance; perhaps that's
something that got clarified in a later version?

> > Under these conditions, the changed code results in an infinite loop
> > between the kernel and udevd. When udevd attempts to open the device
> > in response to a change notification, a SCSI "Medium Not Present"
> > occurs which causes the kernel to signal another change. The cycle
> > repeats until the device is unplugged, resulting in udevd consuming
> > increasing amounts of CPU and virtual memory.

One precondition for the infinite loop that got lost between
investigation of the problem and submission of the patch is that the
device has to identify itself BOTH as "non-removable" and "write
protected". The reason is this clause in sd_open():

if (sdev->removable || sdkp->write_prot)

...which causes READ CAPACITY to be issued, to which the device responds
MEDIUM NOT PRESENT, which causes set_media_not_present() to be invoked
and to signal another "changed" event for userland.

Apologies for the oversight.

> > Resolve this by remembering "media not present" whether the device
> > declared itself "removable" or not.
> >
> > Signed-off-by: Steven J. Magnani <steve@xxxxxxxxxxxxxxx>
> > ---
> > --- a/drivers/scsi/sd.c 2013-04-12 14:16:12.252531097 -0500
> > +++ b/drivers/scsi/sd.c 2013-04-12 14:21:55.197216521 -0500
> > @@ -1298,10 +1298,8 @@ out:
> >
> > static void set_media_not_present(struct scsi_disk *sdkp)
> > {
> > - if (sdkp->media_present)
> > + if (sdkp->media_present) {
> > sdkp->device->changed = 1;
> > -
> > - if (sdkp->device->removable) {
> > sdkp->media_present = 0;
> > sdkp->capacity = 0;
> I don't really like this change because it will affect TUR failure as
> well, which looks like it would then zero the capacity of a
> non-removable device which we aren't expecting.

I think that concern is easily addressed by tweaking the patch to keep
the "sdkp->capacity = 0" statement conditional on

> Can we dig into what's
> going wrong with the device first. It sounds like it really is a
> removable device and it just needs to be flagged that way (either that
> or the USB SAT is so screwed up that we might need to apply other
> blacklists)

Being a USB Mass Storage device, it is removable from a USB sense, but
not I think in a SCSI sense - there is no ejectable cartridge or
anything. Commands like PREVENT ALLOW MEDIUM REMOVAL, which
identification as removable usually triggers, are nonsensical.

The scenario is generally one of these two uncommon use cases:

1. The device has come up in "failsafe mode" because the user damaged
the normal "operational" microcode by yanking power during an
operational microcode update. Failsafe mode allows for microcode update,
and not much else.

B. The internal controller has experienced an unrecoverable error
(reported with appropriate sense codes) during normal operation and the
USB connection has since been cycled (disconnected and reconnected; the
device is wall-powered, not bus-powered).

I don't think either is invalid.

I will stipulate that declaring a block device write-protected when no
medium is present does not make a whole lot of sense, and that's
something we will address going forward. However, I do think it's a Bad
Thing (from a security perspective, at minimum) if simply connecting a
device to a machine can bring down the system. The Windows FAT driver
has such issues. Surely Linux can do better, at least in this particular
case where only a minor tweak is needed, not even the addition of
defensive code.

Steven J. Magnani "I claim this network for MARS! Earthling, return my space modulator!"

#include <standard.disclaimer>

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at