Re: Fixing the main programmer thinko with the device model

From: James Bottomley
Date: Mon Mar 24 2008 - 14:08:55 EST


On Mon, 2008-03-24 at 10:58 -0700, Greg KH wrote:
> On Mon, Mar 24, 2008 at 10:39:48AM -0500, James Bottomley wrote:
> > Having just spent the weekend tracking two separate driver model
> > problems through SCSI, I believe the biggest trap everyone falls into
> > with the driver model (well, OK, at least with SCSI) is to try to defer
> > a callback to the device ->release routine without realising that
> > somewhere along the callback path we're going to drop a reference to the
> > device.
> >
> > You can do this very inadvertently: One developer didn't realise
> > bsg_unregister_queue() released a ref, and another didn't realise that
> > transport_destroy_device() held one.
> >
> > The real problem is that it's fantastically easy to do this ... it's not
> > at all clear which of the cleanup routines actually release references
> > unless you dig down into them and it's very difficult to detect because
> > all that happens is that devices don't get released when they should,
> > which isn't something we ever warn about.
>
> Sounds like a documentation issue for how the scsi layer is using the
> driver model more than anything else. None of the other busses seem to
> have these kinds of issues that I can see, is it just because of your
> complex usage model?

Possibly ... although the bsg routines aren't actually SCSI ones,
they're block. The transport class destroy function also actually
picked up an implicit reference because of the class device rework you
just did.

> > So, what I was wondering is: is there any way we can reliably detect
> > and warn when someone does this.
>
> Warn that a device did not get released when the programmer thought it
> should yet they forgot to call the correct function to have that happen?
> That seems a bit difficult :)

Yes, that's why I think it has to be discovered via static analysis.

> Also note that the scsi layer usage of multiple refcounted objects
> within the same structure might be causing some of these issues, that's
> a bug in how the scsi layer has implmented things much more so than how
> the driver core is implemented, right?

That's true, but irrelevant (and also soon to be untrue if we get rid of
the scsi_device class as you and Kay keep requesting). The two calls
release references on the actual embedded generic device, it's nothing
to do with entangled lifetime rules.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/