Re: [GIT PULL] Driver core fix for 4.5-rc4

From: James Bottomley
Date: Sun Feb 14 2016 - 16:21:19 EST


On Sun, 2016-02-14 at 12:57 -0800, Linus Torvalds wrote:
> On Sun, Feb 14, 2016 at 11:02 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx
> > wrote:
> >
> > Here is one driver core, well klist, fix for 4.5-rc4. It fixes a
> > problem found in the scsi device list traversal that probably also
> > could
> > be triggered by other subsystems.
>
> So I pulled this, but quite frankly, the fix smells bad to me.
>
> If the n_ref kref can go down to zero at any time, how is that
> "struct klist_node *n" safe to ever even touch in the caller?
>
> IOW, what is it that protects that klist_node from not having
> entirely been released, and any access to the kref might be a use
> -after-free (and the use of "kref_get_unless_zero()" just hides the
> problem).

OK, so if I explain, don't shoot the messenger; I didn't write the
klist abstraction, I'm just trying to make it work. The design of the
abstraction is that the reference counting is kept internal to the
layer that uses it (in this case drivers/base). The klist_node is
designed to be self renewing, so you can genuinely remove the
klist_node from the list and when the last reference goes to zero, the
object is really removed from the list and the klist_node is ready to
be re-used, because the object could be added back to that (or another)
list. Since klist_node is embedded in another object, whether it's
being freed is the property of the lifetime rules of that embedding
object, which are not subordinate to the klist lifetime rules (although
the klist usually takes a reference on the embedding object).

This means that when you pass an object to a caller(in this case, the
bus_find_device), you pass it with an incremented refcount on the
embedding object, which is what the caller cares about. What happens
to the klist_node is entirely internal to the callee subsystem. So you
never have to worry about the klist_node being freed, because it's
embedded in the object the caller holds a reference to and thus can't
be freed.

However, because the upper layer is unaware of what's going on inside
the lower subsystem, the object may be removed from the list, which
means the internal klist_node n_ref may go to zero. The caller still
holds a reference on the embedding object, so the object itself isn't
going to be freed, but n_ref == 0 means it is no longer on any internal
list. This is the case we get in the SCSI problem. When you pass the
object back to bus_find_device() because it isn't on the klist anymore,
we can't begin with it as the starting place for the iterator. That's
what the patch is fixing.

> So it smells to me like if the kref can go down to zero, the caller
> is basically passing in a random pointer.
>
> Please make me feel better about my pull. I need a virtual hug.

This being Valentine's day, I'm fresh out of hugs, but my dog sends you
a lick.

> (Also, rather than assigning i_dur twice like this:
>
> + i->i_cur = NULL;
> + if (n && kref_get_unless_zero(&n->n_ref))
> + i->i_cur = n;
>
> I think it would have been cleaner to [in]validate "n" first (perhaps
> with a comment about _why_ that is needed yet safe):
>
> + if (n && !kref_get_unless_zero(&n->n_ref))
> + n = NULL;
>
> and then just do a simple:
>
> + i->i_cur = n;
>
> afterwards).
>
> But I care less about that small syntactic issue than I care about
> understanding why it's safe to pass around a klist_node that might
> not exist any more.

Yes, that looks fine too. I was basically assuming the compiler would
optimise away the double setting of i->i_cur.

James