Re: [PATCH] pidns: remove recursion from free_pid_ns (v3)

From: Greg KH
Date: Wed Oct 10 2012 - 03:49:26 EST


On Tue, Oct 09, 2012 at 12:08:31PM -0700, Andrew Morton wrote:
> On Tue, 9 Oct 2012 12:03:00 -0700
> Greg KH <greg@xxxxxxxxx> wrote:
>
> > On Tue, Oct 09, 2012 at 11:48:21AM -0700, Andrew Morton wrote:
> > > On Sat, 6 Oct 2012 23:56:33 +0400
> > > Andrew Vagin <avagin@xxxxxxxxxx> wrote:
> > >
> > > > Here is a stack trace of recursion:
> > > > free_pid_ns(parent)
> > > > put_pid_ns(parent)
> > > > kref_put(&ns->kref, free_pid_ns);
> > > > free_pid_ns
> > > >
> > > > This patch turns recursion into loops.
> > > >
> > > > pidns can be nested many times, so in case of recursion
> > > > a simple user space program can provoke a kernel panic
> > > > > by overflowing the kernel stack.
> > >
> > > So we should backport this into earlier kernels.
> > >
> > > > --- a/include/linux/kref.h
> > > > +++ b/include/linux/kref.h
> > > > @@ -95,6 +95,18 @@ static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref)
> > > > > 	return kref_sub(kref, 1, release);
> > > > }
> > > >
> > > > +/**
> > > > > + * __kref_put - decrement refcount for object.
> > > > + * @kref: object.
> > > > + *
> > > > + * Decrement the refcount.
> > > > + * Return 1 if refcount is zero.
> > > > + */
> > > > +static inline int __kref_put(struct kref *kref)
> > > > +{
> > > > > +	return atomic_dec_and_test(&kref->refcount);
> > > > +}
> > >
> > > Greg might be interested in this.
> > >
> > > It's a pretty specialised thing and perhaps it needs some stern words
> > > in the description explaining when and why it should and shouldn't be
> > > used.
> > >
> > > I wonder if people might (ab)use this to avoid the "doesn't
> > > have a release function" warning.
> >
> > Yes they would, please don't do this at all.
> >
> > In fact, why is it needed? It doesn't solve anything (if it does,
> > something in the way the kref is being used is wrong.)
> >
>
> It's right there in the changelog. The patch fixes deep
> kref_put->release->kref_put recursion by turning the operation for
> pidns into a loop.

But why would a kref release function ever decrement the same kref
again, causing a loop in the first place?

That's what I was referring to. This strongly sounds like a problem in
how the kref is being used, not in the kref code itself.

Is a kref even the correct thing here?
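
To make the pattern under discussion concrete, here is a rough user-space
sketch of a release that drops a reference on a parent object, first in
recursive form and then as a loop. Everything in it is an assumption for
illustration: struct ns, ns_create(), ns_put_recursive(), ns_put_iterative()
and freed_count are hypothetical names, the refcount is a plain int rather
than a kref, and none of it is taken from the pidns code or the patch.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a nested namespace: every child holds one
 * reference on its parent. This is not the kernel's struct pid_namespace.
 */
struct ns {
	int refcount;		/* plain int here; the kernel uses an atomic kref */
	struct ns *parent;	/* NULL for the root */
};

static int freed_count;	/* counts freed objects, for illustration only */

static struct ns *ns_create(struct ns *parent)
{
	struct ns *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->refcount = 1;		/* reference owned by the creator */
	ns->parent = parent;
	if (parent)
		parent->refcount++;	/* the child pins its parent */
	return ns;
}

/* Recursive form: stack depth grows with the nesting depth. */
static void ns_put_recursive(struct ns *ns)
{
	assert(ns->refcount > 0);
	if (--ns->refcount == 0) {
		struct ns *parent = ns->parent;

		free(ns);
		freed_count++;
		if (parent)
			ns_put_recursive(parent);	/* drop the child's ref */
	}
}

/* Iterative form: same effect, constant stack usage. */
static void ns_put_iterative(struct ns *ns)
{
	while (ns) {
		struct ns *parent;

		assert(ns->refcount > 0);
		if (--ns->refcount != 0)
			break;		/* still referenced, stop walking up */
		parent = ns->parent;
		free(ns);
		freed_count++;
		ns = parent;		/* now drop the ref the child held */
	}
}
```

In this sketch the release path never decrements the same counter twice. It
decrements the parent's counter, which is a different object of the same
type, and that is where the chain comes from when the nesting is deep.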

greg k-h