Re: [PATCH 08/29] drivers, md: convert mddev.active from atomic_t to refcount_t

From: James Bottomley
Date: Tue Mar 14 2017 - 10:59:38 EST


On Tue, 2017-03-14 at 12:29 +0000, Reshetova, Elena wrote:
> > Elena Reshetova <elena.reshetova@xxxxxxxxx> writes:
> >
> > > refcount_t type and corresponding API should be
> > > used instead of atomic_t when the variable is used as
> > > a reference counter. This allows to avoid accidental
> > > refcounter overflows that might lead to use-after-free
> > > situations.
> > >
> > > Signed-off-by: Elena Reshetova <elena.reshetova@xxxxxxxxx>
> > > Signed-off-by: Hans Liljestrand <ishkamiel@xxxxxxxxx>
> > > Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > > Signed-off-by: David Windsor <dwindsor@xxxxxxxxx>
> > > ---
> > > drivers/md/md.c | 6 +++---
> > > drivers/md/md.h | 3 ++-
> > > 2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > When booting linux-next (specifically 5be4921c9958ec) I'm seeing
> > the
> > backtrace below. I suspect this patch is just exposing an existing
> > issue?
>
> Yes, we have actually been following this issue in the another
> thread.
> It looks like the object is re-used somehow, but I can't quite
> understand how just by reading the code.
> This was what I put into the previous thread:
>
> "The log below indicates that you are using your refcounter in a bit
> weird way in mddev_find().
> However, I can't find the place (just by reading the code) where you
> would increment refcounter from zero (vs. setting it to one).
> It looks like you either iterate over existing nodes (and increment
> their counters, which should be >= 1 at the time of increment) or
> create a new node, but then mddev_init() sets the counter to 1. "
>
> If you can help to understand what is going on with the object
> creation/destruction, would be appreciated!
>
> Also Shaohua Li stopped this patch coming from his tree since the
> issue was caught at that time, so we are not going to merge this
> until we figure it out.

Asking on the correct list (dm-devel) would have got you the easy
answer: The refcount behind mddev->active is a genuine atomic. It has
refcount properties but only if the array fails to initialise (in that
case, final put kills it). Once it's added to the system as a gendisk,
it cannot be freed until md_free(). Thus its ->active count can go to
zero (when it becomes inactive; usually because of an unmount). On a
simple allocation regardless of outcome, the last executed statement in
md_alloc is mddev_put(): that destroys the device if we didn't manage
to create it or returns 0 and adds an inactive device to the system
which the user can get with mddev_find().

James