Re: [PATCH 3/9] tools: convert comm_str.refcnt from atomic_t to refcount_t
From: Arnaldo Carvalho de Melo
Date: Thu Feb 23 2017 - 08:12:29 EST
Em Thu, Feb 23, 2017 at 09:16:07AM +0000, Reshetova, Elena escreveu:
> > Em Wed, Feb 22, 2017 at 07:20:45PM -0300, Arnaldo Carvalho de Melo
> > > > make: *** [install-bin] Error 2
> > > > make: Leaving directory '/home/acme/git/linux/tools/perf'
> > > > [acme@jouet linux]$
> > > > 3) not test building your patches :-\
> Sorry about compilation errors: I totally forgot that tools is not
> getting compiled automatically when you build the whole tree with all
> configs on, so these patches really slipped through untested.
np, each component in the tree has its own idiosyncrasies, its really
difficult to test something so sweeping like this change, thanks again
for doing this work, use-after-free is evil, we should do more things
like this :-)
> > > > I'll let this pass this time, minor, I am removing the now unused
> > > > comm_str__get() function.
> > > But it can't get unused, because the comm_str__findnew() may return an
> > > existing entry, that _needs_ to get its refcount bumped, that is the
> > > reason for this refcount to be there... reinstating it:
> True, we missed that it was reused behind the scenes. Your fix below
> does it correctly. The object resuse seems to be one of the main
> issues of this atomic_t to refcount_t conversions through the kernel.
> We have sooo many places where this happens (obvious and not so
> obvious ones) and every single of them would fail in run-time, unless
> we can modify the code not to do increments on zero.
Yeah, but this code wasn't using the right idiom, which is to, in a
"__findnew()" method to lock it and before dropping the lock to bump the
refcount of whatever is going to be returned, so that after the lock is
dropped we don't open a window where that object can get removed from
the rbtree and hit the bit bucket.
> > > #0 0x00007ffff522491f in raise () from /lib64/libc.so.6
> > > #1 0x00007ffff522651a in abort () from /lib64/libc.so.6
> > > #2 0x00007ffff5268200 in __libc_message () from /lib64/libc.so.6
> > > #3 0x00007ffff527188a in _int_free () from /lib64/libc.so.6
> > > #4 0x00007ffff52752bc in free () from /lib64/libc.so.6
> > > #5 0x000000000051125f in comm_str__put (cs=0x35038e0) at util/comm.c:20
> > > #6 0x00000000005115b3 in comm__free (comm=0x6f4ee90) at
> > > And this brings us to my learning experience, i.e. this should've been caught
> > > by this machinery, right? But that only if I leaked this object, right?
> > >
> > > I need to read more on this, that is for sure ;-)
> The way how current refcount_t implemented it would refuse to do any
> increments/decrements on zero, or increments/decrements on max values.
> Also, it should WARN about this cases so that people can trace the
> issue.
Humm, but the sequence is:
1. refcount_set(1)
2. recount_dec_and_test(1) -> 0
delete object, _free_ it
3. if there is any reference to it yet, without holding a refcount_inc()
obtained reference (a __get() method for the protected object) it may
well be pointing to something that was reused for another unrelated
object, so it can get a value != 0 and then the next
refcount_dec_and_test() will not catch it being zero as set in step #2.
To really catch this we would have to just not delete it when it
refcunt_dec_and_test(1) sets it to zero, i.e. _leak_ it, right?
I'll check some other conversion done in the kernel to see where I am
missing something...
> > For reference, this is the patch on top of this:
> >
> > +++ b/tools/perf/util/comm.c
> > @@ -13,6 +13,13 @@ struct comm_str {
<SNIP>
> This looks correct now.
Thanks for checking,
- Arnaldo