Re: [GIT PULL] ucount fix for v5.14-rc

From: Linus Torvalds
Date: Sat Aug 07 2021 - 22:05:49 EST


On Sat, Aug 7, 2021 at 6:45 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> Because that fix matches the syzbot reports I saw. It explains them
> 100%, and the fix is clearly the right thing.

Just to clarify: the exact symptoms of the race before that fix can
vary. The simplest form was the one I described:

> Thread (a) on CPU1: starting out _without_ a reference to the
> uncounts, look up entry under the lock, but don't increment the count,
> release lock.
>
> Thread (b) on CPU2: have a reference, do a put_ucounts(). Count goes
> to zero, take the lock, unhash it, free the entry
>
> Thread (a) continues, increments the count on a UAF entry, triggers KASAN.

because that one happens immediately. But a more likely one is
actually that the "Thread (a) continues" thing happens _before_ the
entry is free'd (but after thread (b) has decremented the count to
zero), and then what happens is that thread (a) doesn't actually
trigger UAF and a KASAN report at that point yet.

Instead, thread (a) will install the pointer to the - free'd - ucounts
somewhere, and the UAF will trigger only rather much later.

Which makes the KASAN reports much harder to read, of course, because
by the time you actually access the invalid uncounts pointer, the
original *cause* of the problem is gone.

So I described the "easy" case to see, not the only one that that the
bug in b6c336528926 ("Use atomic_t for ucounts reference counting")
would trigger.

But commit 345daff2e994 ("ucounts: Fix race condition between
alloc_ucounts and put_ucounts") really is the obvious fix for an
obvious bug.

Could there be _another_ bug? Sure. Fixing one bug - no matter how
obvious the fix - doesn't guarantee there isn't something else lurking
too. But if so, I haven't seen the syzbot reports for it.

Linus