Re: KASAN: use-after-free Read in __fput (3)

From: Al Viro
Date: Tue Sep 01 2020 - 08:37:52 EST


On Tue, Sep 01, 2020 at 04:43:09PM +0800, Hillf Danton wrote:

> Below is my 2c for making it safe to access the rbtree entry and the
> file tucked in it by bumping up the file count before adding epi into
> rbtree.

NAK. epitems, by design, do *NOT* pin files down. That's what
eventpoll_release() is about - those references are not counting and
epitems can be taken out by the final close.

It's a user-visible aspect of API. And the problem Marc's patch was
trying to solve had nothing to do with that - at the root it's
the lack of suitable exclusion and the atrocious way loop prevention
and reverse path count checks are done on watch insertion.

What happens there is that files that would have paths added by
EPOLL_CTL_ADD (including those via several intermediate epoll files)
are collected into a temporary list and then checked for excessive
reverse paths. The list is emptied before epoll_ctl() returns.

And that's _the_ list - if epoll_ctl decides to go there in the
first place, it grabs a system-wide mutex and holds it as long as
it's playing around. Everything would've worked, if not for the
fact that
* the bloody list goes through struct file instances. It's
a bad decision, for a lot of reasons.
* in particular, files can go away while epoll_ctl() is playing
around. The same system-wide mutex would've blocked their freeing (modulo
a narrow race in eventpoll_release()) *IF* they remained attached to
epitems. However, explicit EPOLL_CTL_DEL removing such a beast
in the middle of EPOLL_CTL_ADD checks will remove such epitems without
touching the same mutex.

Marc's patch tried to brute-force the protection of files in that
temporary list; what it had missed was the fact that a file on it could
have already been committed to getting killed - f_count already zero,
just hadn't gotten through the __fput() yet. In such cases we don't
want to do any reverse path checks for that sucker - it *is* going
away, so there's no point considering it. We can't blindly bump the
refcount, though, for obvious reasons.

That struct file can't get freed right under the code inserting it into
the list - the locks held at the moment (ep->mtx) are more than enough.
It's what can happen to it while on the list that is a problem.

Of course, as a long-term solution we want to have that crap with
temporary list going through struct file instances taken out and
moved to epitems themselves; the check does scan all epitems for
every file in the set and we bloody well could gather those into
the list at once. Then we'd only need to protect the list walking
vs. removals (with a spinlock, for all we care).

It's too invasive a change for -stable, though, and I'm still digging
through fs/eventpoll.c locking - it's much too convoluted and it
got no attention for a decade ;-/