Re: [syzbot] KASAN: null-ptr-deref Read in filp_close (2)

From: Christian Brauner
Date: Mon Mar 29 2021 - 05:22:45 EST


On Sat, Mar 27, 2021 at 11:33:37PM +0000, Al Viro wrote:
> On Fri, Mar 26, 2021 at 02:50:11PM +0100, Christian Brauner wrote:
> > @@ -632,6 +632,7 @@ EXPORT_SYMBOL(close_fd); /* for ksys_close() */
> > static inline void __range_cloexec(struct files_struct *cur_fds,
> > unsigned int fd, unsigned int max_fd)
> > {
> > + unsigned int cur_max;
> > struct fdtable *fdt;
> >
> > if (fd > max_fd)
> > @@ -639,7 +640,12 @@ static inline void __range_cloexec(struct files_struct *cur_fds,
> >
> > spin_lock(&cur_fds->file_lock);
> > fdt = files_fdtable(cur_fds);
> > - bitmap_set(fdt->close_on_exec, fd, max_fd - fd + 1);
> > + /* make very sure we're using the correct maximum value */
> > + cur_max = fdt->max_fds;
> > + cur_max--;
> > + cur_max = min(max_fd, cur_max);
> > + if (fd <= cur_max)
> > + bitmap_set(fdt->close_on_exec, fd, cur_max - fd + 1);
> > spin_unlock(&cur_fds->file_lock);
> > }
>
> Umm... That's harder to follow than it ought to be. What's the point of
> having
> max_fd = min(max_fd, cur_max);
> done in the caller, anyway? Note that in __range_close() you have to
> compare with re-fetched ->max_fds (look at pick_file()), so...

Yeah, I'll massage that patch a bit. I wanted to know whether this fixes
the issue first though.

>
> BTW, I really wonder if the cost of jerking ->file_lock up and down
> in that loop in __range_close() is negligible. What values do we

Just for the record, I remember you pointing at that originally. Linus
argued that this likely wasn't going to be a problem and that if people
see performance hits we'll optimize.

> typically get from callers and how sparse does descriptor table tend
> to be for those?

Weirdly, I can actually somewhat answer that question since I tend to
regularly "survey" large userspace projects I know or am involved in
that adopt new APIs we added just to see how they use it.

A few users:
1. crun
https://github.com/containers/crun/blob/a1c0ef1b886ca30c2fb0906c7c43be04b555c52c/src/libcrun/utils.c#L1490
ret = syscall_close_range (n, UINT_MAX, CLOSE_RANGE_CLOEXEC);

2. LXD
https://github.com/lxc/lxd/blob/f12f03a4ba4645892ef6cc167c24da49d1217b02/lxd/main_forkexec.go#L293
ret = close_range(EXEC_PIPE_FD + 1, UINT_MAX, CLOSE_RANGE_UNSHARE);

3. LXC
https://github.com/lxc/lxc/blob/1718e6d6018d5d6072a01d92a11d5aafc314f98f/src/lxc/rexec.c#L165
ret = close_range(STDERR_FILENO + 1, MAX_FILENO, CLOSE_RANGE_CLOEXEC);

Of these three 1. and 3. don't matter because they rely on
CLOSE_RANGE_CLOEXEC and exec.
For 2. I can say that the fdtable is likely going to be sparse.
close_range() here is basically used to prevent accidental fd leaks
across an exec. So 2. should never have more > 4 file. In fact, this
could and should probably be switched to CLOSE_RANGE_CLOEXEC too.

The next two cases might be more interesting:

4. systemd
- https://github.com/systemd/systemd/blob/fe96c0f86d15e844d74d539c6cff7f971078cf84/src/basic/fd-util.c#L228
close_range(3, -1, 0)
- https://github.com/systemd/systemd/blob/fe96c0f86d15e844d74d539c6cff7f971078cf84/src/basic/fd-util.c#L271
https://github.com/systemd/systemd/blob/fe96c0f86d15e844d74d539c6cff7f971078cf84/src/basic/fd-util.c#L288
/* Close everything between the start and end fds (both of which shall stay open) */
if (close_range(start + 1, end - 1, 0) < 0) {
if (close_range(sorted[n_sorted-1] + 1, -1, 0) >= 0)

5. Python
https://github.com/python/cpython/blob/9976834f807ea63ca51bc4f89be457d734148682/Python/fileutils.c#L2250

systemd has the regular case that others have too where it simply closes
all fds over 3 and it also has the more complicated case where it has an
ordered array of fds closing up to the lower bound and after the upper
bound up to the maximum. PID 1 can have a large number of fds open
because of socket activation so here close_range() will encounter less
sparse fd tables where it needs to close a lot of fds.

For Python's os.closerange() implementation which depends on our syscall
it's harder to say given that this will be used by a lot of projects but
I would _guess_ that if people use closerange() they do so because they
actually have something to close.

In short, I would think that close_range() without the
CLOSE_RANGE_CLOEXEC feature will usually be used in scenarios where
there's work to be done, i.e. where the caller likely knows that they
might inherit a non-trivial number of file descriptors (usually after a
fork) that they want to close and they want to do it either because they
don't exec or they don't know when they'll exec. All others I'd expect
to switch to CLOSE_RANGE_CLOEXEC on kernels where it's supported.

Christian