Re: [PATCH v5] close_range.2: new page documenting close_range(2)
From: Stephen Kitt
Date: Wed Dec 23 2020 - 14:34:44 EST
Hi Michael,
On Tue, 22 Dec 2020 21:36:28 +0100, "Michael Kerrisk (man-pages)"
<mtk.manpages@xxxxxxxxx> wrote:
> On 12/21/20 8:46 PM, Stephen Kitt wrote:
[...]
> > +Errors closing a given file descriptor are currently ignored.
> > +.PP
> > +.I flags
> > +can be 0 or set to one or both of the following:
>
> Better, I think:
> "flags is a bit mask containing 0 or more of the following:"
Indeed, thanks!
> > +.TP
> > +.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"
>
> s/5.10/5.11/ ?
Oops, yes, 5.11.
> > +sets the close-on-exec bit instead of
>
> s/close-on-exec bit/file descriptor's close-on-exec flag/
Noted.
> > +immediately closing the file descriptors.
> > +.TP
> > +.B CLOSE_RANGE_UNSHARE
> > +unshares the range of file descriptors from any other processes,
> > +before closing them,
> > +avoiding races with other threads sharing the file descriptor table.
> > +.SH RETURN VALUE
> > +On success,
> > +.BR close_range ()
> > +returns 0.
> > +On error, \-1 is returned and
> > +.I errno
> > +is set to indicate the cause of the error.
> > +.SH ERRORS
> > +.TP
> > +.B EINVAL
> > +.I flags
> > +is not valid, or
> > +.I first
> > +is greater than
> > +.IR last .
> > +.PP
> > +The following can occur with
> > +.B CLOSE_RANGE_UNSHARE
> > +(when constructing the new descriptor table):
> > +.TP
> > +.B EMFILE
> > +The per-process limit on the number of open file descriptors has been
> > reached +(see the description of
> > +.B RLIMIT_NOFILE
> > +in
> > +.BR getrlimit (2)).
> > +.TP
> > +.B ENOMEM
> > +Insufficient kernel memory was available.
> > +.SH VERSIONS
> > +.BR close_range ()
> > +first appeared in Linux 5.9.
> > +.SH CONFORMING TO
> > +.BR close_range ()
> > +is a nonstandard function that is also present on FreeBSD.
> > +.SH NOTES
> > +Glibc does not provide a wrapper for this system call; call it using
> > +.BR syscall (2).
> > +.SS Closing all open file descriptors
> > +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
> > +To avoid blindly closing file descriptors
> > +in the range of possible file descriptors,
> > +this is sometimes implemented (on Linux)
> > +by listing open file descriptors in
> > +.I /proc/self/fd/
> > +and calling
> > +.BR close (2)
> > +on each one.
> > +.BR close_range ()
> > +can take care of this without requiring
> > +.I /proc
> > +and within a single system call,
> > +which provides significant performance benefits.
> > +.SS Closing file descriptors before exec
> > +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
> > +File descriptors can be closed safely using
> > +.PP
> > +.in +4n
> > +.EX
> > +/* we don't want anything past stderr here */
> > +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
> > +execve(....);
> > +.EE
> > +.in
> > +.PP
> > +.B CLOSE_RANGE_UNSHARE
> > +is conceptually equivalent to
> > +.PP
> > +.in +4n
> > +.EX
> > +unshare(CLONE_FILES);
> > +close_range(first, last, 0);
> > +.EE
> > +.in
> > +.PP
> > +but can be more efficient:
> > +if the unshared range extends past
> > +the current maximum number of file descriptors allocated
> > +in the caller's file descriptor table
> > +(the common case when
> > +.I last
> > +is ~0U),
> > +the kernel will unshare a new file descriptor table for the caller up to
> > +.IR first .
> > +This avoids subsequent close calls entirely;
>
> s/close/.BR close (2)/
Noted.
> > +the whole operation is complete once the table is unshared.
> > +.SS Closing files on \fBexec\fP
> > +.\" 582f1fb6b721facf04848d2ca57f34468da1813e
> > +This is particularly useful in cases where multiple
> > +.RB pre- exec
> > +setup steps risk conflicting with each other.
> > +For example, setting up a
> > +.BR seccomp (2)
> > +profile can conflict with a
> > +.BR close_range ()
> > +call:
> > +if the file descriptors are closed before the
> > +.BR seccomp (2)
> > +profile is set up,
> > +the profile setup can't use them itself,
> > +or control their closure;
> > +if the file descriptors are closed afterwards,
> > +the seccomp profile can't block the
> > +.BR close_range ()
> > +call or any fallbacks.
> > +Using
> > +.B CLOSE_RANGE_CLOEXEC
> > +avoids this:
> > +the descriptors can be marked before the
> > +.BR seccomp (2)
> > +profile is set up,
> > +and the profile can control access to
> > +.BR close_range ()
> > +without affecting the calling process.
> > +.SH EXAMPLES
> > +The following program is designed to be execed by the second program
> > +below.
>
> I have some specific comments below, but a more general comment
> to start with: why use two programs here? It seems to add complexity
> without demonstrating anything that couldn't also be demonstrated
> with a simpler single program, or have I missed something?
I based the example on the test code in the kernel and the examples from
execve(2), since close_range(2) is mostly useful in preparation for an
execve(2) call.
> > +It lists its open file descriptors:
> > +.PP
> > +.in +4n
> > +.EX
> > +/* listopen.c */
> > +
> > +#include <stdio.h>
> > +#include <sys/stat.h>
> > +
> > +int
> > +main(int argc, char *argv[])
> > +{
> > + struct stat buf;
> > +
> > + for (int i = 0; i < 100; i++) {
> > + if (!fstat(i, &buf))
>
> I kind of prefer "fstat(...) == 0"
Ah yes, that makes sense.
> > + printf("FD %d is open.\en", i);
> > + }
> > +
> > + exit(EXIT_SUCCESS);
> > +)
> > +.EE
> > +.in
> > +.PP
> > +This program executes the command given on its command-line,
> > +after opening the files listed after the command
> > +and then using
> > +.BR close_range ()
> > +to close them:
> > +.PP
> > +.in +4n
> > +.EX
> > +/* close_range.c */
> > +
> > +#include <fcntl.h>
> > +#include <linux/close_range.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <sys/stat.h>
> > +#include <sys/syscall.h>
> > +#include <sys/types.h>
> > +#include <unistd.h>
> > +
> > +int
> > +main(int argc, char *argv[])
> > +{
> > + char *newargv[] = { NULL };
> > + char *newenviron[] = { NULL };
> > +
> > + if (argc < 3) {
> > + fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\en",
> > argv[0]);
>
> Line too long. Please break it up so that it renders well on
> an 80-column terminal.
>
> Or, alternatively:
>
> fprintf(stderr, "Usage: %s <command> <file>...\en", argv[0]);
Noted.
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + for (int i = 2; i < argc; i++) {
> > + if (open(argv[i], O_RDONLY) == -1) {
> > + perror(argv[i]);
> > + exit(EXIT_FAILURE);
> > + }
> > + }
> > +
> > + if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {
>
> Line too long.
>
> Alternatively, what about s/CLOSE_RANGE_UNSHARE/0/? Or it
> considered best practice to always use CLOSE_RANGE_UNSHARE?
In this particular context it’s not required, but I would argue that it’s
better to use _UNSHARE in general — it avoids the risk of forgetting to add
it in long-lived code which ends up using threads at some point...
> > + perror("close_range");
> > + exit(EXIT_FAILURE);
> > + }
> > +
> > + execve(argv[1], newargv, newenviron);
> > + perror("execve");
> > + exit(EXIT_FAILURE);
> > +}
> > +.EE
> > +.in
> > +.PP
> > +We can use the second program to exec the first as follows:
> > +.PP
> > +.in +4n
> > +.EX
> > +.RB "$" " make listopen close_range"
>
> Perhaps we don't really need the preceding line?
I was following the examples in execve(2) which show how to build the
programs.
> > +.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
> > +FD 0 is open.
> > +FD 1 is open.
> > +FD 2 is open.
> > +.EE
> > +.in
> > +.PP
> > +Removing the call to
> > +.BR close_range ()
> > +will show different output,
> > +with the file descriptors for the named files still open.
> > +.SH SEE ALSO
> > +.BR close (2)
> >
> > base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
Thanks,
Stephen
Attachment:
pgp5KHGhC3NSl.pgp
Description: OpenPGP digital signature