Re: [PATCH] open: introduce O_NOSTD

From: Eric Blake
Date: Thu Aug 27 2009 - 10:00:02 EST


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Davide Libenzi on 8/25/2009 3:53 PM:
>> Another solution is for the application to sanitize all newly-created
>> fds: GNU coreutils provides a wrapper open_safer, which does nothing
>> extra in the common case that open() returned 3 or larger, but calls
>> fcntl(n,F_DUPFD,3)/close(n) before returning if n was less than 3.
>> However, this leads to triple the syscall cost for every open() call
>> if the process starts life with a std fd closed; and if O_CLOEXEC is
>> not used, still leaves a window of time where the fd can be leaked
>> through another thread's use of fork/exec.
>
> I think we can say that the vast majority of the software is not going to
> notice the proposed open_safer(), performance-wise, since the first three
> fds are always filled. So IMO the performance impact argument is a weak one.
> If CLOEXEC semantics are needed in the open operation, F_DUPFD_CLOEXEC can
> be used to match it.

The current gnulib implementation of open_safer (pre-O_CLOEXEC support) is
(roughly):

/* Wrap open, to guarantee that a successful return value is >= 3. */
int open_safer (const char *name, int flags, int mode)
{
int fd = open (name, flags, mode);
if (0 <= fd && fd <= 2)
{
int dup = fcntl (fd, F_DUPFD, 3);
int saved_errno = errno;
close (fd);
errno = saved_errno;
fd = dup;
}
return fd;
}

which has the desired property of no overhead in the common case of all
standard fds open. But it obviously mishandles the O_CLOEXEC flag.
Here's a first cut at supporting it:

int open_safer (const char *name, int flags, int mode)
{
int fd = open (name, flags, mode);
if (0 <= fd && fd <= 2)
{
int dup = fcntl (fd, ((flags & O_CLOEXEC)
? F_DUPFD_CLOEXEC : F_DUPFD), 3);
int saved_errno = errno;
close (fd);
errno = saved_errno;
fd = dup;
}
return fd;
}

If the user requested open_safer(O_CLOEXEC), then we still have the
desired property of no syscall overhead and no fd leak. But if the user
intentionally does not pass O_CLOEXEC because they wanted to create an
inheritable fd, but without stomping on standard fds, then this version
still has a window for an fd leak. So let's try this version, which
guarantees no fd leak, while still keeping the semantics of giving the
user an inheritable fd outside the std fd range:

int open_safer (const char *name, int flags, int mode)
{
int fd = open (name, flags | O_CLOEXEC, mode);
if (0 <= fd && fd <= 2)
{
int dup = fcntl (fd, ((flags & O_CLOEXEC)
? F_DUPFD_CLOEXEC : F_DUPFD), 3);
int saved_errno = errno;
close (fd);
errno = saved_errno;
fd = dup;
}
else if (!(flags & O_CLOEXEC))
{
if ((flags = fcntl (fd, F_GETFD)) < 0
|| fcntl (fd, F_SETFD, flags & ~FD_CLOEXEC) == -1)
{
int saved_errno = errno;
close (fd);
fd = -1;
errno = saved_errno;
}
}
return fd;
}

This solves the fd leak, and open_safer(O_CLOEXEC) is still cheap in the
common case. But now the case of open_safer without O_CLOEXEC costs 3
syscalls, regardless of whether the standard fds were already open (if we
assumed there were no possibility of new FD_* flags, we could cut the
common-case penalty from three to two by skipping the fcntl(fd,F_GETFD)
and just using fcntl(fd,F_SETFD,0), but that's asking for problems).

> While the patch is simple, IMO this is something that can be easily taken
> care in glibc layers w/out huge drawbacks.

I hope that my example shows why doing it in the kernel is desirable -
there is no safe way to keep the pre-O_CLOEXEC efficiency using just the
library, but there IS a way to do it with kernel support:

int open_safer (const char *name, int flags, int mode)
{
return open (name, flags | O_NOSTD, mode);
}

Also, remember this statement from Ulrich in the series that first
introduced O_CLOEXEC/O_NONBLOCK support across all the fd creation APIs:

"In future there will be other uses. Like a O_* flag to enable the use
of non-sequential descriptors."
http://marc.info/?l=linux-kernel&m=121010907127190&w=2

- --
Don't work too hard, make some time for fun as well!

Eric Blake ebb9@xxxxxxx
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqWj/gACgkQ84KuGfSFAYClIgCdEw6/7+PiFhR7aKGyuLUd5RdR
lbAAmgKLqCC5zlhkOm/x4K+Om7nqeD0i
=Ibq4
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/