Re: O_NONBLOCK is broken
From: Denys Vlasenko
Date: Sun Aug 19 2007 - 08:51:24 EST
On Tuesday 14 August 2007 22:59, David Schwartz wrote:
> > The problem is, O_NONBLOCK flag is not attached to file *descriptor*,
> > but to a "file description" mentioned in fcntl manpage:
>
> [snip]
>
> > We don't know whether our stdout descriptor #1 is shared with
> > anyone or not,
> > and if we were started from shell, it typically is. That's why we try to
> > restore flags ASAP.
> >
> > But "ASAP" isn't soon enough. Between setting and clearing O_NONBLOCK,
> > other process which share fd #1 with us may well be affected
> > by file suddenly becoming O_NONBLOCK under its feet.
> >
> > Worse, other process can do the same
> > fcntl(1, F_SETFL, fl | O_NONBLOCK);
> > ...
> > fcntl(1, F_SETFL, fl);
> > sequence, and first fcntl can return flags with O_NONBLOCK set
> > (because of
> > us), and then second fcntl will set O_NONBLOCK permanently, which is not
> > what was intended!
>
> [snip]
>
> > P.S. Hmm, it seems fcntl GETFL/SETFL interface seems to be racy:
> >
> > int fl = fcntl(fd, F_GETFL, 0);
> > /* other process can muck with file flags here */
> > fcntl(fd, F_SETFL, fl | SOME_BITS);
> >
> > How can I *atomically* add or remove bits from file flags?
>
> Simply put, you cannot change file flags on a shared descriptor. It is a
> bug to do so, a bug that is sadly present in many common programs.
It means that the design is flawed. If done right, the file flags
which are changeable by fcntl (O_NONBLOCK, O_APPEND, O_ASYNC, O_DIRECT,
O_NOATIME) wouldn't be shared at all: they are useless as shared flags.
IOW, they should be file _descriptor_ flags.
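To make the sharing concrete, here is a minimal sketch (the helper names are mine, purely for illustration). It sets O_NONBLOCK through one descriptor and then looks at a dup() of it, and does the same with FD_CLOEXEC, which really is a per-descriptor flag:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if O_NONBLOCK set through one descriptor
 * is visible through a dup() of it, 0 if it is not, -1 on error. */
int nonblock_leaks_to_dup(void)
{
    int p[2];
    int result = -1;

    if (pipe(p) < 0)
        return -1;

    int copy = dup(p[1]);   /* same open file description as p[1] */
    if (copy >= 0) {
        int fl = fcntl(p[1], F_GETFL);
        if (fl >= 0 && fcntl(p[1], F_SETFL, fl | O_NONBLOCK) == 0) {
            int seen = fcntl(copy, F_GETFL);
            if (seen >= 0)
                result = (seen & O_NONBLOCK) ? 1 : 0;
        }
        close(copy);
    }
    close(p[0]);
    close(p[1]);
    return result;
}

/* Hypothetical helper: the same experiment with FD_CLOEXEC, which is
 * stored per descriptor (F_SETFD), not per file description. */
int cloexec_leaks_to_dup(void)
{
    int p[2];
    int result = -1;

    if (pipe(p) < 0)
        return -1;

    int copy = dup(p[1]);   /* dup() always clears FD_CLOEXEC on the copy */
    if (copy >= 0) {
        if (fcntl(p[1], F_SETFD, FD_CLOEXEC) == 0) {
            int seen = fcntl(copy, F_GETFD);
            if (seen >= 0)
                result = (seen & FD_CLOEXEC) ? 1 : 0;
        }
        close(copy);
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

On Linux I would expect the first helper to report that the flag leaks and the second to report that it does not. The same sharing applies to descriptors inherited across fork().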
It's unlikely that the kernel tribe leaders will agree to violate POSIX
and make fcntl(F_SETFL) a per-fd thing. There may be users of this
(mis)feature.
Making fcntl(F_SETFD) accept those same flags and having them override
the F_SETFL flags may fare slightly better, but may require propagating
these flags into *a lot* of kernel code paths.
> I like the idea of being able to specify blocking or non-blocking behavior
> in the operation. It is not too uncommon to want to perform blocking
> operations sometimes and non-blocking operations other times for the same
> object and having to keep changing modes, even if it wasn't racy, is a
> pain.
I am submitting a patch which allows this. Let's see what people will say.
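For comparison, sockets already have a per-call form of this: recv() and send() accept MSG_DONTWAIT. The sketch below is not the patch mentioned above and works only on sockets, not on pipes or ttys. The helper names recv_nowait() and demo_per_call_nonblock() are made up for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: one non-blocking read attempt on a socket.
 * The file status flags (and thus O_NONBLOCK) are left untouched.
 * Returns bytes read, 0 on EOF, or -1 with errno set to EAGAIN or
 * EWOULDBLOCK when no data is queued. */
ssize_t recv_nowait(int sock, void *buf, size_t len)
{
    return recv(sock, buf, len, MSG_DONTWAIT);
}

/* Hypothetical demo: returns 1 if an empty socket gives EAGAIN, a later
 * call returns the queued data, and O_NONBLOCK was never set; 0 if any
 * of that does not hold; -1 if the socket pair cannot be created. */
int demo_per_call_nonblock(void)
{
    int sv[2];
    char buf[8];
    int ok = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Nothing queued yet: must fail immediately instead of blocking. */
    ssize_t n = recv_nowait(sv[0], buf, sizeof buf);
    int empty_ok = (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK));

    if (send(sv[1], "hi", 2, 0) == 2) {
        n = recv_nowait(sv[0], buf, sizeof buf);
        int fl = fcntl(sv[0], F_GETFL);
        ok = empty_ok && n == 2 && fl >= 0 && !(fl & O_NONBLOCK);
    }

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Since nothing is written to the shared file flags, whoever else holds the same socket sees no change in behaviour.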
Yet another way to fix this problem is to add a new fcntl operation
"duplicate an open file":
fd = fcntl(fd, F_DUPFL, min_fd);
which is analogous to F_DUPFD, but produces a file descriptor with its
own, _unshared_ file description.
You can F_SETFL it as you want; no one else will be affected.
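F_DUPFL does not exist, so the following is only a rough approximation of the idea using what Linux offers today: reopening the descriptor through /proc/self/fd gives a new open file description. Assumptions: /proc is mounted, and the object can be reopened at all (sockets cannot). reopen_unshared() and demo_unshared_flags() are hypothetical names, and there is no min_fd equivalent here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the proposed F_DUPFL (Linux-only):
 * open the object behind fd again, which creates a new open file
 * description with its own status flags. Returns the new descriptor
 * or -1 on error. */
int reopen_unshared(int fd, int open_flags)
{
    char path[64];

    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    return open(path, open_flags);
}

/* Hypothetical demo: returns 1 if setting O_NONBLOCK on the reopened
 * descriptor leaves the original descriptor's flags alone, 0 if the
 * flag leaked, -1 on error (e.g. /proc not mounted). */
int demo_unshared_flags(void)
{
    int p[2];
    int result = -1;

    if (pipe(p) < 0)
        return -1;

    int mine = reopen_unshared(p[1], O_WRONLY);
    if (mine >= 0) {
        int fl = fcntl(mine, F_GETFL);
        if (fl >= 0 && fcntl(mine, F_SETFL, fl | O_NONBLOCK) == 0) {
            int orig = fcntl(p[1], F_GETFL);
            if (orig >= 0)
                result = (orig & O_NONBLOCK) ? 0 : 1;
        }
        close(mine);
    }
    close(p[0]);
    close(p[1]);
    return result;
}
```

This differs from the proposal in ways that matter: the open() goes through a fresh permission check, and for regular files the new description gets its own file offset, which a real F_DUPFL might or might not want.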
> However, there's a much more fundamental problem here. Processes need a
> good way to get exclusive use of their stdin, stdout, and stderr streams
> and there is no good way. Perhaps an "exclusive lock" that blocked all
> other process' attempts to use the terminal until it was released would be
> a good thing.
Yep, maybe. But this is a different problem.
IOW: there are cases where one doesn't want this kind of locking,
but simply needs to do a non-blocking read/write. That's what I'm trying
to solve.
--
vda