Re: pipes && EPOLLET, again

From: Oleg Nesterov
Date: Tue Mar 04 2025 - 14:33:41 EST


On 03/04, Linus Torvalds wrote:
>
> On Tue, 4 Mar 2025 at 05:45, Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
> >
> > Don't we need the trivial one-liner below anyway?
>
> See this email of mine:
>
> https://lore.kernel.org/all/CAHk-=wiCRwRFi0kGwd_Uv+Xv4HOB-ivHyUp9it6CNSmrKT4gOA@xxxxxxxxxxxxxx/
>
> and the last paragraph in particular.
>
> The whole "poll_usage" thing is a kernel *hack* to deal with broken
> user space that expected garbage semantics that aren't real, and were
> never real.

Yes, agreed. But we can make this hack more understandable. As I said,
this is off-topic right now.

> introduced that completely bogus hack to say "ok, we'll send these
> completely wrong extraneous events despite the fact that nothing has
> changed, because some broken user space program was written to expect
> them".

Yes, but since we already have this hack:

> That program is buggy, and we're not adding new hacks for new bugs.

Yes, but see below...

> If you ask for an edge-triggered EPOLL event, you get an *edge*
> triggered EPOLL event. And there is no edge - the pipe was already
> readable, no edge anywhere in sight.

Yes, the pipe was already readable before fork, but this condition
was already "consumed" by the 1st epoll_wait(EPOLLET). Please see below.
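
For illustration only, a minimal userspace sketch of what "consumed" means
here. This is not the test case discussed in this thread (there is no fork
in it), and the helper name epollet_two_waits() is made up for this example.
It only shows that with EPOLLET a readable pipe is reported once, and a
second epoll_wait() reports nothing while the pipe state does not change:

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Hypothetical helper: make a pipe readable, register its read end with
 * EPOLLIN | EPOLLET, then call epoll_wait() twice with a zero timeout.
 * Stores the two return values in *first and *second.
 * Returns 0 on success, -1 if any setup step fails.
 */
int epollet_two_waits(int *first, int *second)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLET };
	struct epoll_event got;
	int fds[2], epfd, ret = -1;

	if (pipe(fds) < 0)
		return -1;

	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	/* The pipe becomes readable before it is added to the epoll set. */
	if (write(fds[1], "x", 1) != 1)
		goto out_ep;

	ev.data.fd = fds[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) < 0)
		goto out_ep;

	/* 1st wait: the "already readable" condition is reported. */
	*first = epoll_wait(epfd, &got, 1, 0);
	/* 2nd wait: nothing was read or written in between, so no new edge. */
	*second = epoll_wait(epfd, &got, 1, 0);
	ret = 0;

out_ep:
	close(epfd);
out_pipe:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

A caller would expect *first == 1 and *second == 0. The byte is never read,
so the pipe stays readable the whole time; only the event was consumed.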

> If anything, we might consider removing the crazy "poll_usage" hack in
> the (probably futile) hope that user space has been fixed.

This would be the best option ;) Until then:

I agree that my test case is "buggy", but afaics it is not buggier than
userspace programs which rely on the unconditional kill_fasync()'s in
pipe_read/pipe_write?

So. anon_pipe_write() does

	if (was_empty)
		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);

before wait_event(pipe->wr_wait), but before return it does

	if (was_empty || pipe->poll_usage)
		wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);

and this looks confusing to me.

If pipe_write() doesn't take poll_usage into account before wait_event(wr_wait),
then it doesn't need kill_fasync() too?

So I won't argue, but why not make both cases more consistent?

Oleg.