Re: Regression: epoll edge-triggered (EPOLLET) for pipes/FIFOs

From: Michael Kerrisk (man-pages)
Date: Mon Oct 12 2020 - 16:30:59 EST

Next message: Willy Tarreau: "[GIT PULL] prandom32 changes for v5.10"
Previous message: Kees Cook: "Re: [tip: x86/entry] x86/entry: Convert Divide Error to IDTENTRY"
In reply to: Linus Torvalds: "Re: Regression: epoll edge-triggered (EPOLLET) for pipes/FIFOs"
Next in thread: Linus Torvalds: "Re: Regression: epoll edge-triggered (EPOLLET) for pipes/FIFOs"

[CC += Davide]

Hello Linus,

Thanks for your quick reply.

On 10/12/20 9:25 PM, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 11:40 AM Michael Kerrisk (man-pages)
> <mtk.manpages@xxxxxxxxx> wrote:
>>
>> Between Linux 5.4 and 5.5 a regression was introduced in the operation
>> of the epoll EPOLLET flag. From some manual bisecting, the regression
>> appears to have been introduced in
>>
>> commit 1b6b26ae7053e4914181eedf70f2d92c12abda8a
>> Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>> Date: Sat Dec 7 12:14:28 2019 -0800
>>
>> pipe: fix and clarify pipe write wakeup logic
>>
>> (I also built a kernel from the immediate preceding commit, and did
>> not observe the regression.)
>
> So the difference from that commit is that now we only wake up a
> reader of a pipe when we add data to it AND IT WAS EMPTY BEFORE.
>
>> The aim of ET (edge-triggered) notification is that epoll_wait() will
>> tell us a file descriptor is ready only if there has been new activity
>> on the FD since we were last informed about the FD. So, in the
>> following scenario where the read end of a pipe is being monitored
>> with EPOLLET, we see:
>>
>> [Write a byte to write end of pipe]
>> 1. Call epoll_wait() ==> tells us pipe read end is ready
>> 2. Call epoll_wait() [again] ==> does not tell us that the read end of
>> pipe is ready
>
> Right.
>
>> If we go further:
>>
>> [Write another byte to write end of pipe]
>> 3. Call epoll_wait() ==> tells us pipe read end is ready
>
> No.
>
> The "read end" readiness has not changed. It was ready before, it's
> ready now, there's no change in readiness.
>
> Now, the old pipe behavior was that it would wake up writers whether
> they needed it or not, so epoll got woken up even if the readiness
> didn't actually change.
>
> So we do have a change in behavior.
>
> However, clearly your test is wrong, and there is no edge difference.
>
> Now, if this is more than just a buggy test - and it actually breaks
> some actual application and real behavior - we'll need to fix it. A
> regression is a regression, and we'll need to be bug-for-bug
> compatible for people who depended on bugs.

I don't think this is correct. The epoll(7) manual page
still carries the text written long ago by Davide Libenzi,
the creator of epoll:

    Since even with edge-triggered epoll, multiple events can be
    generated upon receipt of multiple chunks of data, the caller has
    the option to specify the EPOLLONESHOT flag, to tell epoll to
    disable the associated file descriptor after the receipt of an
    event with epoll_wait(2).

My reading of that text is that in the scenario that I describe a
readiness notification should be generated at step 3 (and indeed
should be generated whenever additional data bleeds into the channel).
Indeed, the very rationale for the existence of the EPOLLONESHOT flag
is to *prevent* notifications in such circumstances. And, as I noted,
sockets and terminals do (still) behave in the way that I expect in
this scenario.
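
For reference, the three-step scenario can be sketched roughly as
follows. This is only an illustration, assuming Linux and Python's
select.epoll wrapper; it is not the test program behind the original
report, and the name run_scenario() is hypothetical. A zero timeout is
used so that each call just reports whatever is pending.

```python
import os
import select


def run_scenario():
    """Return readiness of the pipe read end at steps 1, 2 and 3."""
    r, w = os.pipe()
    ep = select.epoll()
    try:
        # Monitor the read end of the pipe, edge-triggered.
        ep.register(r, select.EPOLLIN | select.EPOLLET)

        def ready():
            # Zero timeout: do not block, just collect pending events.
            return any(fd == r and ev & select.EPOLLIN
                       for fd, ev in ep.poll(0))

        os.write(w, b"x")      # [Write a byte to write end of pipe]
        step1 = ready()        # 1. first epoll_wait()
        step2 = ready()        # 2. epoll_wait() again, no new activity
        os.write(w, b"y")      # [Write another byte to write end of pipe]
        step3 = ready()        # 3. the step under discussion
        return step1, step2, step3
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    for n, is_ready in enumerate(run_scenario(), start=1):
        print(f"step {n}: {'ready' if is_ready else 'not ready'}")
```

Steps 1 and 2 should come out as "ready" and "not ready" respectively;
the result at step 3 is the one that depends on which kernel behavior
is in effect. Note that the pipe is never read, so the read end stays
readable throughout.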

So, I don't think this is a buggy test. It (still) appears to me
that this is a breakage of intended and documented behavior.
(Whether it breaks some actual application, I do not know. But
I have also seen that sometimes reports of such breakages take
a very long time to come in.)

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
