Re: [PATCH 2/2] eventpoll: Fix epoll_wait() report false negative


From: Christian Brauner

Date: Wed Apr 29 2026 - 02:55:00 EST

On Fri, Jul 18, 2025 at 10:59:48AM +0200, Nam Cao wrote:
> On Fri, Jul 18, 2025 at 09:38:27AM +0100, Soheil Hassas Yeganeh wrote:
> > On Fri, Jul 18, 2025 at 8:52 AM Nam Cao <namcao@xxxxxxxxxxxxx> wrote:
> > >
> > > ep_events_available() checks for available events by looking at ep->rdllist
> > > and ep->ovflist. However, this is done without a lock, therefore the
> > > returned value is not reliable: both checks on ep->rdllist and
> > > ep->ovflist can be false while ep_start_scan() or ep_done_scan() is
> > > being executed on another CPU, even though events are available.
> > >
> > > This bug can be observed by:
> > >
> > > 1. Create an eventpoll with at least one ready level-triggered event
> > >
> > > 2. Create multiple threads that call epoll_wait() with zero timeout.
> > > The threads do not consume the events, therefore every epoll_wait()
> > > call should return at least one event.
> > >
> > > If one thread is executing ep_events_available() while another thread is
> > > executing ep_start_scan() or ep_done_scan(), epoll_wait() may wrongly
> > > return no event for the former thread.
> >
> > That is the whole point of epoll_wait with a zero timeout. We would want to
> > opportunistically poll without much overhead, which will have more
> > false positives.
> > A caller that calls with a zero timeout should retry later, and will
> > at some point observe the event.
>
> Is this a documented behavior that users expect? I do not see this in the
> man page.
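
For anyone who wants to poke at this, the two reproduction steps quoted
above can be sketched roughly as follows. This is only a sketch under my
own assumptions: count_empty_polls(), poller() and struct poller_args are
made-up names, and an eventfd that nobody reads stands in for the "ready
level-triggered event". Whether it ever reports a non-zero count depends
on the kernel, the thread count and timing.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper types and names, for illustration only. */
struct poller_args {
    int epfd;
    int iterations;
    atomic_long *empty;   /* shared count of epoll_wait() calls returning 0 */
};

static void *poller(void *p)
{
    struct poller_args *a = p;
    struct epoll_event ev;

    for (int i = 0; i < a->iterations; i++) {
        /* Zero timeout; the event is never consumed, so it stays ready. */
        if (epoll_wait(a->epfd, &ev, 1, 0) == 0)
            atomic_fetch_add(a->empty, 1);
    }
    return NULL;
}

/* Returns how many epoll_wait() calls saw no event, or -1 on setup failure. */
long count_empty_polls(int nthreads, int iterations)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    atomic_long empty = 0;
    long result = -1;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    /* Step 1: an eventfd with a non-zero counter is always readable. */
    int efd = eventfd(1, 0);
    if (efd < 0)
        return -1;
    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(efd);
        return -1;
    }

    /* EPOLLIN without EPOLLET means level-triggered. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = efd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) == 0) {
        struct poller_args args = { epfd, iterations, &empty };

        /* Step 2: several threads polling concurrently with timeout 0. */
        for (; started < nthreads; started++)
            if (pthread_create(&tids[started], NULL, poller, &args) != 0)
                break;
        for (int i = 0; i < started; i++)
            pthread_join(tids[i], NULL);
        if (started == nthreads)
            result = atomic_load(&empty);
    }

    close(epfd);
    close(efd);
    return result;
}
```

Calling something like count_empty_polls(8, 1000000) from a small main()
(build with -pthread) and printing the result is enough to try it. A
non-zero result means some zero-timeout call returned no event although
one was ready the whole time.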

The selftests rely on the behavior that timeout=0 sees events from a
concurrently running producer. They would fail at a much higher rate
after this change - believe me, I had a similar patch that changed
something in this area. I would explore the seqcount that Mateusz
suggested, tbh.
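
To illustrate the general seqcount idea, here is a rough userspace
analogy using C11 atomics. It is not kernel code and not Mateusz's actual
proposal. struct two_lists, move_rdl_to_ovf() and events_available() are
hypothetical names, the two flags merely stand in for "rdllist is
non-empty" and "ovflist has entries", and a single writer at a time is
assumed (in the kernel the writer side would have to be serialized by a
lock).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: two flags instead of the real lists. */
struct two_lists {
    atomic_uint seq;     /* even: stable, odd: a writer is mid-update */
    atomic_bool in_rdl;  /* stands in for "rdllist is non-empty" */
    atomic_bool in_ovf;  /* stands in for "ovflist has entries" */
};

void two_lists_init(struct two_lists *l, bool has_event)
{
    atomic_init(&l->seq, 0);
    atomic_init(&l->in_rdl, has_event);
    atomic_init(&l->in_ovf, false);
}

/* Writer side; callers must ensure only one writer runs at a time. */
void move_rdl_to_ovf(struct two_lists *l)
{
    atomic_fetch_add(&l->seq, 1);            /* seq becomes odd */
    bool had = atomic_exchange(&l->in_rdl, false);
    /* Window in which both flags can read false. */
    if (had)
        atomic_store(&l->in_ovf, true);
    atomic_fetch_add(&l->seq, 1);            /* seq becomes even again */
}

/* Lockless reader: retry if a writer was active during the check. */
bool events_available(struct two_lists *l)
{
    unsigned int start;
    bool avail;

    do {
        /* Wait until no writer is in progress. */
        while ((start = atomic_load(&l->seq)) & 1)
            ;
        avail = atomic_load(&l->in_rdl) || atomic_load(&l->in_ovf);
        /* A changed sequence means the snapshot may be torn: retry. */
    } while (atomic_load(&l->seq) != start);

    return avail;
}
```

The reader stays lockless in the common case and only repeats the check
when it raced with a move between the two flags, so it never reports
"nothing available" based on a half-finished transfer.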