Re: [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads

From: Andrea Arcangeli
Date: Tue Jun 16 2015 - 08:18:15 EST

On Mon, Jun 15, 2015 at 08:41:24PM -1000, Linus Torvalds wrote:
> On Mon, Jun 15, 2015 at 12:19 PM, Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:
> >
> > Yes, it would leave the other blocked, how is it different from having
> > just 1 reader and it gets killed?
> Either is completely wrong. But the read() code can at least see that
> "I'm returning early due to a signal, so I'll wake up any other
> waiters".
> Poll simply *cannot* do that. Because by definition poll always
> returns without actually clearing the thing that caused the wakeup.
> So for "poll()", using exclusive waits is wrong very much by
> definition. For read(), you *can* use exclusive waits correctly, it
> just requires you to wake up others if you don't read all the data
> (either due to being killed by a signal, or because the read was
> incomplete).

There's no interface to do wakeone with poll so I haven't thought much
about it frankly but intuitively it didn't look radically different as
long as poll checks every fd revent it gets. If I was to patch it to
introduce wakeone in poll I'd think more about it of course. Perhaps
I've been overoptimistic here.

> What does qemu have to do with anything?
> We don't implement kernel interfaces that are broken, and that can
> leave processes blocked when they shouldn't be blocked. We also don't
> implement kernel interfaces that only work with one program and then
> say "if that program is broken, it's not our problem".

I'm testing with the stresstest application not with qemu, qemu cannot
take advantage of this anyway because it uses a single thread so far
and it uses poll not blocking reads. The stresstest suite listens to
events with one thread per CPU and it interleaves poll usage with
blocking reads at every bounce, and it's working correctly so far.

> > I'm not saying doing wakeone is easy [...]
> Bullshit, Andrea.
> That's *exactly* what you said in the commit message for the broken
> patch that I complained about. And I quote:

Please don't quote me out of context, and quote me in full if you
quote me:

"I'm not saying doing wakeone is easy and it's enough to flip a switch
everywhere to get it everywhere"

In the above paragraph (that you quoted in a truncated version of it)
I was talking in general, not specific to userfaultfd. I meant in
general wakeone is not easy.

> patch that I complained about. And I quote:
> "Blocking reads can easily use exclusive wakeups. Poll in theory
> could too but there's no poll_wait_exclusive in common code yet"

With "I'm not saying doing wakeone is easy and it's enough to flip a
switch everywhere to get it everywhere" I intended exactly to clarify
that "Blocking reads can easily use exclusive wakeups" was in the
context of userfaultfd only.

With regard to the patch you still haven't told what exactly what
runtime breakage I shall expect from my simple change. The case you
mentioned about a thread that gets killed is irrelevant because an
userfault would get missed anyway, if a task listening to userfaultfd
get killed after it received any uffd_msg structure. Wakeone or
wakeall won't move the needle for that case. There's no broadcast of
userfaults to all readers, even with wakeall, only the first reader
that wakes up, gets the messages, the others returns to sleep
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at