RE: [RFC PATCH v2 00/13] Add futex2 syscall

From: David Laight
Date: Mon Mar 08 2021 - 12:34:05 EST

From: Zebediah Figura
> Sent: 08 March 2021 16:18
>
> On 3/3/21 6:42 PM, André Almeida wrote:
> > ** The wait on multiple problem
> >
> > The use case lies in the Wine implementation of the Windows NT interface
> > WaitMultipleObjects. This Windows API function allows a thread to sleep
> > waiting on the first of a set of event sources (mutexes, timers, signal,
> > console input, etc) to signal. Considering this is a primitive
> > synchronization operation for Windows applications, being able to quickly
> > signal events on the producer side, and quickly go to sleep on the
> > consumer side is essential for good performance of those running over Wine.
>
> It's probably worth pointing out, for better or for worse, while this is
> *a* use case, it's also limited to an out-of-tree patch set/forked
> versions of Wine. I'm currently working on a different approach that
> should be upstreamable to Wine proper, as detailed in [1].
>
> [1]
> https://lore.kernel.org/lkml/f4cc1a38-1441-62f8-47e4-0c67f5ad1d43@xxxxxxxxxxxxxxx/

* NtPulseEvent can't work right. We badly emulate it by setting and then
immediately resetting the event, but due to the above gap between poll()
and read(), most threads end up missing the wakeup anyway.

As you stated later, PulseEvent() is completely broken anyway.
At least one of the problems is that in order to complete an async io
(and all io is async) the final 'copy_to_user' must be done in the
context of the initiating thread.
So if the thread is in WaitMultipleObjects (it usually is) and an async io
completes (eg receive data on a TCP connection) the thread stops waiting
while the io completion callback is done.
If a PulseEvent() happens during that window then it is lost.
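
For illustration only, here is a minimal sketch of that lost pulse using a
Linux eventfd as the event object. It assumes "set" is a write() to the
eventfd and "reset" is a read() that drains it, which is how I understand
the set-then-reset emulation quoted above. The helper names (event_set,
event_reset, event_is_signaled, pulse_is_lost) are made up for this example
and are not Wine's or the kernel's actual code.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helpers for illustration; not Wine's actual code. */

/* Signal the event: make the eventfd counter non-zero (readable). */
static void event_set(int fd)
{
    uint64_t one = 1;
    ssize_t n = write(fd, &one, sizeof(one));

    assert(n == (ssize_t)sizeof(one));
    (void)n;
}

/* Reset the event: drain the counter. A failed read (EAGAIN on a
 * non-blocking fd) just means the event was already clear. */
static void event_reset(int fd)
{
    uint64_t value;
    ssize_t n = read(fd, &value, sizeof(value));

    (void)n;
}

/* Non-blocking check, standing in for the waiter's poll() call. */
static int event_is_signaled(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };

    return poll(&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}

/* Returns 1 if a pulse issued while the waiter is not in poll() leaves
 * nothing behind for it to see, 0 otherwise, -1 on setup failure. */
static int pulse_is_lost(void)
{
    int fd = eventfd(0, EFD_NONBLOCK);
    int lost;

    if (fd < 0)
        return -1;

    /* The waiter is busy elsewhere (e.g. running an io completion
     * callback), so it is not blocked in poll() right now. */

    /* Producer pulses the event: set, then immediately reset. */
    event_set(fd);
    event_reset(fd);

    /* The waiter goes back to waiting and finds the event clear. */
    lost = !event_is_signaled(fd);

    close(fd);
    return lost;
}
```

Calling pulse_is_lost() returns 1 here: nothing records that the pulse ever
happened, so a thread that was not inside poll() at that instant cannot
observe it afterwards.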

Mind you, there was (maybe still is) a bug in WMO on 64-bit Windows
that means the process completely misses io completion callbacks
if (I think) they happen while the process is being scheduled.
There is a loop in WMO that fails to recover because interrupts
are disabled, and a 30 second timer that unblocks things.
I had to add code to write to the ioapic to request the hardware
interrupt to unblock everything :-)

David
