Re: Change in functionality of futex() system call.

From: Andrew Lutomirski
Date: Tue Jun 07 2011 - 15:53:39 EST


On Tue, Jun 7, 2011 at 3:33 PM, David Oliver <david@xxxxxxxxxxxxxxx> wrote:
> On Tue, Jun 7, 2011 at 2:19 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>> On Tue, Jun 7, 2011 at 3:10 PM, David Oliver <david@xxxxxxxxxxxxxxx> wrote:
>>> On Tue, Jun 7, 2011 at 1:43 PM, Andrew Lutomirski <luto@xxxxxxx> wrote:
>>>> On Tue, Jun 7, 2011 at 11:58 AM, Eric Dumazet <eric.dumazet@xxxxxxxxx> wrote:
>>>>> On Tuesday, 07 June 2011 at 10:44 -0400, Andy Lutomirski wrote:
>>>>>> On 06/06/2011 11:13 PM, Darren Hart wrote:
>>>>>> >
>>>>>> >
>>>>>> > On 06/06/2011 11:11 AM, Eric Dumazet wrote:
>>>>>> >> On Monday, 06 June 2011 at 10:53 -0700, Darren Hart wrote:
>>>>>> >>>
>>>>>> >>
>>>>>> >>> If I understand the problem correctly, RO private mapping really doesn't
>>>>>> >>> make any sense and we should probably explicitly not support it, while
>>>>>> >>> special casing the RO shared mapping in support of David's scenario.
>>>>>> >>>
>>>>>> >>
>>>>>> >> We supported them in 2.6.18 kernels, apparently. This might sound
>>>>>> >> stupid, but who knows?
>>>>>> >
>>>>>> >
>>>>>> > I guess this is actually the key point we need to agree on to provide a
>>>>>> > solution. This particular case "worked" in 2.6.18 kernels, but that
>>>>>> > doesn't necessarily mean it was supported, or even intentional.
>>>>>> >
>>>>>> > It sounds to me that we agree that we should support RO shared mappings.
>>>>>> > The question remains about whether we should introduce deliberate
>>>>>> > support of RO private mappings, and if so, if the forced COW approach is
>>>>>> > appropriate or not.
>>>>>> >
>>>>>>
>>>>>> I disagree.
>>>>>>
>>>>>> FUTEX_WAIT has side-effects.  Specifically, it eats one wakeup sent by
>>>>>> FUTEX_WAKE.  So if something uses futexes on a file mapping, then a
>>>>>> process with only read access could (if the semantics were changed) DoS
>>>>>> the other processes by spawning a bunch of threads and FUTEX_WAITing
>>>>>> from each of them.
>>>>>>
>>>>>> If there were a FUTEX_WAIT_NOCONSUME that did not consume a wakeup and
>>>>>> worked on RO mappings, I would drop my objection.
>>>>>
>>>>> If a group of cooperating processes uses a memory segment to exchange
>>>>> critical information, do you really think this memory segment will be
>>>>> readable by other unrelated processes on the machine?
>>>>
>>>> Depends on the design.
>>>>
>>>> I have some software I'm working on that uses shared files and could
>>>> easily use futexes.
>>>>
>>> I have software which currently uses shared files for a one way
>>> transfer of information, which is modeled precisely by the futex (as
>>> contrasted to the mutex) model. In this case, the number of receivers
>>> is undetermined, so the number of wakeups is set to maxint.
>>>
>>> The receivers are minimally trusted: they have read access to the
>>> files, so they cannot accidentally affect other processes' use of the
>>> data. Requiring my files to be writeable by all clients would require
>>> a serious increase in the amount of software needing to be trusted.
>>
>> What's wrong with adding a FUTEX_WAIT_NOCONSUME flag then?  Your
>> program can use it to get exactly the semantics it wants and my
>> program can use it or not depending on which semantics it wants.
>>
> 1. I would prefer that my programs not have to check the kernel
> version (code named "working", "regressed", and "altered") to decide
> which parameters need to be sent to the futex call.

You don't have to check for kernel version. Just try
FUTEX_WAIT_NOCONSUME first and retry with FUTEX_WAIT if it returns
-EINVAL.
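
Roughly what I have in mind, as a sketch only. FUTEX_WAIT_NOCONSUME
does not exist in any kernel, so the op number below is a made-up
placeholder, and futex_call() and futex_wait_compat() are hypothetical
helper names. I believe an unknown futex op is actually reported as
ENOSYS rather than EINVAL, so the sketch falls back on either one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HYPOTHETICAL: no kernel defines FUTEX_WAIT_NOCONSUME. The value is a
 * placeholder picked only so that this sketch compiles. */
#ifndef FUTEX_WAIT_NOCONSUME
#define FUTEX_WAIT_NOCONSUME 0x7f
#endif

/* Thin wrapper: glibc has no futex() function, so use syscall(2).
 * Returns -1 with errno set on failure, like any other syscall. */
static long futex_call(uint32_t *uaddr, int op, uint32_t val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Block while *uaddr == expected. Try the (hypothetical) non-consuming
 * wait first; if the kernel rejects the op as unknown, retry with plain
 * FUTEX_WAIT. No kernel version check is involved. */
static long futex_wait_compat(uint32_t *uaddr, uint32_t expected)
{
	long ret = futex_call(uaddr, FUTEX_WAIT_NOCONSUME, expected);

	/* Success, or a real error such as EAGAIN or EINTR: report it. */
	if (ret == 0 || (errno != EINVAL && errno != ENOSYS))
		return ret;

	/* The op is not supported here: fall back to the old semantics. */
	return futex_call(uaddr, FUTEX_WAIT, expected);
}
```

A receiver would call futex_wait_compat(&word, last_seen) in a loop and
re-read the word after each return. A real implementation would probably
remember that the new op failed instead of probing on every call.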

I think you've already lost on regressed kernels regardless :-/

> 2. Doing FUTEX_WAIT_NOCONSUME would change the semantics of
> futex_wake() between the "working" and "altered" kernels, as it would
> no longer return the number of processes woken.

True, but that change couldn't affect old code because old code
wouldn't use FUTEX_WAIT_NOCONSUME.

>
> It seems that FUTEX_WAIT_NOCONSUME would be rather like a
> non-consuming read on a pipe.

More like a nonconsuming read on an eventfd, which sounds very useful.
(Actually, I'm porting code from Windows to Linux right now that
wants that feature...)

The reason I bring this up now is that I've been annoyed that
FUTEX_WAIT can be used on an R/O mapping to interfere with futexes in
that mapping. Under the original semantics this would have been
pretty much impossible to fix, but the regression has been there for
long enough that we have the option right now to fix it better instead
of restoring the original behavior.


--Andy