Re: [RFC PATCH v7 1/7] Restartable sequences system call

From: Andy Lutomirski
Date: Thu Aug 04 2016 - 01:11:36 EST


On Wed, Aug 3, 2016 at 9:27 PM, Boqun Feng <boqun.feng@xxxxxxxxx> wrote:
> On Wed, Aug 03, 2016 at 09:37:57AM -0700, Andy Lutomirski wrote:
>> On Wed, Aug 3, 2016 at 5:27 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Tue, Jul 26, 2016 at 03:02:19AM +0000, Mathieu Desnoyers wrote:
>> >> We really care about preemption here. Every migration implies a
>> >> preemption from a user-space perspective. If we would only care
>> >> about keeping the CPU id up-to-date, hooking into migration would be
>> >> enough. But since we want atomicity guarantees for restartable
>> >> sequences, we need to hook into preemption.
>> >
>> >> It allows user-space to perform update operations on per-cpu data without
>> >> requiring heavy-weight atomic operations.
>> >
>> > Well, a CMPXCHG without LOCK prefix isn't all that expensive on x86.
>> >
>> > It is however on PPC and possibly other architectures, so in name of
>> > simplicity supporting only the one variant makes sense.
>> >
>>
>> I wouldn't want to depend on CMPXCHG. But imagine we had primitives
>> that were narrower than the full abort-on-preemption primitive.
>> Specifically, suppose we had abort if (actual cpu != expected_cpu ||
>> *aptr != aval). We could do things like:
>>
>> expected_cpu = cpu;
>> aval = NULL; // disarm for now
>> begin();
>> aval = event_count[cpu] + 1;
>> event_count[cpu] = aval;
>> event_count[cpu]++;
>
> This line is redundant, right? Because it will guarantee a failure even
> in no-contention cases.
>
>>
>> ... compute something ...
>>
>> // arm the rest of it
>> aptr = &event_count[cpu];
>> if (*aptr != aval)
>> goto fail;
>>
>> *thing_im_writing = value_i_computed;
>> end();
>>
>> The idea here is that we don't rely on the scheduler to increment the
>> event count at all, which means that we get to determine the scope of
>> what kinds of access conflicts we care about ourselves.
>>
>
> If we increase the event count in userspace, how could we prevent two
> userspace threads from racing on the event_count[cpu] field? For
> example:
>
> CPU 0
> ================
> {event_count[0] is initially 0}
>
> [Thread 1]
> begin();
> aval = event_count[cpu] + 1; // 1
>
> (preempted)
> [Thread 2]
> begin();
> aval = event_count[cpu] + 1; // 1, too
> event_count[cpu] = aval; // event_count[0] is 1
>

You're right :( This would work with an xadd instruction, but that's
very slow and doesn't exist on most architectures. It could also work
if we did:

aval = some_tls_value++;

where some_tls_value is set up such that no two threads could ever end
up with the same values (using high bits as thread ids, perhaps), but
that's messy. Maybe my idea is no good.