Re: [RFC PATCH] membarrier: expedited private command

From: Mathieu Desnoyers
Date: Thu Jul 27 2017 - 16:29:44 EST


----- On Jul 27, 2017, at 3:55 PM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Thu, Jul 27, 2017 at 02:59:43PM -0400, Mathieu Desnoyers wrote:
>> diff --git a/kernel/membarrier.c b/kernel/membarrier.c
>> index 9f9284f37f8d..8c6c0f96f617 100644
>> --- a/kernel/membarrier.c
>> +++ b/kernel/membarrier.c
>> @@ -19,10 +19,81 @@
>> #include <linux/tick.h>
>>
>> /*
>> + * XXX For cpu_rq(). Should we rather move
>> + * membarrier_private_expedited() to sched/core.c or create
>> + * sched/membarrier.c ?
>
> The later perhaps.

Allright, will do that in v2.

>
>> +static void membarrier_private_expedited(void)
>> +{
>> + int cpu, this_cpu;
>> + cpumask_var_t tmpmask;
>> +
>> + if (num_online_cpus() == 1)
>> + return;
>> +
>> + /*
>> + * Matches memory barriers around rq->curr modification in
>> + * scheduler.
>> + */
>> + smp_mb(); /* system call entry is not a mb. */
>> +
>> + if (!alloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
>
> Why GFP_NOWAIT ? and falback. There seems to be a desire to make this a
> nonblocking syscall. Should we document this somewhere?

blocking a synchronization system call due to memory allocation
pressure seemed like an unwanted effect back in 2010, so I kept
the same approach. Perhaps we could state that all the "expedited"
commands should be non-blocking ?

>
>> + /* Fallback for OOM. */
>> + membarrier_private_expedited_ipi_each();
>> + goto end;
>> + }
>> +
>> + this_cpu = raw_smp_processor_id();
>
> This is a tad dodgy, you might want to put in a comment on how migrating
> this thread is ok.

How about this ?

/*
* Skipping the current CPU is OK even through we can be
* migrated at any point. The current CPU, at the point where we
* read raw_smp_processor_id(), is ensured to be in program
* order with respect to the caller thread. Therefore, we can
* skip this CPU from the iteration.
*/

>
>> + for_each_online_cpu(cpu) {
>
> One would also need cpus_read_lock() if you rely on the online mask.

OK.

>
>> + struct task_struct *p;
>> +
>> + if (cpu == this_cpu)
>> + continue;
>> + rcu_read_lock();
>> + p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>> + if (p && p->mm == current->mm)
>> + __cpumask_set_cpu(cpu, tmpmask);
>> + rcu_read_unlock();
>> + }
>> + smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
>> + free_cpumask_var(tmpmask);
>> +end:
>> + /*
>> + * Memory barrier on the caller thread _after_ we finished
>> + * waiting for the last IPI. Matches memory barriers around
>> + * rq->curr modification in scheduler.
>> + */
>> + smp_mb(); /* exit from system call is not a mb */
>> +}
>
>> @@ -2737,6 +2757,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>>
>> mm = next->mm;
>> oldmm = prev->active_mm;
>> + membarrier_expedited_mb_after_set_current(mm, oldmm);
>> /*
>> * For paravirt, this is coupled with an exit in switch_to to
>> * combine the page table reload and the switch backend into
>
> As said on IRC, we have finish_task_switch()->if (mm)
> mmdrop(mm)->atomic_dec_and_test() providing a smp_mb(). We just need to
> deal with the !mm case.

Yes, I have a v2 brewing that includes this change :)

Thanks!

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com