Re: [RFC PATCH v3 1/2] membarrier: Provide register expedited private command

From: Boqun Feng
Date: Thu Sep 21 2017 - 23:32:26 EST


On Fri, Sep 22, 2017 at 11:22:06AM +0800, Boqun Feng wrote:
> Hi Mathieu,
>
> On Tue, Sep 19, 2017 at 06:13:41PM -0400, Mathieu Desnoyers wrote:
> > Provide a new command allowing processes to register their intent to use
> > the private expedited command.
> >
> > This allows PowerPC to skip the full memory barrier in switch_mm(), and
> > only issue the barrier when scheduling into a task belonging to a
> > process that has registered to use expedited private.
> >
> > Processes are now required to register before using
> > MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
> >
>
> Sorry I'm late to the party, but I couldn't stop thinking about whether
> we could avoid the registration entirely, because registering makes
> sys_membarrier() more complex (both the interface and the
> implementation). So how about we trade off a little bit by taking
> some (not all) of the rq->locks?
>
> The idea is that in membarrier_private_expedited() we go through the
> ->curr on each CPU and
>
> 1) If it's a userspace task and its ->mm matches, we send an IPI.
>
> 2) If it's a kernel task, we skip it.
>
> (Because there will be an smp_mb() implied by mmdrop() when it
> switches to a userspace task.)
>
> 3) If it's a userspace task and its ->mm does not match, we take
> the corresponding rq->lock and check rq->curr again; if its ->mm
> matches, we send an IPI, otherwise we do nothing.
>
> (Because if we observe that rq->curr does not match with rq->lock
> held, then when a task with a matching ->mm schedules in, the
> rq->lock pairing along with the smp_mb__after_spinlock() will
> guarantee it observes all memory ops before sys_membarrier()).
>
> membarrier_private_expedited() will look like this if we choose this
> way:
>
> void membarrier_private_expedited()
> {
> 	int cpu;
> 	bool fallback = false;
> 	cpumask_var_t tmpmask;
> 	struct rq_flags rf;
>
> 	if (num_online_cpus() == 1)
> 		return;
>
> 	smp_mb();
>
> 	if (!zalloc_cpumask_var(&tmpmask, GFP_NOWAIT)) {
> 		/* Fallback for OOM. */
> 		fallback = true;
> 	}
>
> 	cpus_read_lock();
> 	for_each_online_cpu(cpu) {
> 		struct task_struct *p;
>
> 		if (cpu == raw_smp_processor_id())
> 			continue;
>
> 		rcu_read_lock();
> 		p = task_rcu_dereference(&cpu_rq(cpu)->curr);
>
> 		if (!p) {
> 			rcu_read_unlock();
> 			continue;
> 		}
>
> 		if (p->mm == current->mm) {
> 			if (!fallback)
> 				__cpumask_set_cpu(cpu, tmpmask);
> 			else
> 				smp_call_function_single(cpu, ipi_mb, NULL, 1);
> 		}
>
> 		if (p->mm == current->mm || !p->mm) {
> 			rcu_read_unlock();
> 			continue;
> 		}
>
> 		rcu_read_unlock();
>
> 		/*
> 		 * This should be arch-specific code, as we don't
> 		 * need it anywhere other than on archs without
> 		 * an smp_mb() in switch_mm() (i.e. powerpc).
> 		 */
> 		rq_lock_irq(cpu_rq(cpu), &rf);
> 		if (p->mm == current->mm) {

Oops, this one should be

	if (cpu_curr(cpu)->mm == current->mm)

> 			if (!fallback)
> 				__cpumask_set_cpu(cpu, tmpmask);
> 			else
> 				smp_call_function_single(cpu, ipi_mb, NULL, 1);

, and the smp_call_function_single() had better be moved out of the
rq->lock critical section.
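For clarity, the locked re-check with both fixes folded in might look
like this (just a sketch; "ipi_needed" is a hypothetical local bool
used to defer the IPI until after the unlock):

	bool ipi_needed;

	rq_lock_irq(cpu_rq(cpu), &rf);
	/* Re-read rq->curr under the lock; p may be stale by now. */
	ipi_needed = (cpu_curr(cpu)->mm == current->mm);
	rq_unlock_irq(cpu_rq(cpu), &rf);

	if (ipi_needed) {
		if (!fallback)
			__cpumask_set_cpu(cpu, tmpmask);
		else
			smp_call_function_single(cpu, ipi_mb, NULL, 1);
	}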

Regards,
Boqun

> 		}
> 		rq_unlock_irq(cpu_rq(cpu), &rf);
> 	}
> 	if (!fallback) {
> 		smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
> 		free_cpumask_var(tmpmask);
> 	}
> 	cpus_read_unlock();
>
> 	smp_mb();
> }
>
> Thoughts?
>
> Regards,
> Boqun
>
[...]
