Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()

From: Peter Zijlstra
Date: Tue Sep 17 2013 - 12:45:36 EST


On Tue, Sep 17, 2013 at 05:20:50PM +0100, Mel Gorman wrote:
> > +extern struct task_struct *__cpuhp_writer;
> > +DECLARE_PER_CPU(unsigned int, __cpuhp_refcount);
> > +
> > +extern void __get_online_cpus(void);
> > +
> > +static inline void get_online_cpus(void)
> > +{
> > + might_sleep();
> > +
> > + this_cpu_inc(__cpuhp_refcount);
> > + /*
> > + * Order the refcount inc against the writer read; pairs with the full
> > + * barrier in cpu_hotplug_begin().
> > + */
> > + smp_mb();
> > + if (unlikely(__cpuhp_writer))
> > + __get_online_cpus();
> > +}
> > +
>
> If the problem with get_online_cpus() is the shared global state then a
> full barrier in the fast path is still going to hurt. Granted, it will hurt
> a lot less and there should be no lock contention.

I went for "a lot less"; I wasn't smart enough to get rid of it entirely.
Also, since it's a lock op we should at least provide an ACQUIRE barrier.

> However, what barrier in cpu_hotplug_begin is the comment referring to?

set_current_state() implies a full barrier and nicely separates the
write to __cpuhp_writer from the read of __cpuhp_refcount.

> The
> other barrier is in the slowpath __get_online_cpus. Did you mean to do
> a rmb here and a wmb after __cpuhp_writer is set in cpu_hotplug_begin?

No. Since we're ordering a STORE against a later LOAD on each side (see
below), and rmb/wmb only order loads against loads and stores against
stores, we must use full barriers.

> I'm assuming you are currently using a full barrier to guarantee that an
> update of __cpuhp_writer will be visible so get_online_cpus blocks but I'm
> not 100% sure because of the comments.

I'm ordering:

  CPU0 -- get_online_cpus()        CPU1 -- cpu_hotplug_begin()

  STORE __cpuhp_refcount           STORE __cpuhp_writer

  MB                               MB

  LOAD __cpuhp_writer              LOAD __cpuhp_refcount

Such that neither can miss the state of the other and we get proper
mutual exclusion.
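
The diagram above is the store-buffering handshake: each side publishes
its own state, issues a full barrier, and only then looks at the other
side. A minimal userspace sketch follows, assuming a C11 sequentially
consistent fence stands in for smp_mb() and a single counter stands in for
the per-CPU ones. The names hp_state, hp_init, reader_enter and
writer_begin are illustrative and not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for __cpuhp_refcount and __cpuhp_writer. */
struct hp_state {
	atomic_uint refcount;	/* one counter instead of per-CPU counters */
	atomic_bool writer;	/* true while a hotplug writer is active */
};

void hp_init(struct hp_state *s)
{
	assert(s != NULL);
	atomic_init(&s->refcount, 0u);
	atomic_init(&s->writer, false);
}

/*
 * Reader side: STORE refcount; MB; LOAD writer.
 * Returns true when the reader would have to take the slow path.
 */
bool reader_enter(struct hp_state *s)
{
	assert(s != NULL);
	atomic_fetch_add_explicit(&s->refcount, 1u, memory_order_relaxed);
	/* Assumption: a seq_cst fence plays the role of smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&s->writer, memory_order_relaxed);
}

/*
 * Writer side: STORE writer; MB; LOAD refcount.
 * Returns the number of readers the writer observes.
 */
unsigned int writer_begin(struct hp_state *s)
{
	assert(s != NULL);
	atomic_store_explicit(&s->writer, true, memory_order_relaxed);
	/* Assumption: stands in for the barrier in set_current_state(). */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&s->refcount, memory_order_relaxed);
}
```

With both fences in place, at least one side must see the other: either
reader_enter() returns true or writer_begin() returns a non-zero count.
Dropping either fence would allow both sides to miss each other.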

> > +extern void __put_online_cpus(void);
> > +
> > +static inline void put_online_cpus(void)
> > +{
> > + barrier();
>
> Why is this barrier necessary?

To ensure the compiler keeps all loads/stores done before the
read-unlock ahead of it, i.e. does not move them past the refcount
decrement.

Arguably it should be a complete RELEASE barrier. I should've put an XXX
comment here but the brain gave out completely for the day.

> I could not find anything that stated if an
> inline function is an implicit compiler barrier but whether it is or not,
> it's not clear why it's necessary at all.

It is not; only actual function calls are an implied sync point for the
compiler.
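
To make the difference between a compiler barrier and a RELEASE barrier
concrete, here is a hedged userspace sketch. It assumes GCC or Clang for
the empty-asm idiom (the same shape as the kernel's barrier()) and uses
C11 atomics in place of this_cpu_dec(). The struct and function names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Compiler-only barrier (GCC/Clang): emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for one CPU's __cpuhp_refcount. */
struct reader_count {
	atomic_uint count;
	int protected_data;	/* something accessed inside the read section */
};

/* Variant 1: compiler barrier only, as in the posted patch. */
void put_compiler_only(struct reader_count *rc)
{
	assert(rc != NULL);
	/* The compiler may not move earlier accesses below this point. */
	compiler_barrier();
	atomic_fetch_sub_explicit(&rc->count, 1u, memory_order_relaxed);
}

/*
 * Variant 2: release ordering on the decrement. Earlier accesses are
 * ordered before it for any thread that later acquires the counter, so
 * both compiler and CPU reordering are constrained.
 */
void put_release(struct reader_count *rc)
{
	assert(rc != NULL);
	atomic_fetch_sub_explicit(&rc->count, 1u, memory_order_release);
}
```

Variant 1 constrains only the compiler. On a weakly ordered CPU the
hardware may still make the decrement visible before earlier accesses,
which is the gap a RELEASE barrier would close.
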

> > + this_cpu_dec(__cpuhp_refcount);
> > + if (unlikely(__cpuhp_writer))
> > + __put_online_cpus();
> > +}
> > +

> > +struct task_struct *__cpuhp_writer = NULL;
> > +EXPORT_SYMBOL_GPL(__cpuhp_writer);
> > +
> > +DEFINE_PER_CPU(unsigned int, __cpuhp_refcount);
> > +EXPORT_PER_CPU_SYMBOL_GPL(__cpuhp_refcount);
> >
> > +static DECLARE_WAIT_QUEUE_HEAD(cpuhp_wq);
> > +
> > +void __get_online_cpus(void)
> > {
> > + if (__cpuhp_writer == current)
> > return;
> >
> > +again:
> > + /*
> > + * Ensure a pending reading has a 0 refcount.
> > + *
> > + * Without this a new reader that comes in before cpu_hotplug_begin()
> > + * reads the refcount will deadlock.
> > + */
> > + this_cpu_dec(__cpuhp_refcount);
> > + wait_event(cpuhp_wq, !__cpuhp_writer);
> > +
> > + this_cpu_inc(__cpuhp_refcount);
> > + /*
> > + * See get_online_cpu().
> > + */
> > + smp_mb();
> > + if (unlikely(__cpuhp_writer))
> > + goto again;
> > }
>
> If CPU hotplug operations are very frequent (or a stupid stress test) then
> it's possible for a new hotplug operation to start (updating __cpuhp_writer)
> before a caller to __get_online_cpus can update the refcount. Potentially
> a caller to __get_online_cpus gets starved although as it only affects a
> CPU hotplug stress test it may not be a serious issue.

Right.. If that ever becomes a problem we should fix it, but aside from
stress tests hotplug should be extremely rare.

Initially I kept the reference over the wait_event(), but realized (as
per the comment) that this would deadlock cpu_hotplug_begin(), since it
would never observe !refcount.

One solution for this problem is having refcount as an array of 2 and
flipping the index at the appropriate times.
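
One possible reading of the two-element refcount idea, sketched in
userspace C11. This is an assumption about what the flip could look like,
not code from the patch, and every name below is hypothetical. It shows
only the bookkeeping: a reader that reads the index just before the flip
and increments just after it is not handled here and would need extra
care.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical structure; none of these names appear in the patch. */
struct flip_ref {
	atomic_uint count[2];	/* one reader count per generation */
	atomic_uint idx;	/* slot that new readers register in (0 or 1) */
};

void flip_ref_init(struct flip_ref *f)
{
	assert(f != NULL);
	atomic_init(&f->count[0], 0u);
	atomic_init(&f->count[1], 0u);
	atomic_init(&f->idx, 0u);
}

/* Reader entry: register in the current slot and remember which one. */
unsigned int flip_ref_get(struct flip_ref *f)
{
	unsigned int i = atomic_load(&f->idx) & 1u;

	atomic_fetch_add(&f->count[i], 1u);
	return i;
}

/* Reader exit: drop the reference from the slot it was taken in. */
void flip_ref_put(struct flip_ref *f, unsigned int i)
{
	atomic_fetch_sub(&f->count[i & 1u], 1u);
}

/* Writer: send new readers to the other slot; returns the old slot. */
unsigned int flip_ref_flip(struct flip_ref *f)
{
	return atomic_fetch_xor(&f->idx, 1u) & 1u;
}

/* Writer: number of readers still holding the given slot. */
unsigned int flip_ref_pending(struct flip_ref *f, unsigned int slot)
{
	return atomic_load(&f->count[slot & 1u]);
}
```

After the flip the writer waits only for the old slot to drain, so a
reader arriving later can hold a reference in the new slot without keeping
the writer from observing zero.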

> > +EXPORT_SYMBOL_GPL(__get_online_cpus);
> >
> > +void __put_online_cpus(void)
> > {
> > + unsigned int refcnt = 0;
> > + int cpu;
> >
> > + if (__cpuhp_writer == current)
> > + return;
> >
> > + for_each_possible_cpu(cpu)
> > + refcnt += per_cpu(__cpuhp_refcount, cpu);
> >
>
> This can result in spurious wakeups if CPU N calls get_online_cpus after
> its refcnt has been checked but I could not think of a case where it
> matters.

Right and right.. too many wakeups aren't a correctness issue. One
should try and minimize them for performance reasons though :-)

> > + if (!refcnt)
> > + wake_up_process(__cpuhp_writer);
> > }


> > /*
> > * This ensures that the hotplug operation can begin only when the
> > * refcount goes to zero.
> > *
> > * Since cpu_hotplug_begin() is always called after invoking
> > * cpu_maps_update_begin(), we can be sure that only one writer is active.
> > */
> > void cpu_hotplug_begin(void)
> > {
> > + __cpuhp_writer = current;
> >
> > for (;;) {
> > + unsigned int refcnt = 0;
> > + int cpu;
> > +
> > + /*
> > + * Order the setting of writer against the reading of refcount;
> > + * pairs with the full barrier in get_online_cpus().
> > + */
> > +
> > + set_current_state(TASK_UNINTERRUPTIBLE);
> > +
> > + for_each_possible_cpu(cpu)
> > + refcnt += per_cpu(__cpuhp_refcount, cpu);
> > +
>
> CPU 0                            CPU 1
> get_online_cpus
> refcnt++
>                                  __cpuhp_writer = current
>                                  refcnt > 0
>                                  schedule
> __get_online_cpus slowpath
> refcnt--
> wait_event(!__cpuhp_writer)
>
> What wakes up __cpuhp_writer to recheck the refcnts and see that they're
> all 0?

The wakeup in __put_online_cpus() you just commented on?
put_online_cpus() will drop into the slow path __put_online_cpus() if
there's a writer, compute the refcount, and perform the wakeup when
!refcount.
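
The sum-then-wake check can be sketched in plain C. The array and function
names are hypothetical stand-ins for the per-CPU variable and the
for_each_possible_cpu() loop. One property worth noting, as I read the
patch: the counters are unsigned and a reader may take its reference on
one CPU and drop it on another, so individual counters can wrap while the
sum is still correct.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum the per-CPU counters. Unsigned arithmetic wraps modulo
 * UINT_MAX + 1, so a counter that went "negative" on one CPU cancels
 * the matching increment left behind on another.
 */
unsigned int refcount_sum(const unsigned int percpu[], size_t ncpus)
{
	unsigned int sum = 0;
	size_t cpu;

	assert(percpu != NULL);
	for (cpu = 0; cpu < ncpus; cpu++)
		sum += percpu[cpu];
	return sum;
}

/* Returns 1 when no reader holds a reference and the writer can be woken. */
int should_wake_writer(const unsigned int percpu[], size_t ncpus)
{
	return refcount_sum(percpu, ncpus) == 0;
}
```

For example, a get on CPU 0 followed by a put on CPU 2 leaves counters of
1 and UINT_MAX, which sum to zero, so the writer is woken as intended.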

> > + if (!refcnt)
> > break;
> > +
> > schedule();
> > }
> > + __set_current_state(TASK_RUNNING);
> > }