Re: [PATCH 2/6] sched: add function to execute a function synchronously on a physical cpu

From: Juergen Gross
Date: Fri Mar 11 2016 - 07:48:23 EST


On 11/03/16 13:42, Peter Zijlstra wrote:
> On Fri, Mar 11, 2016 at 01:19:50PM +0100, Peter Zijlstra wrote:
>> On Fri, Mar 11, 2016 at 12:59:30PM +0100, Juergen Gross wrote:
>>> +int call_sync_on_phys_cpu(unsigned cpu, int (*func)(void *), void *par)
>>> +{
>>> +	cpumask_var_t old_mask;
>>> +	int ret;
>>> +
>>> +	if (cpu >= nr_cpu_ids)
>>> +		return -EINVAL;
>>> +
>>> +	if (!alloc_cpumask_var(&old_mask, GFP_KERNEL))
>>> +		return -ENOMEM;
>>> +
>>> +	cpumask_copy(old_mask, &current->cpus_allowed);
>>> +	ret = set_cpus_allowed_ptr(current, cpumask_of(cpu));
>>> +	if (ret)
>>> +		goto out;
>>
>> So what happens if someone does sched_setaffinity() right about here?
>>
>>> +
>>> +	ret = func(par);
>>> +
>>> +	set_cpus_allowed_ptr(current, old_mask);
>>> +
>>> +out:
>>> +	free_cpumask_var(old_mask);
>>> +	return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(call_sync_on_phys_cpu);
>>
>> This is disgusting, and you're adding this to !Xen kernels too.
>
> how about something like:
>
> struct xen_callback_struct {
> 	struct work_struct work;
> 	struct completion done;
> 	int (*func)(void *);
> 	void *data;
> 	int ret;
> };
>
> /* Runs on the target cpu's workqueue worker. */
> static void xen_callback_f(struct work_struct *work)
> {
> 	struct xen_callback_struct *xcs = container_of(work, struct xen_callback_struct, work);
>
> 	xcs->ret = xcs->func(xcs->data);
>
> 	complete(&xcs->done);
> }
>
> int xen_call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
> {
> 	struct xen_callback_struct xcs = {
> 		.work = __WORK_INITIALIZER(xcs.work, xen_callback_f),
> 		.done = COMPLETION_INITIALIZER_ONSTACK(xcs.done),
> 		.func = func,
> 		.data = data,
> 	};
>
> 	/* Queue on the given cpu and sleep until the callback has run. */
> 	queue_work_on(cpu, system_wq, &xcs.work);
> 	wait_for_completion(&xcs.done);
>
> 	return xcs.ret;
> }
>
> No mucking about with the scheduler state, no new exported functions
> etc..
>

Hey, I like it. It can't be limited to Xen, though: on bare metal the
function needs to be called on cpu 0, too. But avoiding the scheduler
fiddling is much better! As this seems to be required for Dell hardware
only, I could add it to some Dell base driver in case you don't want to
add it to core code.
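
For readers who want to try the pattern outside the kernel, here is a
rough user-space analogue, assuming Linux with glibc. It is only a
sketch of the idea (hand the call to a worker bound to the target cpu
and wait for it, leaving the caller's own affinity alone), not the
kernel code discussed above. The names call_on_cpu, call_on_cpu_fn,
call_on_cpu_sync, first_allowed_cpu and add_one are made up for this
illustration, and pthread_join() stands in for wait_for_completion().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical analogue of the on-stack work item in the mail above. */
struct call_on_cpu {
	int (*func)(void *);
	void *data;
	int ret;
};

/* Thread entry point: run the callback and store its result. */
static void *call_on_cpu_fn(void *arg)
{
	struct call_on_cpu *c = arg;

	assert(c && c->func);
	c->ret = c->func(c->data);
	return NULL;
}

/*
 * Run func(data) on a helper thread whose affinity is restricted to
 * 'cpu' and wait for it. The caller's affinity is never modified.
 * Returns func's result, or a negative errno value if setup fails.
 */
static int call_on_cpu_sync(int cpu, int (*func)(void *), void *data)
{
	struct call_on_cpu c = { .func = func, .data = data };
	pthread_attr_t attr;
	pthread_t tid;
	cpu_set_t set;
	int err;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -EINVAL;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	err = pthread_attr_init(&attr);
	if (err)
		return -err;

	/* GNU extension: the new thread starts out bound to 'cpu'. */
	err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (!err)
		err = pthread_create(&tid, &attr, call_on_cpu_fn, &c);
	pthread_attr_destroy(&attr);
	if (err)
		return -err;

	/* Plays the role of wait_for_completion(). */
	pthread_join(tid, NULL);
	return c.ret;
}

/* Helper: first cpu the calling thread is allowed to run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set))
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/* Example callback: return the pointed-to int plus one. */
static int add_one(void *data)
{
	return *(int *)data + 1;
}
```

A caller would do something like call_on_cpu_sync(first_allowed_cpu(),
add_one, &x). Creating a thread per call is heavier than queueing onto
an existing per-cpu worker, so this only mirrors the structure of the
workqueue approach, not its cost.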


Juergen