Re: [PATCH] sched/core: fix illegal RCU from offline CPUs

From: Qian Cai
Date: Mon Jan 13 2020 - 01:30:34 EST

> On Jan 12, 2020, at 7:33 PM, Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2020/01/13 1:17, Qian Cai wrote:
>> In the CPU-offline process, it calls mmdrop() after idle entry and the
>> subsequent call to cpuhp_report_idle_dead(). Once execution passes the
>> call to rcu_report_dead(), RCU is ignoring the CPU, which results in
>> lockdep complaints when mmdrop() uses RCU from either memcg or
>> debugobjects. Fix it by scheduling mmdrop() on another online CPU.
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 90e4b00ace89..41fb49f3dfce 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6194,7 +6194,8 @@ void idle_task_exit(void)
>> current->active_mm = &init_mm;
>> finish_arch_post_lock_switch();
>> }
>> - mmdrop(mm);
>> + smp_call_function_single(cpumask_first(cpu_online_mask),
>> + (void (*)(void *))mmdrop, mm, 0);
>
> mmdrop() might sleep, but

If that were the case, then commit e78a7614f387 ("idle: Prevent
late-arriving interrupts from disrupting offline") would already be
incorrect, because it disables local IRQs before calling mmdrop(), which
would trigger the might_sleep() warning. Can you prove that mmdrop() can
sleep?
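
To spell out the argument, here is a toy userspace C model. It is not
kernel code, and every name in it (model_might_sleep(),
model_drop_sleeping(), and so on) is made up for illustration. It only
shows the shape of the reasoning: a function that may sleep would trip
the check as soon as it is called with IRQs disabled, whereas a function
that never sleeps would not.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */

static bool irqs_off;       /* stands in for the local IRQ state */
static int sleep_warnings;  /* number of times the check has fired */

static void model_local_irq_disable(void) { irqs_off = true; }
static void model_local_irq_enable(void)  { irqs_off = false; }

/* Stands in for might_sleep(): complain when called with IRQs off. */
static void model_might_sleep(const char *who)
{
	if (irqs_off) {
		sleep_warnings++;
		fprintf(stderr, "warning: %s may sleep with IRQs disabled\n",
			who);
	}
}

/* A drop function that is allowed to sleep, so it announces it. */
static void model_drop_sleeping(void)
{
	model_might_sleep("model_drop_sleeping");
}

/* A drop function that never sleeps: nothing to check. */
static void model_drop_atomic(void)
{
}
```

In this model, calling model_drop_sleeping() after
model_local_irq_disable() bumps sleep_warnings, while model_drop_atomic()
does not. The question above is which of the two mmdrop() corresponds to.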

>
> /*
> * smp_call_function_single - Run a function on a specific CPU
> * @func: The function to run. This must be fast and non-blocking.
> * @info: An arbitrary pointer to pass to the function.
> * @wait: If true, wait until function has completed on other CPUs.
> *
> * Returns 0 on success, else a negative status code.
> */
>
> . Maybe mmdrop_async() instead?