Re: [PATCH] workqueue: fix selection of wake_cpu in kick_pool()

From: Tejun Heo
Date: Fri Apr 19 2024 - 11:41:02 EST


Hello, Sven.

On Fri, Apr 19, 2024 at 10:27:05AM +0200, Sven Schnelle wrote:
> > Probably by wrapping determining the wake_cpu and the wake_up inside
> > cpu_read_lock() section.
>
> Do you mean rcu_read_lock()? cpus_read_lock() takes a mutex, and the
> crash happens in softirq context - so cpus_read_lock() can't be the
> correct lock.

I meant cpus_read_lock() but yeah we can't use that here.

> If I read the code correctly, CPU hotplug uses stop_machine_cpuslocked(),
> so rcu_read_lock() should be sufficient for non-atomic context.
>
> Looking at the backtrace the crash is actually happening in
> arch_vcpu_is_preempted(). I don't know the semantics of that function,
> whether it is ok to call it for offline CPUs, or whether the calling
> code should make sure that the cpu is online (which would be my guess).
>
> Following the backtrace from my initial mail, I can't find a place where
> a check is done whether p->wake_cpu is actually online. Eventually
> available_idle_cpu() is calling vcpu_is_preempted(). I wonder whether
> available_idle_cpu() should do a cpu_online() check right at the
> beginning?

Yeah, adding a cpu_online() test there makes more sense to me.
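
To make the idea concrete, here is a rough userspace model of the check
ordering, not a patch. It assumes available_idle_cpu() currently tests
idle_cpu() and then vcpu_is_preempted(); the mock_* arrays, NR_CPUS value and
the vcpu_calls_on_offline counter are made up for illustration and stand in
for the real per-CPU state.

```c
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical, for the model only */

/* Mocked per-CPU state (assumption): CPU 2 is offline. */
static bool mock_online[NR_CPUS]    = { true,  true,  false, true };
static bool mock_idle[NR_CPUS]      = { true,  false, true,  true };
static bool mock_preempted[NR_CPUS] = { false, false, false, true };

/* Counts how often the preemption hook is reached for an offline CPU. */
static int vcpu_calls_on_offline;

static bool cpu_online(int cpu) { return mock_online[cpu]; }
static bool idle_cpu(int cpu)   { return mock_idle[cpu]; }

/* Stand-in for the arch hook that is not safe for offline CPUs. */
static bool vcpu_is_preempted(int cpu)
{
	if (!mock_online[cpu])
		vcpu_calls_on_offline++;
	return mock_preempted[cpu];
}

/* Model of the proposed ordering: reject offline CPUs first. */
int available_idle_cpu(int cpu)
{
	if (!cpu_online(cpu))
		return 0;

	if (!idle_cpu(cpu))
		return 0;

	if (vcpu_is_preempted(cpu))
		return 0;

	return 1;
}
```

With the online test first, the model never reaches vcpu_is_preempted() for
the offline CPU, even though that CPU looks idle. Whether the real check
belongs in available_idle_cpu() or in its callers is still the open question.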

> Adding Peter to CC, he probably knows.

Peter?

Thanks.

--
tejun