Re: [Question] report a race condition between CPU hotplug state machine and hrtimer 'sched_cfs_period_timer' for cfs bandwidth throttling

From: Vincent Guittot
Date: Tue Aug 29 2023 - 03:19:17 EST


On Wed, 23 Aug 2023 at 12:14, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jun 29 2023 at 10:30, Vincent Guittot wrote:
> > On Thu, 29 Jun 2023 at 00:01, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> int cpu_device_down(struct device *dev)
> >> {
> >> - return cpu_down(dev->id, CPUHP_OFFLINE);
> >> + unsigned int cpu = cpumask_any_but(cpu_online_mask, dev->id);
> >> +
> >> + if (cpu >= nr_cpu_ids)
> >> + return -EBUSY;
> >> +
> >> + return work_on_cpu(cpu, __cpu_device_down, dev);
> >
> > The comment for work_on_cpu :
> >
> > * It is up to the caller to ensure that the cpu doesn't go offline.
> > * The caller must not hold any locks which would prevent @fn from completing.
> >
> > make me wonder if this should be done only once the hotplug lock is
> > taken so the selected cpu will not go offline
>
> That makes sense. Updated and again untested patch below.
>
> Thanks,
>
> tglx
> ---
> Subject: cpu/hotplug: Prevent self deadlock on CPU hot-unplug
> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Date: Wed, 23 Aug 2023 10:47:02 +0200
>
> Xiongfeng reported and debugged a self deadlock of the task which initiates
> and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
>
> CPU1 CPU2
>
> T1 sets cfs_quota
> starts hrtimer cfs_bandwidth 'period_timer'
> T1 is migrated to CPU2
> T1 initiates offlining of CPU1
> Hotplug operation starts
> ...
> 'period_timer' expires and is re-enqueued on CPU1
> ...
> take_cpu_down()
> CPU1 shuts down and does not handle timers
> anymore. They have to be migrated in the
> post dead hotplug steps by the control task.
>
> T1 runs the post dead offline operation
> T1 is scheduled out
> T1 waits for 'period_timer' to expire
>
> T1 waits there forever if it is scheduled out before it can execute the hrtimer
> offline callback hrtimers_dead_cpu().
>
> Cure this by delegating the hotplug control operation to a worker thread on
> an online CPU. This takes the initiating user space task, which might be
> affected by the bandwidth timer, completely out of the picture.
>
> Reported-by: Xiongfeng Wang <wangxiongfeng2@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@xxxxxxxxxx

Acked-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>


> ---
> kernel/cpu.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1467,8 +1467,22 @@ static int __ref _cpu_down(unsigned int
> return ret;
> }
>
> +struct cpu_down_work {
> + unsigned int cpu;
> + enum cpuhp_state target;
> +};
> +
> +static long __cpu_down_maps_locked(void *arg)
> +{
> + struct cpu_down_work *work = arg;
> +
> + return _cpu_down(work->cpu, 0, work->target);
> +}
> +
> static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target)
> {
> + struct cpu_down_work work = { .cpu = cpu, .target = target, };
> +
> /*
> * If the platform does not support hotplug, report it explicitly to
> * differentiate it from a transient offlining failure.
> @@ -1477,7 +1491,15 @@ static int cpu_down_maps_locked(unsigned
> return -EOPNOTSUPP;
> if (cpu_hotplug_disabled)
> return -EBUSY;
> - return _cpu_down(cpu, 0, target);
> +
> + /*
> + * Ensure that the control task does not run on the to be offlined
> + * CPU to prevent a deadlock against cfs_b->period_timer.
> + */
> + cpu = cpumask_any_but(cpu_online_mask, cpu);
> + if (cpu >= nr_cpu_ids)
> + return -EBUSY;
> + return work_on_cpu(cpu, __cpu_down_maps_locked, &work);
> }
>
> static int cpu_down(unsigned int cpu, enum cpuhp_state target)