Re: EEVDF and NUMA balancing

From: Julia Lawall
Date: Fri Jan 05 2024 - 12:27:37 EST




On Fri, 5 Jan 2024, Julia Lawall wrote:

>
>
> On Fri, 5 Jan 2024, Vincent Guittot wrote:
>
> > On Fri, 5 Jan 2024 at 15:51, Julia Lawall <julia.lawall@xxxxxxxx> wrote:
> > >
> > > > Your system is calling the polling mode and not the default
> > > > cpuidle_idle_call() ? This could explain why I don't see such problem
> > > > on my system which doesn't have polling
> > > >
> > > > Are you forcing the use of polling mode ?
> > > > If yes, could you check that this problem disappears without forcing
> > > > polling mode ?
> > >
> > > I expanded the code in do_idle to:
> > >
> > > if (cpu_idle_force_poll) {
> > >         c1++;
> > >         tick_nohz_idle_restart_tick();
> > >         cpu_idle_poll();
> > > } else if (tick_check_broadcast_expired()) {
> > >         c2++;
> > >         tick_nohz_idle_restart_tick();
> > >         cpu_idle_poll();
> > > } else {
> > >         c3++;
> > >         cpuidle_idle_call();
> > > }
> > >
> > > Later, I have:
> > >
> > > trace_printk("force poll: %d: c1: %d, c2: %d, c3: %d\n", cpu_idle_force_poll, c1, c2, c3);
> > > flush_smp_call_function_queue();
> > > schedule_idle();
> > >
> > > force poll, c1 and c2 are always 0, and c3 is always some non-zero value.
> > > Sometimes small (often 1), and sometimes large (304 or 305).
> > >
> > > So I don't think it's calling cpu_idle_poll().
> >
> > I agree that something else
> >
> > >
> > > x86 has TIF_POLLING_NRFLAG defined to be a non zero value, which I think
> > > is sufficient to cause the issue.
> >
> > Could you trace trace_sched_wake_idle_without_ipi() and csd traces as well ?
> > I don't understand what set need_resched() in your case; having in
> > mind that I don't see the problem on my Arm systems and IIRC Peter
> > said that he didn't face the problem on his x86 system.
>
> TIF_POLLING_NRFLAG doesn't seem to be defined on Arm.
>
> Peter said that he didn't see the problem, but perhaps that was just
> random. It requires a NUMA move to occur. I make 20 runs to be sure to
> see the problem at least once. But another machine might behave
> differently.
>
> I believe the call chain is:
>
> scheduler_tick
> trigger_load_balance
> nohz_balancer_kick
> kick_ilb
> smp_call_function_single_async
> generic_exec_single
> __smp_call_single_queue
> send_call_function_single_ipi
> call_function_single_prep_ipi
> set_nr_if_polling <====== sets need_resched
>
> I'll make a trace to reverify that.

This is what I see at a tick, which corresponds to the call chain shown
above:

bt.B.x-4184 [046] 466.410605: bputs: scheduler_tick: calling trigger_load_balance
bt.B.x-4184 [046] 466.410605: bputs: trigger_load_balance: calling nohz_balancer_kick
bt.B.x-4184 [046] 466.410605: bputs: trigger_load_balance: calling kick_ilb
bt.B.x-4184 [046] 466.410607: bprint: trigger_load_balance: calling smp_call_function_single_async 22
bt.B.x-4184 [046] 466.410607: bputs: smp_call_function_single_async: calling generic_exec_single
bt.B.x-4184 [046] 466.410607: bputs: generic_exec_single: calling __smp_call_single_queue
bt.B.x-4184 [046] 466.410608: bputs: __smp_call_single_queue: calling send_call_function_single_ipi
bt.B.x-4184 [046] 466.410608: bputs: __smp_call_single_queue: calling call_function_single_prep_ipi
bt.B.x-4184 [046] 466.410608: bputs: call_function_single_prep_ipi: calling set_nr_if_polling
bt.B.x-4184 [046] 466.410609: sched_wake_idle_without_ipi: cpu=22
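
For reference, the last step of the chain can be modelled in user space roughly as below. This is only a sketch of my understanding of set_nr_if_polling(), not the kernel source: the bit values, struct idle_flags and the model_ prefix are made up for illustration, and C11 atomics stand in for the kernel's cmpxchg helpers. The idea is that if the target CPU's idle task advertises TIF_POLLING_NRFLAG, the sender sets TIF_NEED_RESCHED in its flags and skips the IPI, which would match the sched_wake_idle_without_ipi event for cpu=22 above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values for this model; the kernel's are per-architecture. */
#define TIF_NEED_RESCHED   (1UL << 0)
#define TIF_POLLING_NRFLAG (1UL << 1)

static_assert((TIF_NEED_RESCHED & TIF_POLLING_NRFLAG) == 0,
	      "the two flags must be distinct bits");

/* Stand-in for the thread flags word of the idle task on the target CPU. */
struct idle_flags {
	_Atomic unsigned long flags;
};

/*
 * Model of set_nr_if_polling(): if the target is polling on its flags
 * word, set NEED_RESCHED and return true (the caller can skip the IPI).
 * If the target is not polling, leave the flags alone and return false
 * (the caller has to send a real IPI).
 */
static bool model_set_nr_if_polling(struct idle_flags *ti)
{
	unsigned long val = atomic_load(&ti->flags);

	do {
		if (!(val & TIF_POLLING_NRFLAG))
			return false;
		if (val & TIF_NEED_RESCHED)
			return true;
		/* On failure, val is refreshed with the current flags and we retry. */
	} while (!atomic_compare_exchange_weak(&ti->flags, &val,
					       val | TIF_NEED_RESCHED));

	return true;
}
```

In this model, a flags word holding only TIF_POLLING_NRFLAG comes back with TIF_NEED_RESCHED also set, and a flags word of 0 is left untouched. The second case is what I would expect on a machine where the idle task never sets the polling flag.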

julia