On Sat, 15 Jun 2024 at 03:28, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
On Fri, Jun 14, 2024 at 12:48:37PM +0200, Vincent Guittot wrote:
On Fri, 14 Jun 2024 at 11:28, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
Vincent [5] pointed out a case where the idle load kick will fail to
run on an idle CPU since the IPI handler launching the ILB will check
for need_resched(). In such cases, the idle CPU relies on
newidle_balance() to pull tasks towards itself.
Is this the need_resched() in _nohz_idle_balance() ? Should we change
this to 'need_resched() && (rq->nr_running || rq->ttwu_pending)' or
something along those lines?
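
For illustration only, the condition suggested above can be modelled in plain C. This is a minimal userspace sketch, not kernel code: struct rq_sketch and both helper names are hypothetical stand-ins, and only the nr_running and ttwu_pending fields come from the text above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few runqueue fields named in the thread. */
struct rq_sketch {
	unsigned int nr_running;   /* tasks currently on the runqueue */
	unsigned int ttwu_pending; /* remote wakeups queued but not yet enqueued */
	bool need_resched;         /* models NEED_RESCHED being set on this CPU */
};

/* Current-style check: any NEED_RESCHED makes the idle balance bail out. */
static bool ilb_bail_plain(const struct rq_sketch *rq)
{
	return rq->need_resched;
}

/* Suggested check: bail out only if work is present or about to arrive. */
static bool ilb_bail_if_work(const struct rq_sketch *rq)
{
	return rq->need_resched && (rq->nr_running || rq->ttwu_pending);
}
```

With need_resched set but nr_running and ttwu_pending both zero (a CPU woken only to service an IPI), the first helper bails out and the second does not.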
It's not only this; do_idle() also exits its loop to look for tasks
to schedule
Is that really a problem? Reading the initial email the problem seems to
be newidle balance, not hitting schedule. Schedule should be fairly
quick if there's nothing to do, no?
There are 2 problems:
- Because of NEED_RESCHED being set, we go through the full schedule
path for no reason and we finally do a sched_balance_newidle()
- Because of need_resched being set to wake up the CPU, we will not
kick the softirq to run the nohz idle load balance and get a chance to
pull a task on an idle CPU
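
The two problems listed above can be illustrated with a toy model. This is a hedged userspace sketch, not the kernel's code: struct idle_cpu_sketch, its counters and both function names are hypothetical, and the real paths (do_idle(), the nohz kick handler, sched_balance_newidle()) are far more involved.

```c
#include <stdbool.h>

/* Hypothetical model of an idle CPU; the counters exist only for illustration. */
struct idle_cpu_sketch {
	bool need_resched;                 /* set by the waker to end polling idle */
	unsigned int nr_running;           /* runnable tasks on this CPU */
	unsigned int full_schedule_passes; /* trips through the full schedule path */
	unsigned int newidle_balances;     /* newidle balance attempts */
	unsigned int ilb_softirqs_raised;  /* nohz idle load balance softirqs raised */
};

/* Models the kick handler: it skips the ILB when need_resched is set (problem 2). */
static void nohz_kick_handler_sketch(struct idle_cpu_sketch *cpu)
{
	if (cpu->need_resched)
		return;
	cpu->ilb_softirqs_raised++;
}

/* Models leaving the idle loop: need_resched forces a schedule pass (problem 1). */
static void idle_exit_sketch(struct idle_cpu_sketch *cpu)
{
	if (!cpu->need_resched)
		return;
	cpu->need_resched = false;
	cpu->full_schedule_passes++;
	if (cpu->nr_running == 0)
		cpu->newidle_balances++; /* nothing to run, so newidle balance is tried */
}
```

In this model, a CPU with no runnable task that is woken purely by need_resched never raises the ILB softirq, yet still pays for one full schedule pass and one newidle balance.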
I mean, it's fairly trivial to figure out if there really is going to be
work there.
Using an alternate flag instead of NEED_RESCHED to indicate a pending
IPI was suggested as the correct approach to solve this problem on the
same thread.
So adding per-arch changes for this seems like something we shouldn't
do unless there really are no other sane options.
That is, I really think we should start with something like the below
and then fix any fallout from that.
The main problem is that need_resched becomes somewhat meaningless
because it doesn't only mean "I need to resched a task" and we have
to add more tests around even for those not using polling
True, however we already had some of that by having the wakeup list,
that made nr_running less 'reliable'.
The thing is, most architectures seem to have the TIF_POLLING_NRFLAG
bit, even if their main idle routine isn't actually using it, much of
Yes, I'm surprised that Arm arch has the TIF_POLLING_NRFLAG whereas it
has never been supported by the arch
the idle loop until it hits the arch idle will be having it set and will
thus tickle these cases *sometimes*.
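
As background for why the polling bit matters here, the idea of waking a polling idle CPU by setting NEED_RESCHED instead of sending an IPI can be sketched with C11 atomics. This is a simplified userspace model under stated assumptions: the bit positions and the function name are hypothetical, and the kernel's own helper operates on thread_info flags rather than a bare atomic word.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit positions; the real values are architecture specific. */
#define SK_NEED_RESCHED (1u << 0)
#define SK_POLLING      (1u << 1)

/*
 * Returns true if the target is polling, in which case NEED_RESCHED is set
 * (or was already set) and no IPI is needed. Returns false if the target is
 * not polling, so the caller would have to send a real IPI.
 */
static bool set_nr_if_polling_sketch(atomic_uint *flags)
{
	unsigned int val = atomic_load(flags);

	do {
		if (!(val & SK_POLLING))
			return false;
		if (val & SK_NEED_RESCHED)
			return true;
		/* On failure, val is reloaded with the current flags and we retry. */
	} while (!atomic_compare_exchange_weak(flags, &val, val | SK_NEED_RESCHED));

	return true;
}
```

Because the polling bit stays set for much of the idle loop, a waker following this pattern sets NEED_RESCHED even when it only wants an IPI-style callback to run, which is how the cases discussed in this thread arise.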
[..snip..]