Re: [RFC PATCH] kernel/sched/core: busy wait before going idle

From: Pavan Kondeti
Date: Mon Apr 23 2018 - 06:17:57 EST


Hi Nick,

On Sun, Apr 15, 2018 at 11:31:49PM +1000, Nicholas Piggin wrote:
> This is a quick hack for comments, but I've always wondered --
> if we have a short term polling idle states in cpuidle for performance
> -- why not skip the context switch and entry into all the idle states,
> and just wait for a bit to see if something wakes up again.
>
> It's not uncommon to see various going-to-idle work in kernel profiles.
> This might be a way to reduce that (and just the cost of switching
> registers and kernel stack to idle thread). This can be an important
> path for single thread request-response throughput.
>
> tbench bandwidth seems to be improved (the numbers aren't too stable
> but they pretty consistently show some gain). 10-20% would be a pretty
> nice gain for such workloads
>
> clients 1 2 4 8 16 128
> vanilla 232 467 823 1819 3218 9065
> patched 310 503 962 2465 3743 9820
>

<snip>

> +idle_spin_end:
> /* Promote REQ to ACT */
> rq->clock_update_flags <<= 1;
> update_rq_clock(rq);
> @@ -3437,6 +3439,32 @@ static void __sched notrace __schedule(bool preempt)
> if (unlikely(signal_pending_state(prev->state, prev))) {
> prev->state = TASK_RUNNING;
> } else {
> + /*
> + * Busy wait before switching to idle thread. This
> + * is marked unlikely because we're idle so jumping
> + * out of line doesn't matter too much.
> + */
> + if (unlikely(do_idle_spin && rq->nr_running == 1)) {
> + u64 start;
> +
> + do_idle_spin = false;
> +
> + rq->clock_update_flags &= ~(RQCF_ACT_SKIP|RQCF_REQ_SKIP);
> + rq_unlock_irq(rq, &rf);
> +
> + spin_begin();
> + start = local_clock();
> + while (!need_resched() && prev->state &&
> + !signal_pending_state(prev->state, prev)) {
> + spin_cpu_relax();
> + if (local_clock() - start > 1000000)
> + break;
> + }

A couple of comments/questions.

When an RT task is doing this busy loop:

(1) need_resched() may not be set even if a fair/normal task is enqueued on
this CPU, so the loop can keep spinning until the timeout while that task
waits.

(2) Any lower-priority RT task waking up on this CPU may be migrated to
another CPU, because this CPU still appears busy with a higher-priority RT
task.
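
To make (1) and (2) concrete, here is a rough userspace sketch of the
decisions involved. It is not kernel code: every toy_* name is hypothetical,
the class ordering is simplified to RT above fair, and same-class fair
preemption is deliberately not modelled.

```c
#include <stdbool.h>

/* Toy scheduling classes, highest priority first (hypothetical names). */
enum toy_class { TOY_RT = 0, TOY_FAIR = 1 };

struct toy_task {
	enum toy_class cls;
	int rt_prio;	/* lower value = higher priority; only used for TOY_RT */
};

/*
 * Point (1): would waking 'p' set need_resched on a CPU whose current
 * task is 'curr'?  Assumption: a task in a lower class never preempts
 * a task in a higher class, and within RT only a strictly higher
 * priority preempts.
 */
static bool toy_wakeup_sets_need_resched(const struct toy_task *curr,
					 const struct toy_task *p)
{
	if (p->cls != curr->cls)
		return p->cls < curr->cls;
	if (curr->cls == TOY_RT)
		return p->rt_prio < curr->rt_prio;
	return false;	/* fair vs. fair: not modelled in this sketch */
}

/*
 * Point (2): does this CPU look busy to a waking RT task 'p'?
 * Assumption: it does whenever the current task is RT with equal or
 * higher priority, which is what can push 'p' to another CPU.
 */
static bool toy_cpu_looks_busy_to_rt(const struct toy_task *curr,
				     const struct toy_task *p)
{
	return curr->cls == TOY_RT && curr->rt_prio <= p->rt_prio;
}
```

With curr being the spinning RT task (still on the runqueue as rq->curr), a
woken fair task makes the first helper return false, so the spin loop's
!need_resched() condition stays true. A woken lower-priority RT task makes
the second helper return true, even though curr is only busy waiting.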

> + spin_end();
> +
> + rq_lock_irq(rq, &rf);
> + goto idle_spin_end;
> + }
> deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
> prev->on_rq = 0;
>
> --
> 2.17.0
>

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.