Re: [PATCH] thermal/intel_powerclamp: convert to smpboot thread

From: Jacob Pan
Date: Mon Apr 11 2016 - 11:20:50 EST


On Mon, 11 Apr 2016 14:47:12 +0200
Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:

> On 04/06/2016 06:47 PM, Jacob Pan wrote:
> > On Fri, 11 Mar 2016 16:38:21 +0100
> > Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
> > Apologize for the delay, I missed it until Rui reminded me.
> >> Oh boy oh boy. This thing runs at SCHED_FIFO MAX_USER_RT_PRIO/2 and
> >> stops at mwait_idle_with_hints(). Why bother with /2?
> >> There are a few things I haven't fully decoded. For instance why
> >> is it looking at local_softirq_pending()?
> > the idea is to instrument some kind of quality of service. Idle
> > injection is a best effort based mechanism. So we don't want to
> > affect high priority realtime tasks nor softirq.
>
> But you disturb RT tasks with prio 1. Also I am not sure if you see
> the sofirq bits. The softirq runs before any (RT) task so the
> `pending ' should be 0. Unless the work is delegated to ksoftirqd.
>
I agree softirq runs before RT but there could be a gap between
raise_softirq() (set pending) and run softirq(). So it is possible
(thus unlikely()) our RT task runs after pending bits set but before
softirq runs. correct?
> >> The timer is probably here if mwait would let it sleep too long.
> >>
> > not sure i understand. could you explain?
>
> The timer invokes noop_timer() which does nothing so the only thing
> you want is the interrupt. Your mwait_idle_with_hints() could let you
> sleep say for one second. But the timer ensures that you wake up no
> later than 100us.
>
yeah, that is the idea to make sure we don't oversleep. You mean we can
optimize this but avoid extra wake ups? e.g. cancel timer if we already
sleep enough time?
> >> I tried to convert it over to smpboot thread so we don't have that
> >> CPU notifier stuff to fire the cpu threads during hotplug events.
> >>
> > there is another patchset to convert it to kthread worker. any
> > advantage of smpboot thread?
> > http://comments.gmane.org/gmane.linux.kernel.mm/144964
>
> It partly does the same thing except you still have your hotplug
> notifier which I wanted to get rid off. However it looks better than
> before.
> If you do prefer the kworker thingy then please switch from CPU_DEAD
> to CPU_DOWN_PREPARE (and add CPU_DOWN_FAILED to CPU_ONLINE).
> With those changes I should have no further problem with it :)
> Any ETA for (either of those patches) upstream?
>
+Petr
I do prefer not to keep track of CPU hotplug events. Let me do some
testing.
> >> The code / logic itself could be a little more inteligent and only
> >> wake up the threads for the CPUs that are about to idle but it
> >> seems it is done on all of the at once unless I missed something.
> >>
> > you mean only force idle when there is no natural idle? i thought
> > about that but hard to coordinate and enforce the duration of each
> > idle period. Any specific pointers?
>
> Implement it as an idle driver. So it would be invoked once the system
> goes idle as an alternative to (the default) mwait. However the fact
> that you (seem to) go idle even if there is work to do seems that you
> aim a different goal than idle if there is nothing left.
>
Right, we use the same idle inner loop as the idle task but powerclamp
driver aims at aligned, forced, and controlled idle time to manage
power/thermal envelop.
I also had an attempt to do this in CFS sched class.
https://lkml.org/lkml/2015/11/2/756
As it was suggested to be able to stop scheduler tick during force idle
time (which cannot do in the current powerclamp code).

Jacob