Re: [RESEND PATCH v4 1/1] psi: stop relying on timer_pending for poll_work rescheduling

From: Suren Baghdasaryan
Date: Mon Oct 24 2022 - 18:33:11 EST


On Mon, Oct 24, 2022 at 2:56 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Oct 20, 2022 at 03:25:47PM -0700, Suren Baghdasaryan wrote:
> > On Mon, Oct 10, 2022 at 3:57 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > >
> > > The PSI polling mechanism tries to minimize the number of wakeups
> > > needed to run psi_poll_work and currently relies on timer_pending() to
> > > detect when this work is already scheduled. This leaves a window in
> > > which psi_group_change can schedule an immediate psi_poll_work after
> > > poll_timer_fn has been called but before psi_poll_work has rescheduled
> > > itself. The entire window is depicted below:
> > >
> > > poll_timer_fn
> > >   wake_up_interruptible(&group->poll_wait);
> > >
> > > psi_poll_worker
> > >   wait_event_interruptible(group->poll_wait, ...)
> > >   psi_poll_work
> > >     psi_schedule_poll_work
> > >       if (timer_pending(&group->poll_timer)) return;
> > >       ...
> > >       mod_timer(&group->poll_timer, jiffies + delay);
> > >
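
A rough way to see why this window costs a wakeup is the hypothetical single-threaded model below. It is not kernel code: model_group, model_schedule(), model_timer_fires() and next_wakeup() are illustrative names, and it assumes that the "immediate" request from psi_group_change amounts to a 1-jiffy delay. Once the timer fires it is no longer pending, so a request from psi_group_change that lands before psi_poll_work re-arms the timer wins, and the worker's own, longer reschedule is then dropped by the timer_pending() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the pre-patch check (not kernel code). */
struct model_group {
	bool timer_pending;    /* stands in for timer_pending(&group->poll_timer) */
	unsigned long expires; /* jiffy at which the modeled poll_timer fires */
};

/* Models psi_schedule_poll_work(): bail out if the timer is already armed. */
static void model_schedule(struct model_group *g, unsigned long now,
			   unsigned long delay)
{
	assert(delay > 0); /* this model only arms future expiries */
	if (g->timer_pending)
		return;
	g->timer_pending = true;
	g->expires = now + delay; /* models mod_timer(..., jiffies + delay) */
}

/* Models poll_timer_fn(): once the timer fires it is no longer pending. */
static void model_timer_fires(struct model_group *g)
{
	g->timer_pending = false;
}

/*
 * Replays one expiry. With race == true, a psi_group_change() request for an
 * immediate (assumed 1-jiffy) poll lands after the timer fired but before
 * psi_poll_work() re-armed it. Returns the jiffy of the next wakeup.
 */
static unsigned long next_wakeup(bool race, unsigned long now,
				 unsigned long poll_delay)
{
	struct model_group g = { .timer_pending = true, .expires = now };

	model_timer_fires(&g);
	if (race)
		model_schedule(&g, now, 1);      /* psi_group_change() */
	model_schedule(&g, now, poll_delay);     /* psi_poll_work() re-arms */
	return g.expires;
}
```

With a 50-jiffy poll delay starting at jiffy 100, next_wakeup(false, 100, 50) yields 150, while next_wakeup(true, 100, 50) yields 101: the worker is woken again almost immediately, which is the extra wakeup this patch is about.
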
> > > Prior to 461daba06bdc we relied on the poll_scheduled atomic, which was
> > > reset and set again inside psi_poll_work, so this race window was much
> > > smaller.
> > > The larger window causes an increased number of wakeups, and our partners
> > > report a visible power regression of ~10mA after applying 461daba06bdc.
> > > Bring back the poll_scheduled atomic and make this race window even
> > > narrower by resetting poll_scheduled only when we reach the polling
> > > expiration time. This does not completely eliminate the possibility of
> > > extra wakeups caused by a race with psi_group_change; however, it limits
> > > them to at most one extra wakeup per tracking window (one every 0.5s
> > > with the shortest tracking window).
> > > This patch also uses a memory barrier to ensure correct ordering between
> > > clearing the poll_scheduled flag and reading changed_states. Correct
> > > ordering between updating changed_states and setting poll_scheduled is
> > > ensured by the atomic_xchg operation, which implies a full barrier.
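
The poll_scheduled handshake described above can be sketched as a userspace model with C11 atomics. This is an illustration under assumptions, not the patch itself: model_group, model_init(), model_group_change() and model_poll_expired() are hypothetical names, changed_states is simplified to a single bitmask, and atomic_exchange() and atomic_thread_fence(memory_order_seq_cst) stand in for the kernel's atomic_xchg() and smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model of the poll_scheduled handshake (not kernel code). */
struct model_group {
	atomic_int poll_scheduled;  /* 1 while a poll is already armed */
	atomic_uint changed_states; /* simplified bitmask of changed states */
	atomic_int wakeups_armed;   /* how many times a poll wakeup was armed */
};

static void model_init(struct model_group *g)
{
	atomic_init(&g->poll_scheduled, 0);
	atomic_init(&g->changed_states, 0);
	atomic_init(&g->wakeups_armed, 0);
}

/*
 * Task-change side (models psi_group_change): publish the state change first,
 * then try to claim the flag. The seq_cst exchange acts as a full barrier,
 * standing in for the kernel's atomic_xchg().
 */
static void model_group_change(struct model_group *g, unsigned int state_bit)
{
	assert(state_bit != 0); /* each change reports at least one state */
	atomic_fetch_or(&g->changed_states, state_bit);
	if (atomic_exchange(&g->poll_scheduled, 1) == 0)
		atomic_fetch_add(&g->wakeups_armed, 1); /* first caller arms a poll */
}

/*
 * Worker side at polling expiration (models psi_poll_work): clear the flag,
 * then a full fence (standing in for smp_mb()) so the read of changed_states
 * cannot be reordered before the clear. Returns and consumes the changes seen.
 */
static unsigned int model_poll_expired(struct model_group *g)
{
	atomic_store(&g->poll_scheduled, 0);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_exchange(&g->changed_states, 0);
}
```

In this model only the first state change after the flag is cleared arms a wakeup; later changes see poll_scheduled == 1 and return early until the worker reaches the polling expiration time and clears the flag again. The fence on the worker side pairs with the full barrier of the exchange on the task-change side, so either the worker observes the new state or the task change sees the cleared flag and arms another poll.
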
> > > By tracing the number of immediate rescheduling attempts performed by
> > > psi_group_change and the number of those attempts that were blocked
> > > because a psi monitor was already active, we can assess the effects of
> > > this change:
> > >
> > > Before the patch:
> > >                                            Run#1    Run#2    Run#3
> > > Immediate reschedules attempted:          684365  1385156  1261240
> > > Immediate reschedules blocked:            682846  1381654  1258682
> > > Immediate reschedules (delta):              1519     3502     2558
> > > Immediate reschedules (% of attempted):    0.22%    0.25%    0.20%
> > >
> > > After the patch:
> > >                                            Run#1    Run#2    Run#3
> > > Immediate reschedules attempted:          882244   770298   426218
> > > Immediate reschedules blocked:            881996   769796   426074
> > > Immediate reschedules (delta):               248      502      144
> > > Immediate reschedules (% of attempted):    0.03%    0.07%    0.03%
> > >
> > > The share of non-blocked immediate reschedules dropped from 0.20-0.25%
> > > to 0.03-0.07%. The drop is attributed to the smaller race window and to
> > > the fact that this race is now possible only when psi monitors reach
> > > the polling window expiration time.
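
The delta and percentage rows follow directly from the first two rows of each table: delta = attempted - blocked, and the percentage is delta / attempted * 100. A small helper for recomputing them is sketched below; struct run_stats, reschedule_delta() and reschedule_pct() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* One column of the tables above (hypothetical helper type). */
struct run_stats {
	unsigned long attempted; /* immediate reschedules attempted */
	unsigned long blocked;   /* attempts blocked by an active monitor */
};

/* Reschedules that were not blocked, i.e. actually went through. */
static unsigned long reschedule_delta(struct run_stats r)
{
	assert(r.blocked <= r.attempted);
	return r.attempted - r.blocked;
}

/* Non-blocked reschedules as a percentage of all attempts. */
static double reschedule_pct(struct run_stats r)
{
	assert(r.attempted > 0);
	return 100.0 * (double)reschedule_delta(r) / (double)r.attempted;
}
```

For example, Run#3 before the patch gives 1261240 - 1258682 = 2558, or about 0.20% of attempts, and Run#2 after the patch gives 502, or about 0.065% (0.07% when rounded).
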
> > >
> > > Fixes: 461daba06bdc ("psi: eliminate kthread_worker from psi trigger scheduling mechanism")
> > > Reported-by: Kathleen Chang <yt.chang@xxxxxxxxxxxx>
> > > Reported-by: Wenju Xu <wenju.xu@xxxxxxxxxxxx>
> > > Reported-by: Jonathan Chen <jonathan.jmchen@xxxxxxxxxxxx>
> > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > Tested-by: SH Chen <show-hong.chen@xxxxxxxxxxxx>
> > > Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > > ---
> > > This patch somehow slipped through the cracks after being acked by
> > > Johannes in [1], and I didn't notice until now because we had
> > > cherry-picked it into Android kernel trees due to the urgency at the
> > > time. On the bright side, this change has been tested in the field on
> > > millions of devices for about a year.
> > > Resending v4 of this patch, previously posted at [2], rebased on the
> > > latest Linus' TOT.
> >
> > Hi Peter,
> > We missed this Ack'ed patch last year and, as I described above, I
> > didn't notice until now. With rc1 released, hopefully it's a good
> > time to ping you and ask you to include this patch in your tree.
> > If the timing is not good, please let me know when to remind you and
> > I'll send another email. I just want to make sure it does not slip
> > again.
> >
> > Just FYI, we have two other Ack'ed PSI patches for you to consider:
> >
> > https://lore.kernel.org/all/20221014110551.22695-1-zhouchengming@xxxxxxxxxxxxx/
> > https://lore.kernel.org/all/20220919072356.GA29069@xxxxxxxxx/
>
> Thanks for the poke; I've picked up all three and will place them in
> sched/core.

Thanks!
