Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling

From: Suren Baghdasaryan
Date: Thu Jul 08 2021 - 16:37:56 EST


On Thu, Jul 8, 2021 at 12:55 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Thu, Jul 8, 2021 at 11:38 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> >
> > On Thu, Jul 08, 2021 at 08:54:56AM -0700, Suren Baghdasaryan wrote:
> > > On Thu, Jul 8, 2021 at 7:44 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > > On Wed, Jul 07, 2021 at 03:43:48PM -0700, Suren Baghdasaryan wrote:
> > > > > On Wed, Jul 7, 2021 at 6:39 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > > > > > This looks good to me now code-wise. Just a comment on the comments:
> > > > > >
> > > > > > On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> > > > > > > @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> > > > > > > >          return now + group->poll_min_period;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -/* Schedule polling if it's not already scheduled. */
> > > > > > > > -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> > > > > > > > +/* Schedule polling if it's not already scheduled or forced. */
> > > > > > > > +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> > > > > > > > +                                   bool force)
> > > > > > > >  {
> > > > > > > >          struct task_struct *task;
> > > > > > > >
> > > > > > > > -        /*
> > > > > > > > -         * Do not reschedule if already scheduled.
> > > > > > > > -         * Possible race with a timer scheduled after this check but before
> > > > > > > > -         * mod_timer below can be tolerated because group->polling_next_update
> > > > > > > > -         * will keep updates on schedule.
> > > > > > > > -         */
> > > > > > > > -        if (timer_pending(&group->poll_timer))
> > > > > > > > +        /* xchg should be called even when !force to set poll_scheduled */
> > > > > > > > +        if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> > > > > > > >                  return;
> > > > > >
> > > > > > This explains what the code does, but not why. It would be good to
> > > > > > explain the ordering with poll_work, here or there. But both sides
> > > > > > should mention each other.
> > > > >
> > > > > How about this:
> > > > >
> > > > > /*
> > > > > * atomic_xchg should be called even when !force to always set poll_scheduled
> > > > > * and to provide a memory barrier (see the comment inside psi_poll_work).
> > > > > */
> > > >
> > > > The memory barrier part makes sense, but the first part says what the
> > > > code does and the message is unclear to me. Are you worried somebody
> > > > might turn this around in the future and only conditionalize on
> > > > poll_scheduled when !force? Essentially, I don't see the downside of
> > > > dropping that. But maybe I'm missing something.
> > >
> > > Actually you are right. Originally I was worried that there might be a
> > > case when poll_scheduled==0 and force==true and if someone flips the
> > > conditions we will reschedule the timer but will not set
> > > poll_scheduled back to 1.
> >
> > Oh I see.
> >
> > Right, flipping the condition doesn't make sense because we need
> > poll_scheduled to be set when we go ahead - whether we're forcing or
> > not. I.e. if we were in a locked section, we'd write it like this:
> >
> >         if (poll_scheduled) {
> >                 if (!force)
> >                         return;
> >         } else {
> >                 poll_scheduled = 1;
> >         }
> >
> > > However I don't think this condition is possible. We set force=true
> > > only when we skipped resetting poll_scheduled to 0 and on initial
> > > wakeup we always reset poll_scheduled. How about changing the comment
> > > to this:
> > >
> > > /*
> > >  * atomic_xchg should be called even when !force to provide a
> > >  * full memory barrier (see the comment inside psi_poll_work).
> > >  */
> >
> > Personally, I still find this more confusing than no comment on
> > !force, because when you read it it sort of raises the question what
> > the alternatives would be. And the alternatives appear to be
> > nonsensical code rather than legitimate options.
> >
> > But I won't insist if you prefer to leave it in. Your call.
>
> I would like to keep it as a precaution, if you don't mind. In case
> someone in the future thinks about "optimizing" this by flipping the
> condition, hopefully the comment will give them pause to think about
> it :)
>
> >
> > > > /*
> > > >  * A task change can race with the poll worker that is supposed to
> > > >  * report on it. To avoid missing events, ensure ordering between
> > > >  * poll_scheduled and the task state accesses, such that if the poll
> > > >  * worker misses the state update, the task change is guaranteed to
> > > >  * reschedule the poll worker:
> > > >  *
> > > >  * poll worker:
> > > >  *   atomic_set(poll_scheduled, 0)
> > > >  *   smp_mb()
> > > >  *   LOAD states
> > > >  *
> > > >  * task change:
> > > >  *   STORE states
> > > >  *   if atomic_xchg(poll_scheduled, 1) == 0:
> > > >  *     schedule poll worker
> > > >  *
> > > >  * The atomic_xchg() implies a full barrier.
> > > >  */
> > > > smp_mb();
> > > >
> > > > This gives a high-level view of what's happening but it can still be
> > > > mapped to the code by following the poll_scheduled variable.
> > >
> > > This looks really good to me.
> > > If you agree on the first comment modification, should I respin the
> > > next version?
> >
> > Yeah, sounds good to me!
>
> Thanks! I'll post an update shortly.

v4 is posted at https://lore.kernel.org/patchwork/patch/1455172/
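
For readers following the ordering argument above, here is a minimal
user-space sketch of the poll_scheduled handshake. It is not the kernel
code: struct demo_group, its fields, and every demo_* function are
hypothetical names made up for illustration. It uses C11 <stdatomic.h>
with default sequentially consistent operations, which are assumed here
to stand in for the kernel's atomic_set() + smp_mb() on the poll worker
side and for atomic_xchg() on the task change side. The "timer" is only
a counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the psi_group fields discussed in this thread. */
struct demo_group {
	atomic_int poll_scheduled;	/* 1 while a poll is queued */
	atomic_int states;		/* stand-in for the task state */
	atomic_int timer_armed;		/* how many times the "timer" was armed */
};

static void demo_group_init(struct demo_group *g)
{
	atomic_init(&g->poll_scheduled, 0);
	atomic_init(&g->states, 0);
	atomic_init(&g->timer_armed, 0);
}

/* Returns true if the (pretend) timer was armed by this call. */
static bool demo_schedule_poll_work(struct demo_group *g, bool force)
{
	/*
	 * The exchange runs even when !force: it always leaves
	 * poll_scheduled set, and as a seq_cst read-modify-write it
	 * orders the caller's earlier store to states before the read
	 * of the old flag value.
	 */
	if (atomic_exchange(&g->poll_scheduled, 1) && !force)
		return false;

	atomic_fetch_add(&g->timer_armed, 1);
	return true;
}

/* Task change side: STORE states, then try to schedule the worker. */
static void demo_task_change(struct demo_group *g, int new_states)
{
	atomic_store(&g->states, new_states);
	demo_schedule_poll_work(g, false);
}

/* Poll worker side: clear the flag first, then LOAD states. */
static int demo_poll_work(struct demo_group *g)
{
	atomic_store(&g->poll_scheduled, 0);
	return atomic_load(&g->states);
}

/* Single-threaded walk through the cases discussed above. */
static void demo_usage(void)
{
	struct demo_group g;

	demo_group_init(&g);

	demo_task_change(&g, 1);		/* flag was 0: arms the timer */
	assert(atomic_load(&g.timer_armed) == 1);

	demo_task_change(&g, 2);		/* flag already 1: no re-arm */
	assert(atomic_load(&g.timer_armed) == 1);

	assert(demo_schedule_poll_work(&g, true));	/* force: re-arm anyway */
	assert(atomic_load(&g.timer_armed) == 2);

	assert(demo_poll_work(&g) == 2);	/* worker clears flag, sees states */

	demo_task_change(&g, 3);		/* flag was cleared: arms again */
	assert(atomic_load(&g.timer_armed) == 3);
}
```

The walk-through is single-threaded, so it only shows the flag
transitions, not the race itself. The point of the ordering is that if
demo_poll_work() loads states before a concurrent store lands, its
clearing of poll_scheduled is already visible to that writer's exchange,
so the writer sees 0 and re-arms.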