Re: [PATCH v3 1/1] psi: stop relying on timer_pending for poll_work rescheduling
From: Johannes Weiner
Date: Wed Jul 07 2021 - 09:39:17 EST
This looks good to me now code wise. Just a comment on the comments:
On Tue, Jul 06, 2021 at 07:39:33PM -0700, Suren Baghdasaryan wrote:
> @@ -559,18 +560,14 @@ static u64 update_triggers(struct psi_group *group, u64 now)
> return now + group->poll_min_period;
> }
>
> -/* Schedule polling if it's not already scheduled. */
> -static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay)
> +/* Schedule polling if it's not already scheduled or forced. */
> +static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
> + bool force)
> {
> struct task_struct *task;
>
> - /*
> - * Do not reschedule if already scheduled.
> - * Possible race with a timer scheduled after this check but before
> - * mod_timer below can be tolerated because group->polling_next_update
> - * will keep updates on schedule.
> - */
> - if (timer_pending(&group->poll_timer))
> + /* xchg should be called even when !force to set poll_scheduled */
> + if (atomic_xchg(&group->poll_scheduled, 1) && !force)
> return;
This explains what the code does, but not why. It would be good to
explain the ordering with poll_work, here or there. But both sides
should mention each other.
> @@ -595,6 +595,28 @@ static void psi_poll_work(struct psi_group *group)
>
> now = sched_clock();
>
> + if (now > group->polling_until) {
> + /*
> + * We are either about to start or might stop polling if no
> + * state change was recorded. Resetting poll_scheduled leaves
> + * a small window for psi_group_change to sneak in and schedule
> + * an immegiate poll_work before we get to rescheduling. One
> + * potential extra wakeup at the end of the polling window
> + * should be negligible and polling_next_update still keeps
> + * updates correctly on schedule.
> + */
> + atomic_set(&group->poll_scheduled, 0);
> + /*
> + * Ensure that operations of clearing group->poll_scheduled and
> + * obtaining changed_states are not reordered.
> + */
> + smp_mb();
Same here, it would be good to explain that this is ordering the
scheduler with the timer such that no events are missed. Feel free to
reuse my race diagram from the other thread - those are better at
conveying the situation than freeform text.
Thanks