Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked

From: Joel Fernandes
Date: Tue May 22 2018 - 17:16:05 EST

Next message: Alex Williamson: "Re: [PATCH 2/2] NTB: PCI Quirk to Enable Switchtec NT Functionality with IOMMU On"
Previous message: kbuild test robot: "Re: [PATCH] mm: fix race between kmem_cache destroy, create and deactivate"
In reply to: Viresh Kumar: "Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked"
Next in thread: Rafael J. Wysocki: "Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
>
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e13df951aca7..5c482ec38610 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  	    !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> >  		return false;
> >
> > -	if (sg_policy->work_in_progress)
> > -		return false;
> > -
> >  	if (unlikely(sg_policy->need_freq_update)) {
> >  		sg_policy->need_freq_update = false;
> >  		/*
> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >
> >  		policy->cur = next_freq;
> >  		trace_cpu_frequency(next_freq, smp_processor_id());
> > -	} else {
> > +	} else if (!sg_policy->work_in_progress) {
> >  		sg_policy->work_in_progress = true;
> >  		irq_work_queue(&sg_policy->irq_work);
> >  	}
> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >
> >  	ignore_dl_rate_limit(sg_cpu, sg_policy);
> >
> > +	/*
> > +	 * For slow-switch systems, single policy requests can't run at the
> > +	 * moment if update is in progress, unless we acquire update_lock.
> > +	 */
> > +	if (sg_policy->work_in_progress)
> > +		return;
> > +
> >  	if (!sugov_should_update_freq(sg_policy, time))
> >  		return;
> >
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> >  static void sugov_work(struct kthread_work *work)
> >  {
> >  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > +	unsigned int freq;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * Hold sg_policy->update_lock shortly to handle the case where:
> > +	 * incase sg_policy->next_freq is read here, and then updated by
> > +	 * sugov_update_shared just before work_in_progress is set to false
> > +	 * here, we may miss queueing the new update.
> > +	 *
> > +	 * Note: If a work was queued after the update_lock is released,
> > +	 * sugov_work will just be called again by kthread_work code; and the
> > +	 * request will be proceed before the sugov thread sleeps.
> > +	 */
> > +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > +	freq = sg_policy->next_freq;
> > +	sg_policy->work_in_progress = false;
> > +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >
> >  	mutex_lock(&sg_policy->work_lock);
> > -	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > -				CPUFREQ_RELATION_L);
> > +	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> >  	mutex_unlock(&sg_policy->work_lock);
> > -
> > -	sg_policy->work_in_progress = false;
> >  }
>
> And I do see a race here for single policy systems doing slow switching.
>
> Kthread                                       Sched update
>
> sugov_work()                                  sugov_update_single()
>
>         lock();
>         // The CPU is free to rearrange below
>         // two in any order, so it may clear
>         // the flag first and then read next
>         // freq. Lets assume it does.
>         work_in_progress = false
>
>                                                 if (work_in_progress)
>                                                         return;
>
>                                                 sg_policy->next_freq = 0;
>         freq = sg_policy->next_freq;
>                                                 sg_policy->next_freq = real-next-freq;
>         unlock();
>

I agree with the race you describe for single policy slow-switch. Good find :)
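
To make the interleaving concrete, here is a rough single-threaded replay of
the diagram above. It is only a sketch: struct sg_model, sched_begin(),
sched_finish() and replay() are hypothetical names made up for illustration,
not kernel code, and the "reordering" is simulated by simply running the
steps in the order the CPU is assumed to make them visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two sugov_policy fields in the diagram. */
struct sg_model {
	unsigned int next_freq;
	bool work_in_progress;
};

/*
 * Scheduler side, first half: bail out if work is in progress, otherwise
 * write the transient next_freq = 0 shown in the diagram.
 */
static bool sched_begin(struct sg_model *m)
{
	if (m->work_in_progress)
		return false;

	m->next_freq = 0;
	return true;
}

/* Scheduler side, second half: publish the real next frequency. */
static void sched_finish(struct sg_model *m, unsigned int real_freq)
{
	m->next_freq = real_freq;
}

/*
 * Replay one interleaving and return the frequency the kthread would use.
 * flag_cleared_first models the CPU making the work_in_progress store
 * visible before the kthread's read of next_freq.
 */
static unsigned int replay(bool flag_cleared_first, unsigned int old_freq,
			   unsigned int real_freq)
{
	struct sg_model m = { .next_freq = old_freq, .work_in_progress = true };
	unsigned int freq;
	bool started;

	if (flag_cleared_first) {
		m.work_in_progress = false;	/* kthread: store hoisted */
		started = sched_begin(&m);	/* sched: check passes, writes 0 */
		freq = m.next_freq;		/* kthread: reads the transient 0 */
	} else {
		started = sched_begin(&m);	/* sched: blocked by the flag */
		freq = m.next_freq;		/* kthread: reads the old request */
		m.work_in_progress = false;
	}

	if (started)
		sched_finish(&m, real_freq);

	return freq;
}
```

With the flag cleared first, replay(true, 1000, 2000) hands the kthread the
transient 0 rather than either real request; in program order,
replay(false, 1000, 2000) hands it 1000.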

I think mainline's sugov_work could be hit by the same reordering. Even with the
mutex_unlock there, the CPU could reorder the work_in_progress write to happen
before the read of next_freq. AIUI, mutex_unlock is only expected to be a
release barrier, so it does not stop a later store from moving up past it.

To be safe, though, I could just put an smp_mb() there. I believe that with it,
no locking would be needed for this case.
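
A rough userspace model of what I mean, using C11 atomics. This is a sketch
under assumptions, not the actual patch: struct policy_model,
model_sugov_work() and model_update_single() are hypothetical names, and
atomic_thread_fence(memory_order_seq_cst) is only standing in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two sugov_policy fields. */
struct policy_model {
	atomic_uint next_freq;
	atomic_bool work_in_progress;
};

/*
 * Models the kthread side: read the requested frequency, then clear the
 * flag. The full fence keeps the flag store from becoming visible before
 * the next_freq load has completed.
 */
static unsigned int model_sugov_work(struct policy_model *p)
{
	unsigned int freq;

	freq = atomic_load_explicit(&p->next_freq, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	atomic_store_explicit(&p->work_in_progress, false, memory_order_relaxed);

	return freq;
}

/*
 * Models the scheduler side on a single-policy slow-switch system: do
 * nothing while work is in progress, otherwise record the new request and
 * mark the work as pending.
 */
static bool model_update_single(struct policy_model *p, unsigned int freq)
{
	if (atomic_load_explicit(&p->work_in_progress, memory_order_acquire))
		return false;

	atomic_store_explicit(&p->next_freq, freq, memory_order_relaxed);
	atomic_store_explicit(&p->work_in_progress, true, memory_order_release);
	return true;
}
```

In this model, an updater that observes the flag as clear is ordered after the
kthread's read of next_freq, so its write cannot be what the kthread read.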

I'll send out a v3 with Acks for the original patch, and then send out the
smp_mb() as a separate patch, if that's OK.

thanks,

- Joel
