Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically

From: Byungchul Park
Date: Tue May 16 2017 - 02:52:40 EST


On Fri, May 12, 2017 at 10:25:30AM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 14:48:45 +0900
> Byungchul Park <byungchul.park@xxxxxxx> wrote:
>
> > cpudl.elements is an instance that should be protected with a spin lock.
> > Without it, the code would be insane.
>
> And how much contention will this add? Spin locks in the scheduler code
> that are shared among a domain can cause huge latency. This was why I
> worked hard not to add any in the cpupri code.

Yes. That's also why I hesitated to post this patch...

> > Current cpudl_find() has problems like,
> >
> > 1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
> > 2. cpudl.elements[0].dl(u64) might not be referred atomically.
> > 3. Two cpudl_maximum()s might return different values.
> > 4. It's just insane.
>
> And lockless algorithms usually are insane. But locks come with a huge
> cost. What happens when we have 32 core domains. This can cause
> tremendous contention and makes the entire cpu priority for deadlines
> useless. Might as well rip out the code.

I think it would be better if we, while keeping it lockless,

1. read cp->elements[].cpu once rather than twice,
2. add retry logic so that elements[].cpu matches its dl,
3. add retry logic so that the u64 variable is read atomically.

What do you think about these suggestions? Of course, they do not
solve the problems perfectly... Or do you think it's not worth it?

> I haven't looked too hard into the deadline version, I may have to
> spend some time doing so soon. But unfortunately, I have other critical
> sections to spend brain cycles on.

OK.

Thank you,
Byungchul

>
> -- Steve
>
>
> >
> > Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> > ---
> > kernel/sched/cpudeadline.c | 37 ++++++++++++++++++++++++++++++-------
> > 1 file changed, 30 insertions(+), 7 deletions(-)
> >
> > diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> > index fba235c..6b67016 100644
> > --- a/kernel/sched/cpudeadline.c
> > +++ b/kernel/sched/cpudeadline.c
> > @@ -131,16 +131,39 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
> > cpumask_and(later_mask, cp->free_cpus, &p->cpus_allowed)) {
> > best_cpu = cpumask_any(later_mask);
> > goto out;
> > - } else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
> > - dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
> > - best_cpu = cpudl_maximum(cp);
> > - if (later_mask)
> > - cpumask_set_cpu(best_cpu, later_mask);
> > + } else {
> > + u64 cpudl_dl;
> > + int cpudl_cpu;
> > + int cpudl_valid;
> > + unsigned long flags;
> > +
> > + /*
> > + * Referring to cp->elements must be atomic ops.
> > + */
> > + raw_spin_lock_irqsave(&cp->lock, flags);
> > + /*
> > + * No problem even in case of very initial heap tree
> > + * to which no entry has been added yet, since
> > + * cp->elements[0].cpu was initialized to zero and
> > + * cp->elements[0].idx was initialized to IDX_INVALID,
> > + * that means the case will be filtered out at the
> > + * following condition.
> > + */
> > + cpudl_cpu = cpudl_maximum(cp);
> > + cpudl_dl = cp->elements[0].dl;
> > + cpudl_valid = cp->elements[cpudl_cpu].idx;
> > + raw_spin_unlock_irqrestore(&cp->lock, flags);
> > +
> > + if (cpudl_valid != IDX_INVALID &&
> > + cpumask_test_cpu(cpudl_cpu, &p->cpus_allowed) &&
> > + dl_time_before(dl_se->deadline, cpudl_dl)) {
> > + best_cpu = cpudl_cpu;
> > + if (later_mask)
> > + cpumask_set_cpu(best_cpu, later_mask);
> > + }
> > }
> >
> > out:
> > - WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));
> > -
> > return best_cpu;
> > }
> >