Re: [RFC PATCH 06/16] sched: Add core wide task selection and scheduling.
From: Joel Fernandes
Date: Mon Jul 06 2020 - 13:37:15 EST
Hi Vineeth,
On Mon, Jul 06, 2020 at 10:38:27AM -0400, Vineeth Remanan Pillai wrote:
> On Mon, Jul 6, 2020 at 10:09 AM Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > > I am not sure if this can happen. If the other sibling sets core_pick, it
> > > will be under the core wide lock and it should set the core_sched_seq also
> > > before releasing the lock. So when this cpu tries, it would see the core_pick
> > > before resetting it. Is this the same case you were mentioning? Sorry if I
> > > misunderstood the case you mentioned..
> >
> > If you have a case where you have 3 siblings all trying to enter the schedule
> > loop. Call them A, B and C.
> >
> > A picks something for B in core_pick. Now C comes and resets B's core_pick
> > before running the mega-loop, hoping to select something for it shortly.
> > However, C then does an unconstrained pick and forgets to set B's pick to
> > something.
> >
> > I don't know if this can really happen - but this is why I added the warning
> > in the end of the patch. I think we should make the code more robust and
> > handle these kind of cases.
> >
> I don't think this can happen. Each of the sibling takes the core wide
> lock before calling into pick_next _task. So this should not happen.
So my patch is correct but the warnings I added were probably overkill.
About the warnings, Vineeth explained to me on IRC that the design was
intially done to set ->core_pick to NULL if nothing is being picked for a
sibling rq, and the fact that we don't increment that rq's core_sched_seq
means it would the rq it is being set for would not go read core_pick.
And that resetting ->core_pick should be ok, since a sibling will go select a
task for itself if its core_pick was NULL anyway.
The only requirement is that the selection code definitely select something
for the current CPU, or idle. NULL is not an option,
So I guess we can drop the additional warnings I added, I was likely too
paranoid.
> > Again, it is about making the code more robust. Why should not set
> > rq->core_pick when we pick something? As we discussed in the private
> > discussion - we should make the code robust and consistent. Correctness is
> > not enough, the code has to be robust and maintainable.
> >
> > I think in our private discussion, you agreed with me that there is no harm
> > in setting core_pick in this case.
> >
> I agreed there was no harm, because we wanted to use that in the last
> check after 'done' label. But now I see that adding that check after
> done label cause the WARN_ON to fire even in valid case. Firing the
> WARN_ON in valid case is not good. So, if that WARN_ON check can be
> removed, adding this is not necessary IMHO.
Makes sense.
> > > cpumask_copy(&select_mask, cpu_smt_mask(cpu));
> > > if (unlikely(cpumask_test_cpu(cpu, &select_mask))) {
> > > cpumask_set_cpu(cpu, &select_mask);
> > > need_sync = false;
> > > }
> >
> > Nah, more lines of code for no good no reason, plus another branch right? I'd
> > like to leave my one liner alone than adding 4 more lines :-)
> >
> Remember, this is the fast path. Every schedule() except for our sync
> IPI reaches here. And we are sure that smt_cpumask will not have cpu
> only on hotplug cases which is very rare. I feel adding more code to
> make it clear that this setting is not needed always and also optimizing for
> the fast path is what I was looking for.
It occurs to us that may we want to optimize this a bit more, because we have
to copy cpumask every schedule() with my patch which may be unnecessarily
expensive for large CPU systems. I think we can do better -- probably by
unconditionally running the selection code on the current CPU without first
preparing an intermediate mask..
> > As discussed above, > 2 SMT case, we don't really know if the warning will
> > fire or not. I would rather keep the warning just in case for the future.
> >
> I think I was not clear last time. This WARN_ON will fire on valid cases
> if you have this check here. As I mentioned unconstrained pick, picks only
> for that cpu and not to any other siblings. This is by design. So for
> unconstrained pick, core_pick of all siblings will be NULL. We jump to done
> label on unconstrained pick and this for loop goes through all the siblings
> and finds that its core_pick is not set. Then thei WARN_ON will fire. I have
> reproduced this. We do not want it to fire as it is the correct logic not to
> set core_pick for unconstrained pick. Please let me know if this is not clear.
Agreed, I think my patch can be used as a starting point and we optimize it
further.
Me/Vineeth will continue to work on this and come up with a final patch, thanks!
- Joel