Re: [RFC PATCH 2/3] sched: add yield_to function

From: Mike Galbraith
Date: Fri Dec 03 2010 - 09:45:40 EST


On Fri, 2010-12-03 at 19:16 +0530, Srivatsa Vaddagiri wrote:
> On Fri, Dec 03, 2010 at 06:54:16AM +0100, Mike Galbraith wrote:
> > > +void yield_to(struct task_struct *p)
> > > +{
> > > +	unsigned long flags;
> > > +	struct sched_entity *se = &p->se;
> > > +	struct rq *rq;
> > > +	struct cfs_rq *cfs_rq;
> > > +	u64 remain = slice_remain(current);
> >
> > That "slice remaining" only shows the distance to last preempt, however
> > brief. It shows nothing wrt tree position, the yielding task may well
> > already be right of the task it wants to yield to, having been a buddy.
>
> Good point.
>
> > >  	cfs_rq = cfs_rq_of(se);
> > > +	se->vruntime -= remain;
> > > +	if (se->vruntime < cfs_rq->min_vruntime)
> > > +		se->vruntime = cfs_rq->min_vruntime;
> >
> > (This is usually done using max_vruntime())
> >
> > If the recipient was already left of the fair stick (min_vruntime),
> > clipping advances its vruntime, vaporizing entitlement from both donor
> > and recipient.
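
Something like this is what I mean by the usual idiom -- a one-liner sketch
only, assuming max_vruntime() from sched_fair.c is visible at that point
(untested):

	se->vruntime = max_vruntime(cfs_rq->min_vruntime, se->vruntime - remain);

Same clip as the open-coded version above, just the helper form.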
> >
> > What if a task tries to yield to another not on the same cpu, and/or in
> > the same task group?
>
> In this case, the target of yield_to is a vcpu belonging to the same VM and
> hence is expected to be in the same task group, but I agree it's good to put
> in a check.
>
> > This would munge the min_vruntime of other queues. I think you'd have to
> > restrict this to same cpu, same group. If tasks can donate across cfs_rqs,
> > (say) task A, pinned to cpu A and running solo, could donate vruntime to
> > selected tasks pinned to cpu B for as long as minuscule preemptions can
> > resupply ammo. Would suck to not be the favored child.
>
> IOW starving "non-favored" children?

Yes, as in fairness ceases to exist.
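
Roughly what I have in mind for the guard, as a sketch only -- may_yield_to()
is a made-up name, task_rq()/task_cfs_rq() are the existing helpers, and the
locking yield_to() itself would need is left out (untested):

	/*
	 * Sketch: refuse to fiddle with vruntimes unless donor and
	 * recipient share both a runqueue and a cfs_rq, i.e. same cpu
	 * and same task group.
	 */
	static bool may_yield_to(struct task_struct *donor, struct task_struct *p)
	{
		if (task_rq(donor) != task_rq(p))
			return false;	/* different cpu */
		if (task_cfs_rq(donor) != task_cfs_rq(p))
			return false;	/* different task group */
		return true;
	}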

> > Maybe you could exchange vruntimes cooperatively (iff same cfs_rq)
> > between threads, but I don't think donations with clipping works.
>
> Can't that lead to starvation again (as I pointed out in a mail to Peterz):
>
> p0 -> A0 B0 A1
>
> A0/A1 enter a yield_to(other) deadlock, which means we keep swapping their
> vruntimes, starving B0?

I'll have to go back and re-read that. Off the top of my head, I see no
way it could matter which container the numbers live in, as long as they
keep advancing and stay in the same runqueue. (Hm, task weights would
have to be the same too, or scaled. Dangerous business, tinkering with
vruntimes.)
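
Something like the below is the sort of cooperative exchange I mean -- purely
a sketch, swap_vruntime() is a made-up name, and the rbtree repositioning
(dequeue/requeue of both entities under the rq lock) is deliberately left out:

	/*
	 * Cooperative exchange sketch: swap vruntimes only if both
	 * entities live on the same cfs_rq and carry the same load
	 * weight, so the queue's total entitlement is untouched.
	 */
	static void swap_vruntime(struct sched_entity *a, struct sched_entity *b)
	{
		u64 tmp;

		if (cfs_rq_of(a) != cfs_rq_of(b))
			return;
		if (a->load.weight != b->load.weight)
			return;

		tmp = a->vruntime;
		a->vruntime = b->vruntime;
		b->vruntime = tmp;
	}

With equal weights the swap is fairness-neutral for everyone else on the
queue; with different weights you'd have to scale, which is where it gets
hairy.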

-Mike
