Re: [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group

From: Peter Zijlstra
Date: Thu Jun 16 2016 - 16:07:25 EST

Next message: Bjorn Helgaas: "Re: hfi1 use of PCI internals"
Previous message: Tejun Heo: "Re: [PATCH] mm: memcontrol: fix cgroup creation failure after many small jobs"
In reply to: Vincent Guittot: "Re: [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group"
Next in thread: Vincent Guittot: "Re: [PATCH v6 1/4] sched/fair: Fix attaching task sched avgs twice when switching to fair or changing task group"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Jun 16, 2016 at 09:00:57PM +0200, Vincent Guittot wrote:
> On 16 June 2016 at 20:51, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Thu, Jun 16, 2016 at 06:30:13PM +0200, Vincent Guittot wrote:
> >> With patch [1] for the init of cfs_rq side, all use cases will be
> >> covered regarding the issue linked to a last_update_time set to 0 at
> >> init
> >> [1] https://lkml.org/lkml/2016/5/30/508
> >
> > Aah, wait, now I get it :-)
> >
> > Still, we should put cfs_rq_clock_task(cfs_rq) in it, not 1. And since
> > we now acquire rq->lock on init this should well be possible. Lemme sort
> > that.
>
> Yes, with the rq->lock we can use cfs_rq_clock_task, which makes more
> sense than 1.
> But the delta can still be significant between the creation of the
> task group and the first task that is attached to the cfs_rq.

Ah, I think I've spotted more fail.

And I think you're right, it doesn't matter, in fact, 0 should have been
fine too!

  enqueue_entity()
    enqueue_entity_load_avg()
      update_cfs_rq_load_avg()
        now = clock()
        __update_load_avg(&cfs_rq->avg)
          cfs_rq->avg.last_load_update = now
          // ages 0 load/util for: now - 0
      if (migrated)
        attach_entity_load_avg()
          se->avg.last_load_update = cfs_rq->avg.last_load_update; // now != 0

So I don't see how it can end up being attached again.
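
The ordering argument in the trace above can be checked with a toy model. This is only a sketch, not kernel code: every toy_* name is hypothetical, only the timestamp is modelled, and "migrated" is assumed to mean that the entity's own timestamp is still 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the averages; only the timestamp is modelled. */
struct toy_avg {
	uint64_t last_update;	/* 0 means "never updated" */
};

struct toy_cfs_rq { struct toy_avg avg; };
struct toy_se { struct toy_avg avg; };

/* Models update_cfs_rq_load_avg(): age the cfs_rq and stamp it with now. */
static void toy_update_cfs_rq(struct toy_cfs_rq *cfs_rq, uint64_t now)
{
	/* Aging zero load over (now - 0) changes nothing; only the stamp moves. */
	cfs_rq->avg.last_update = now;
}

/* Models attach_entity_load_avg(): the entity inherits the cfs_rq stamp. */
static void toy_attach(struct toy_cfs_rq *cfs_rq, struct toy_se *se)
{
	se->avg.last_update = cfs_rq->avg.last_update;
}

/*
 * Models the enqueue path: update the cfs_rq first, then attach if migrated.
 * Assumption: "migrated" is detected by a zero entity timestamp.
 * Returns true when the entity was attached on this call.
 */
static bool toy_enqueue(struct toy_cfs_rq *cfs_rq, struct toy_se *se,
			uint64_t now)
{
	bool migrated = (se->avg.last_update == 0);

	toy_update_cfs_rq(cfs_rq, now);
	if (migrated)
		toy_attach(cfs_rq, se);
	return migrated;
}
```

Starting with both timestamps at 0, the first toy_enqueue() at a non-zero clock attaches the entity and leaves it with a non-zero stamp, so a second toy_enqueue() does not attach it again.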

Now I do see another problem, and that is that we're forgetting to
update_cfs_rq_load_avg() in all detach_entity_load_avg() callers and all
but the enqueue caller of attach_entity_load_avg().

Something like the below.

---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f75930bdd326..5d8fa135bbc5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8349,6 +8349,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 now = cfs_rq_clock_task(cfs_rq);

 	if (!vruntime_normalized(p)) {
 		/*
@@ -8360,6 +8361,7 @@ static void detach_task_cfs_rq(struct task_struct *p)
 	}

 	/* Catch up with the cfs_rq and remove our load when we leave */
+	update_cfs_rq_load_avg(now, cfs_rq, false);
 	detach_entity_load_avg(cfs_rq, se);
 }

@@ -8367,6 +8369,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 now = cfs_rq_clock_task(cfs_rq);

 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/*
@@ -8377,6 +8380,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 #endif

 	/* Synchronize task with its cfs_rq */
+	update_cfs_rq_load_avg(now, cfs_rq, false);
 	attach_entity_load_avg(cfs_rq, se);

 	if (!vruntime_normalized(p))
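
One plausible way to see why the cfs_rq should be brought up to date before attaching is the toy model below. It is an illustration under stated assumptions, not a description of the exact failure discussed in the thread: the toy_* names are hypothetical, and the decay simply halves the load once per elapsed tick, which is much coarser than the real load tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy runqueue average: one aggregated load value plus its timestamp. */
struct toy_rq {
	uint64_t last_update;	/* tick of the last aging step */
	uint64_t load;		/* aggregated load of attached entities */
};

/* Age the load up to now; toy decay: halve once per elapsed tick. */
static void toy_rq_update(struct toy_rq *rq, uint64_t now)
{
	uint64_t delta = now - rq->last_update;

	/* Shifting a 64-bit value by 64 or more is undefined, so clamp to 0. */
	rq->load = delta < 64 ? rq->load >> delta : 0;
	rq->last_update = now;
}

/* Add an entity's load, optionally syncing the rq to now first. */
static void toy_rq_attach(struct toy_rq *rq, uint64_t se_load, uint64_t now,
			  bool sync_first)
{
	if (sync_first)
		toy_rq_update(rq, now);
	rq->load += se_load;
}

/*
 * Attach se_load at tick now to an empty rq last stamped at tick 0, then
 * run the next regular update at the same tick and report the result.
 */
static uint64_t toy_load_after_attach(uint64_t se_load, uint64_t now,
				      bool sync_first)
{
	struct toy_rq rq = { .last_update = 0, .load = 0 };

	toy_rq_attach(&rq, se_load, now, sync_first);
	toy_rq_update(&rq, now);
	return rq.load;
}
```

With sync_first set, a load of 1024 attached at tick 3 is still 1024 after the next update. Without it, the load is added against the stale stamp and is aged for the three ticks during which the entity was not attached, leaving 128.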
