Re: [RFC][PATCH 1/3] sched: Rewrite tg_shares_up

From: Peter Zijlstra
Date: Mon Aug 30 2010 - 13:53:20 EST

Next message: Eric Van Hensbergen: "[GIT PULL] 9p file system bug fixes for 2.6.36-rc4"
Previous message: Johannes Berg: "Re: [PATCH] wireless: fix 64K kernel heap content leak via ioctl"
In reply to: Srivatsa Vaddagiri: "Re: [RFC][PATCH 1/3] sched: Rewrite tg_shares_up"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 2010-08-30 at 22:50 +0530, Srivatsa Vaddagiri wrote:
> On Sun, Aug 29, 2010 at 12:30:26AM +0200, Peter Zijlstra wrote:
> > By tracking a per-cpu load-avg for each cfs_rq and folding it into a
> > global task_group load on each tick we can rework tg_shares_up to be
> > strictly per-cpu.
>
> So tg->load_weight is supposed to represent more or less current task load
> across all cpus? I see only atomic_add() to it - which means it can only keep
> growing or remain constant - IOW capturing the historical load even since
> the task group was started? I was expecting it to reduce based on how a group
> goes idle, otherwise

It can add a negative value, I should probably have made the below use
long instead of unsigned long.

> > +static void update_cfs_shares(struct cfs_rq *cfs_rq)
> > +{
> > + struct task_group *tg;
> > + struct sched_entity *se;
> > + unsigned long load_weight, load, shares;
> > +
> > + if (!cfs_rq)
> > + return;
> > +
> > + tg = cfs_rq->tg;
> > + se = tg->se[cpu_of(rq_of(cfs_rq))];
> > + if (!se)
> > + return;
> > +
> > + load = cfs_rq->load.weight;
> > +
> > + load_weight = atomic_read(&tg->load_weight);
> > + load_weight -= cfs_rq->load_contribution;
> > + load_weight += load;
> > +
> > + shares = (tg->shares * load);
> > + if (load_weight)
> > + shares /= load_weight;
>
> this seem incorrect? Even though we have corrected tg->load_weight to reflect
> current load on 'cfs_rq', tg->load_weight still captures historical load on
> other cpus and hence could be a large #, making the division inaccurate?

Well, without history you can get terribly inaccurate too. Suppose the
workload sleeps for 50ms then runs for 50ms with n tasks, each on its
own cpu. Without history you'll end up being off by n*load, with history
you'll be floating about the avg, giving a much more stable result.

The (tg->shares * cfs_rq->load) / tg->load < MIN_SHARES issues is not
something we can easily fix, there's only so much you can do with fixed
point math.

> Also I wonder how much of a hot spot tg->load_weight would become ..

We touch it once per tick, so pretty hot on large systems, then again,
the larger the system the less likely all cpus will run tasks from the
same cgroup.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Eric Van Hensbergen: "[GIT PULL] 9p file system bug fixes for 2.6.36-rc4"
Previous message: Johannes Berg: "Re: [PATCH] wireless: fix 64K kernel heap content leak via ioctl"
In reply to: Srivatsa Vaddagiri: "Re: [RFC][PATCH 1/3] sched: Rewrite tg_shares_up"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]