Re: [PATCH v3 1/3] sched: introduce a new migration flag to task_struct

From: Peter Zijlstra
Date: Thu Oct 15 2015 - 07:18:24 EST

Next message: Arnd Bergmann: "Re: [PATCH] VFIO: platform: AMD xgbe reset module"
Previous message: Mikko Rapeli: "Re: [PATCH v4 75/79] include/uapi/xen/privcmd.h: fix compilation in userspace"
In reply to: byungchul.park: "[PATCH v3 1/3] sched: introduce a new migration flag to task_struct"
Next in thread: Byungchul Park: "Re: [PATCH v3 1/3] sched: introduce a new migration flag to task_struct"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Oct 15, 2015 at 06:01:14PM +0900, byungchul.park@xxxxxxx wrote:
> From: Byungchul Park <byungchul.park@xxxxxxx>
>
> This patch removes a weird coupling between se->avg.last_update_time and
> the condition checking for migration, and introduce a new migration flag.
> Now, scheduler can use the flag instead of se->avg.last_update_time to
> check if migration already happened or not.

Was there a problem with that coupling? This does not explain.

> Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> ---
> include/linux/sched.h | 3 +++
> kernel/sched/core.c | 1 +
> kernel/sched/fair.c | 22 ++++++++++++----------
> kernel/sched/sched.h | 1 +
> 4 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 699228b..a104c72 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1379,6 +1379,9 @@ struct task_struct {
> #endif
> int on_rq;
>
> + /* For indicating if a migration has happened. */
> + int migrated;

You just created another 4 byte hole instead of filling one.

> int prio, static_prio, normal_prio;
> unsigned int rt_priority;
> const struct sched_class *sched_class;

> +++ b/kernel/sched/fair.c
> @@ -2771,14 +2771,15 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>
> /* Add the load generated by se into cfs_rq's load average */
> static inline void
> -enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> {
> struct sched_avg *sa = &se->avg;
> u64 now = cfs_rq_clock_task(cfs_rq);
> - int migrated, decayed;
> + int decayed;
> + int migrated = flags & ENQUEUE_MIGRATED;
> + int created = !sa->last_update_time;
>
> - migrated = !sa->last_update_time;
> - if (!migrated) {
> + if (!migrated && !created) {
> __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> se->on_rq * scale_load_down(se->load.weight),
> cfs_rq->curr == se, NULL);
> @@ -2789,10 +2790,10 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
> cfs_rq->runnable_load_avg += sa->load_avg;
> cfs_rq->runnable_load_sum += sa->load_sum;
>
> - if (migrated)
> + if (migrated || created)
> attach_entity_load_avg(cfs_rq, se);
>
> - if (decayed || migrated)
> + if (decayed || migrated || created)
> update_tg_load_avg(cfs_rq, 0);
> }

How much extra code gets generated for this? These _are_ hot paths.

> @@ -4136,6 +4137,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> struct cfs_rq *cfs_rq;
> struct sched_entity *se = &p->se;
>
> + flags = flags | (xchg(&p->migrated, 0) ? ENQUEUE_MIGRATED : 0);

Yeah, no way. xchg() is an absurdly expensive instruction, we do not
place that unconditionally in the enqueue path.

> @@ -5021,7 +5023,7 @@ static void migrate_task_rq_fair(struct task_struct *p, int next_cpu)
> remove_entity_load_avg(&p->se);
>
> /* Tell new CPU we are migrated */
> - p->se.avg.last_update_time = 0;
> + p->migrated = 1;
>
> /* We have migrated, no longer consider this task hot */
> p->se.exec_start = 0;
> @@ -8082,7 +8084,7 @@ static void task_move_group_fair(struct task_struct *p)
> set_task_rq(p, task_cpu(p));
>
> #ifdef CONFIG_SMP
> - /* Tell se's cfs_rq has been changed -- migrated */
> + /* Tell se's cfs_rq has been changed */
> p->se.avg.last_update_time = 0;
> #endif
> attach_task_cfs_rq(p);

So my tiny little patch removed more code than it added, and simplified
a few things, like the above. Now we have 2 states to worry about.

How is this making things better?

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index af6f252..66d0552 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1158,6 +1158,7 @@ static const u32 prio_to_wmult[40] = {
> #define ENQUEUE_WAKING 0
> #endif
> #define ENQUEUE_REPLENISH 8
> +#define ENQUEUE_MIGRATED 16

Won't actually apply that..
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Arnd Bergmann: "Re: [PATCH] VFIO: platform: AMD xgbe reset module"
Previous message: Mikko Rapeli: "Re: [PATCH v4 75/79] include/uapi/xen/privcmd.h: fix compilation in userspace"
In reply to: byungchul.park: "[PATCH v3 1/3] sched: introduce a new migration flag to task_struct"
Next in thread: Byungchul Park: "Re: [PATCH v3 1/3] sched: introduce a new migration flag to task_struct"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]