Re: [PATCH] kernel: Refactor task_struct to use numa_faults instead of numa_* pointers

From: Davidlohr Bueso
Date: Thu Oct 30 2014 - 17:42:23 EST


On Thu, 2014-10-30 at 21:38 +0200, Iulia Manda wrote:
> This patch simplifies task_struct by merging the four per-node numa_* fault
> arrays into a single array and replacing the four pointers with one array
> pointer. By doing this, the size of task_struct is reduced.

This is always welcome, but for completeness you should specify by how
much on <insert your favorite arch>. afaict you're removing 3 ulong
pointers.
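
IOW, something along these lines in task_struct (untested sketch, field
names as I recall them from the current tree):

	/* before: four separately allocated per-node fault arrays */
	unsigned long *numa_faults_memory;
	unsigned long *numa_faults_cpu;
	unsigned long *numa_faults_buffer_memory;
	unsigned long *numa_faults_buffer_cpu;

	/* after: a single allocation, indexed via task_faults_idx() */
	unsigned long *numa_faults;

So on a 64-bit arch that would be 3 * 8 = 24 bytes off task_struct.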

> A new parameter is added to task_faults_idx() so that it returns an index
> into the correct region of the array, corresponding to the old precalculated
> pointers.
>
> All of the code in sched/ that depended on task_faults_idx and numa_* was
> changed in order to match the new logic.
>
> Signed-off-by: Iulia Manda <iulia.manda21@xxxxxxxxx>

Acked-by: Davidlohr Bueso <dave@xxxxxxxxxxxx>

With some suggestions below.

> ---
> include/linux/sched.h | 40 ++++++++++--------
> kernel/sched/core.c | 3 +-
> kernel/sched/debug.c | 4 +-
> kernel/sched/fair.c | 110 +++++++++++++++++++++++++------------------------
> 4 files changed, 82 insertions(+), 75 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5e344bb..bf8c19f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -796,6 +796,15 @@ struct sched_info {
> };
> #endif /* defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT) */
>
> +#ifdef CONFIG_NUMA_BALANCING
> +enum numa_faults_stats {
> + NUMA_MEM = 0,
> + NUMA_CPU,
> + NUMA_MEMBUF,
> + NUMA_CPUBUF
> +};
> +#endif
> +
> #ifdef CONFIG_TASK_DELAY_ACCT
> struct task_delay_info {
> spinlock_t lock;
> @@ -1558,28 +1567,23 @@ struct task_struct {
> struct numa_group *numa_group;
>
> /*
> - * Exponential decaying average of faults on a per-node basis.
> - * Scheduling placement decisions are made based on the these counts.
> - * The values remain static for the duration of a PTE scan
> + * numa_faults is an array split into four regions:
> + * faults_memory, faults_cpu, faults_memory_buffer, faults_cpu_buffer
> + * in this precise order.
> + *
> + * faults_memory: Exponential decaying average of faults on a per-node
> + * basis. Scheduling placement decisions are made based on these
> + * counts. The values remain static for the duration of a PTE scan.
> + * faults_cpu: Track the nodes the process was running on when a NUMA
> + * hinting fault was incurred.
> + * faults_memory_buffer and faults_cpu_buffer: Record faults per node
> + * during the current scan window. When the scan completes, the counts
> + * in faults_memory and faults_cpu decay and these values are copied.

How about moving these comments to where you have the enum
numa_faults_stats? And just point to that here.
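
FWIW, with that layout the accessors read naturally on their own, e.g.
(sketch, using the enum above; priv == 1 being the private counters):

	/* private memory faults this task incurred on node nid */
	faults = p->numa_faults[task_faults_idx(NUMA_MEM, nid, 1)];

so documenting the region order once, next to the enum, should be enough.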

[...]
> -static inline int task_faults_idx(int nid, int priv)
> +/*
> + * The averaged statistics, shared & private, memory & cpu,
> + * occupy the first half of the array. The second half of the
> + * array is for current counters, which are averaged into the
> + * first set by task_numa_placement.
> + */
> +static inline int task_faults_idx(enum numa_faults_stats s, int nid, int priv)
> {
> - return NR_NUMA_HINT_FAULT_TYPES * nid + priv;
> + return s * NR_NUMA_HINT_FAULT_TYPES * nr_node_ids +
> + NR_NUMA_HINT_FAULT_TYPES * nid + priv;

Parentheses here, to make the grouping easier to read?
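
Something like (untested, same arithmetic, just grouping the terms):

	return NR_NUMA_HINT_FAULT_TYPES * (nr_node_ids * s + nid) + priv;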

Thanks,
Davidlohr
