Re: [PATCH 3/4] cgroup: remove sane_behavior support on non-default hierarchies

From: Vivek Goyal
Date: Tue Jul 08 2014 - 15:57:57 EST


On Wed, Jul 02, 2014 at 07:45:46PM -0400, Tejun Heo wrote:
> sane_behavior has been used as a development vehicle for the default
> unified hierarchy. Now that the default hierarchy is in place, the
> flag became redundant and confusing as its usage is allowed on all
> hierarchies. There are gonna be either the default hierarchy or
> legacy ones. Let's make that clear by removing sane_behavior support
> on non-default hierarchies.
>
> This patch replaces cgroup_sane_behavior() with cgroup_on_dfl(). The
> comment on top of CGRP_ROOT_SANE_BEHAVIOR is moved to on top of
> cgroup_on_dfl() with sane_behavior specific part dropped.
>
> On the default and legacy hierarchies w/o sane_behavior, this
> shouldn't cause any behavior differences.
>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Vivek Goyal <vgoyal@xxxxxxxxxx>
> Cc: Li Zefan <lizefan@xxxxxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> ---
> block/blk-throttle.c | 6 +--

blk-throttle change looks good to me.

Acked-by: Vivek Goyal <vgoyal@xxxxxxxxxx>

Vivek

> include/linux/cgroup.h | 125 +++++++++++++++++++++----------------------------
> kernel/cgroup.c | 19 ++++----
> kernel/cpuset.c | 33 ++++++-------
> mm/memcontrol.c | 7 +--
> 5 files changed, 86 insertions(+), 104 deletions(-)
>
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 3fdb21a..9273d09 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -412,13 +412,13 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
> int rw;
>
> /*
> - * If sane_hierarchy is enabled, we switch to properly hierarchical
> + * If on the default hierarchy, we switch to properly hierarchical
> * behavior where limits on a given throtl_grp are applied to the
> * whole subtree rather than just the group itself. e.g. If 16M
> * read_bps limit is set on the root group, the whole system can't
> * exceed 16M for the device.
> *
> - * If sane_hierarchy is not enabled, the broken flat hierarchy
> + * If not on the default hierarchy, the broken flat hierarchy
> * behavior is retained where all throtl_grps are treated as if
> * they're all separate root groups right below throtl_data.
> * Limits of a group don't interact with limits of other groups
> @@ -426,7 +426,7 @@ static void throtl_pd_init(struct blkcg_gq *blkg)
> */
> parent_sq = &td->service_queue;
>
> - if (cgroup_sane_behavior(blkg->blkcg->css.cgroup) && blkg->parent)
> + if (cgroup_on_dfl(blkg->blkcg->css.cgroup) && blkg->parent)
> parent_sq = &blkg_to_tg(blkg->parent)->service_queue;
>
> throtl_service_queue_init(&tg->service_queue, parent_sq);
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 7748e5b..46e4661 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -248,68 +248,7 @@ struct cgroup {
>
> /* cgroup_root->flags */
> enum {
> - /*
> - * Unfortunately, cgroup core and various controllers are riddled
> - * with idiosyncrasies and pointless options. The following flag,
> - * when set, will force sane behavior - some options are forced on,
> - * others are disallowed, and some controllers will change their
> - * hierarchical or other behaviors.
> - *
> - * The set of behaviors affected by this flag are still being
> - * determined and developed and the mount option for this flag is
> - * prefixed with __DEVEL__. The prefix will be dropped once we
> - * reach the point where all behaviors are compatible with the
> - * planned unified hierarchy, which will automatically turn on this
> - * flag.
> - *
> - * The followings are the behaviors currently affected this flag.
> - *
> - * - Mount options "noprefix", "xattr", "clone_children",
> - * "release_agent" and "name" are disallowed.
> - *
> - * - When mounting an existing superblock, mount options should
> - * match.
> - *
> - * - Remount is disallowed.
> - *
> - * - rename(2) is disallowed.
> - *
> - * - "tasks" is removed. Everything should be at process
> - * granularity. Use "cgroup.procs" instead.
> - *
> - * - "cgroup.procs" is not sorted. pids will be unique unless they
> - * got recycled inbetween reads.
> - *
> - * - "release_agent" and "notify_on_release" are removed.
> - * Replacement notification mechanism will be implemented.
> - *
> - * - "cgroup.clone_children" is removed.
> - *
> - * - "cgroup.subtree_populated" is available. Its value is 0 if
> - * the cgroup and its descendants contain no task; otherwise, 1.
> - * The file also generates kernfs notification which can be
> - * monitored through poll and [di]notify when the value of the
> - * file changes.
> - *
> - * - If mount is requested with sane_behavior but without any
> - * subsystem, the default unified hierarchy is mounted.
> - *
> - * - cpuset: tasks will be kept in empty cpusets when hotplug happens
> - * and take masks of ancestors with non-empty cpus/mems, instead of
> - * being moved to an ancestor.
> - *
> - * - cpuset: a task can be moved into an empty cpuset, and again it
> - * takes masks of ancestors.
> - *
> - * - memcg: use_hierarchy is on by default and the cgroup file for
> - * the flag is not created.
> - *
> - * - blkcg: blk-throttle becomes properly hierarchical.
> - *
> - * - debug: disallowed on the default hierarchy.
> - */
> - CGRP_ROOT_SANE_BEHAVIOR = (1 << 0),
> -
> + CGRP_ROOT_SANE_BEHAVIOR = (1 << 0), /* __DEVEL__sane_behavior specified */
> CGRP_ROOT_NOPREFIX = (1 << 1), /* mounted subsystems have no named prefix */
> CGRP_ROOT_XATTR = (1 << 2), /* supports extended attributes */
> };
> @@ -523,20 +462,64 @@ struct cftype {
> extern struct cgroup_root cgrp_dfl_root;
> extern struct css_set init_css_set;
>
> +/**
> + * cgroup_on_dfl - test whether a cgroup is on the default hierarchy
> + * @cgrp: the cgroup of interest
> + *
> + * The default hierarchy is the v2 interface of cgroup and this function
> + * can be used to test whether a cgroup is on the default hierarchy for
> + * cases where a subsystem should behave differnetly depending on the
> + * interface version.
> + *
> + * The set of behaviors which change on the default hierarchy are still
> + * being determined and the mount option is prefixed with __DEVEL__.
> + *
> + * List of changed behaviors:
> + *
> + * - Mount options "noprefix", "xattr", "clone_children", "release_agent"
> + * and "name" are disallowed.
> + *
> + * - When mounting an existing superblock, mount options should match.
> + *
> + * - Remount is disallowed.
> + *
> + * - rename(2) is disallowed.
> + *
> + * - "tasks" is removed. Everything should be at process granularity. Use
> + * "cgroup.procs" instead.
> + *
> + * - "cgroup.procs" is not sorted. pids will be unique unless they got
> + * recycled inbetween reads.
> + *
> + * - "release_agent" and "notify_on_release" are removed. Replacement
> + * notification mechanism will be implemented.
> + *
> + * - "cgroup.clone_children" is removed.
> + *
> + * - "cgroup.subtree_populated" is available. Its value is 0 if the cgroup
> + * and its descendants contain no task; otherwise, 1. The file also
> + * generates kernfs notification which can be monitored through poll and
> + * [di]notify when the value of the file changes.
> + *
> + * - cpuset: tasks will be kept in empty cpusets when hotplug happens and
> + * take masks of ancestors with non-empty cpus/mems, instead of being
> + * moved to an ancestor.
> + *
> + * - cpuset: a task can be moved into an empty cpuset, and again it takes
> + * masks of ancestors.
> + *
> + * - memcg: use_hierarchy is on by default and the cgroup file for the flag
> + * is not created.
> + *
> + * - blkcg: blk-throttle becomes properly hierarchical.
> + *
> + * - debug: disallowed on the default hierarchy.
> + */
> static inline bool cgroup_on_dfl(const struct cgroup *cgrp)
> {
> return cgrp->root == &cgrp_dfl_root;
> }
>
> -/*
> - * See the comment above CGRP_ROOT_SANE_BEHAVIOR for details. This
> - * function can be called as long as @cgrp is accessible.
> - */
> -static inline bool cgroup_sane_behavior(const struct cgroup *cgrp)
> -{
> - return cgrp->root->flags & CGRP_ROOT_SANE_BEHAVIOR;
> -}
> -
> /* no synchronization, the result can only be used as a hint */
> static inline bool cgroup_has_tasks(struct cgroup *cgrp)
> {
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index be701d9..2d7057e 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1414,8 +1414,8 @@ static int cgroup_remount(struct kernfs_root *kf_root, int *flags, char *data)
> struct cgroup_sb_opts opts;
> unsigned int added_mask, removed_mask;
>
> - if (root->flags & CGRP_ROOT_SANE_BEHAVIOR) {
> - pr_err("sane_behavior: remount is not allowed\n");
> + if (root == &cgrp_dfl_root) {
> + pr_err("remount is not allowed\n");
> return -EINVAL;
> }
>
> @@ -2833,9 +2833,9 @@ static int cgroup_rename(struct kernfs_node *kn, struct kernfs_node *new_parent,
>
> /*
> * This isn't a proper migration and its usefulness is very
> - * limited. Disallow if sane_behavior.
> + * limited. Disallow on the default hierarchy.
> */
> - if (cgroup_sane_behavior(cgrp))
> + if (cgroup_on_dfl(cgrp))
> return -EPERM;
>
> /*
> @@ -2921,7 +2921,7 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cftype cfts[],
> /* does cft->flags tell us to skip this file on @cgrp? */
> if ((cft->flags & CFTYPE_ONLY_ON_DFL) && !cgroup_on_dfl(cgrp))
> continue;
> - if ((cft->flags & CFTYPE_INSANE) && cgroup_sane_behavior(cgrp))
> + if ((cft->flags & CFTYPE_INSANE) && cgroup_on_dfl(cgrp))
> continue;
> if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgroup_parent(cgrp))
> continue;
> @@ -3654,8 +3654,9 @@ after:
> *
> * All this extra complexity was caused by the original implementation
> * committing to an entirely unnecessary property. In the long term, we
> - * want to do away with it. Explicitly scramble sort order if
> - * sane_behavior so that no such expectation exists in the new interface.
> + * want to do away with it. Explicitly scramble sort order if on the
> + * default hierarchy so that no such expectation exists in the new
> + * interface.
> *
> * Scrambling is done by swapping every two consecutive bits, which is
> * non-identity one-to-one mapping which disturbs sort order sufficiently.
> @@ -3670,7 +3671,7 @@ static pid_t pid_fry(pid_t pid)
>
> static pid_t cgroup_pid_fry(struct cgroup *cgrp, pid_t pid)
> {
> - if (cgroup_sane_behavior(cgrp))
> + if (cgroup_on_dfl(cgrp))
> return pid_fry(pid);
> else
> return pid;
> @@ -3773,7 +3774,7 @@ static int pidlist_array_load(struct cgroup *cgrp, enum cgroup_filetype type,
> css_task_iter_end(&it);
> length = n;
> /* now sort & (if procs) strip out duplicates */
> - if (cgroup_sane_behavior(cgrp))
> + if (cgroup_on_dfl(cgrp))
> sort(array, length, sizeof(pid_t), fried_cmppid, NULL);
> else
> sort(array, length, sizeof(pid_t), cmppid, NULL);
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index f6b33c6..f9d4807 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -1383,12 +1383,9 @@ static int cpuset_can_attach(struct cgroup_subsys_state *css,
>
> mutex_lock(&cpuset_mutex);
>
> - /*
> - * We allow to move tasks into an empty cpuset if sane_behavior
> - * flag is set.
> - */
> + /* allow moving tasks into an empty cpuset if on default hierarchy */
> ret = -ENOSPC;
> - if (!cgroup_sane_behavior(css->cgroup) &&
> + if (!cgroup_on_dfl(css->cgroup) &&
> (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed)))
> goto out_unlock;
>
> @@ -2030,7 +2027,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs)
> static cpumask_t off_cpus;
> static nodemask_t off_mems;
> bool is_empty;
> - bool sane = cgroup_sane_behavior(cs->css.cgroup);
> + bool on_dfl = cgroup_on_dfl(cs->css.cgroup);
>
> retry:
> wait_event(cpuset_attach_wq, cs->attach_in_progress == 0);
> @@ -2054,12 +2051,12 @@ retry:
> mutex_unlock(&callback_mutex);
>
> /*
> - * If sane_behavior flag is set, we need to update tasks' cpumask
> - * for empty cpuset to take on ancestor's cpumask. Otherwise, don't
> - * call update_tasks_cpumask() if the cpuset becomes empty, as
> - * the tasks in it will be migrated to an ancestor.
> + * If on_dfl, we need to update tasks' cpumask for empty cpuset to
> + * take on ancestor's cpumask. Otherwise, don't call
> + * update_tasks_cpumask() if the cpuset becomes empty, as the tasks
> + * in it will be migrated to an ancestor.
> */
> - if ((sane && cpumask_empty(cs->cpus_allowed)) ||
> + if ((on_dfl && cpumask_empty(cs->cpus_allowed)) ||
> (!cpumask_empty(&off_cpus) && !cpumask_empty(cs->cpus_allowed)))
> update_tasks_cpumask(cs);
>
> @@ -2068,12 +2065,12 @@ retry:
> mutex_unlock(&callback_mutex);
>
> /*
> - * If sane_behavior flag is set, we need to update tasks' nodemask
> - * for empty cpuset to take on ancestor's nodemask. Otherwise, don't
> - * call update_tasks_nodemask() if the cpuset becomes empty, as
> - * the tasks in it will be migratd to an ancestor.
> + * If on_dfl, we need to update tasks' nodemask for empty cpuset to
> + * take on ancestor's nodemask. Otherwise, don't call
> + * update_tasks_nodemask() if the cpuset becomes empty, as the
> + * tasks in it will be migratd to an ancestor.
> */
> - if ((sane && nodes_empty(cs->mems_allowed)) ||
> + if ((on_dfl && nodes_empty(cs->mems_allowed)) ||
> (!nodes_empty(off_mems) && !nodes_empty(cs->mems_allowed)))
> update_tasks_nodemask(cs);
>
> @@ -2083,13 +2080,13 @@ retry:
> mutex_unlock(&cpuset_mutex);
>
> /*
> - * If sane_behavior flag is set, we'll keep tasks in empty cpusets.
> + * If on_dfl, we'll keep tasks in empty cpusets.
> *
> * Otherwise move tasks to the nearest ancestor with execution
> * resources. This is full cgroup operation which will
> * also call back into cpuset. Should be done outside any lock.
> */
> - if (!sane && is_empty)
> + if (!on_dfl && is_empty)
> remove_tasks_in_empty_cpuset(cs);
> }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a2c7bcb..f986671 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -7001,16 +7001,17 @@ static void mem_cgroup_move_task(struct cgroup_subsys_state *css,
>
> /*
> * Cgroup retains root cgroups across [un]mount cycles making it necessary
> - * to verify sane_behavior flag on each mount attempt.
> + * to verify whether we're attached to the default hierarchy on each mount
> + * attempt.
> */
> static void mem_cgroup_bind(struct cgroup_subsys_state *root_css)
> {
> /*
> - * use_hierarchy is forced with sane_behavior. cgroup core
> + * use_hierarchy is forced on the default hierarchy. cgroup core
> * guarantees that @root doesn't have any children, so turning it
> * on for the root memcg is enough.
> */
> - if (cgroup_sane_behavior(root_css->cgroup))
> + if (cgroup_on_dfl(root_css->cgroup))
> mem_cgroup_from_css(root_css)->use_hierarchy = true;
> }
>
> --
> 1.9.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe cgroups" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/