Re: [PATCH 7/8] cgroups: Add a task counter subsystem

From: Frederic Weisbecker
Date: Thu Aug 04 2011 - 10:05:39 EST

Next message: Vivek Goyal: "Re: fio posixaio performance problem"
Previous message: Paul E. McKenney: "Re: 3.0-git15 Atomic scheduling in pidmap_init"
In reply to: Andrew Morton: "Re: [PATCH 7/8] cgroups: Add a task counter subsystem"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, Aug 01, 2011 at 04:13:47PM -0700, Andrew Morton wrote:
> On Fri, 29 Jul 2011 18:13:29 +0200
> Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>
> > Add a new subsystem to limit the number of running tasks,
> > similar to the NR_PROC rlimit but in the scope of a cgroup.
> >
> > This is a step to be able to isolate a bit more a cgroup against
> > the rest of the system and limit the global impact of a fork bomb
> > inside a given cgroup.
> >
> > ...
> >
> > +config CGROUP_TASK_COUNTER
> > + bool "Control number of tasks in a cgroup"
> > + depends on RESOURCE_COUNTERS
> > + help
> > + This option let the user to set up an upper bound allowed number
> > + of tasks inside a cgroup.
>
> whitespace went weird.

Yep, will fix.

> >
> > ...
> >
> > +
> > +static void task_counter_post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp)
> > +{
> > + res_counter_inherit(cgroup_task_counter_res(cgrp), RES_LIMIT);
>
> cgroup_task_counter_res() has code in it to carefully return NULL in
> one situation, but if it does this, res_counter_inherit() will then
> cheerily oops. This makes no sense.

Right, but the only cgroup for which it returns NULL is the root cgroup,
and we never post-clone the root cgroup itself since it has no parent.

So this can't happen, but I can still add a WARN_ON() check that bails out
early in that case.
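
A minimal userspace sketch of such a guard, for illustration only. Every
name ending in _model is hypothetical and not part of the patch,
warn_on_model() merely stands in for the kernel's WARN_ON(), and the
inherit helper assumes that inheriting a limit means copying the parent's
value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's struct res_counter (assumption). */
struct res_counter {
	long limit;
	struct res_counter *parent;
};

/* Number of warnings emitted so far, so callers can observe them. */
static int warn_count;

/* Userspace stand-in for WARN_ON(): report and return the condition. */
static int warn_on_model(int cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Assumed behaviour: a new counter starts with its parent's limit. */
static void res_counter_inherit_limit_model(struct res_counter *counter)
{
	assert(counter->parent != NULL);
	counter->limit = counter->parent->limit;
}

/* Guarded post_clone: warn and bail out instead of dereferencing NULL. */
static void task_counter_post_clone_model(struct res_counter *res)
{
	if (warn_on_model(res == NULL, "post_clone called for the root cgroup"))
		return;
	res_counter_inherit_limit_model(res);
}
```

With this shape, a NULL counter produces one warning and no dereference,
while a regular child counter simply picks up its parent's limit.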

> > +}
> > +
> >
> > ...
> >
> > +/* Protected amongst can_attach_task/attach_task/cancel_attach_task by cgroup mutex */
> > +static struct res_counter *common_ancestor;
> > +
> > +static int task_counter_can_attach_task(struct cgroup *cgrp, struct cgroup *old_cgrp,
> > + struct task_struct *tsk)
> > +{
> > + struct res_counter *res = cgroup_task_counter_res(cgrp);
> > + struct res_counter *old_res = cgroup_task_counter_res(old_cgrp);
> > + struct res_counter *limit_fail_at;
> > +
> > + common_ancestor = res_counter_common_ancestor(res, old_res);
>
> This might oops too?

Nope, if either res or old_res is NULL, then the common ancestor returned
is NULL. Afterward, the charge_until() below will simply charge res all the
way up the hierarchy if old_res is NULL, or it will do nothing if res itself
is NULL.

I should probably comment on that behaviour.
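
The NULL handling described above can be modelled in userspace roughly as
follows. This is a sketch under stated assumptions, not the code from the
patch: the struct layout and the _model function names are hypothetical,
and the -ENOMEM failure value only mirrors what the thread says the
res_counter API returns today.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct res_counter (assumption). */
struct res_counter {
	long usage;
	long limit;
	struct res_counter *parent;
};

/* Returns the closest counter found on both parent chains, or NULL.
 * If either argument is NULL, one of the loops never runs, so the
 * result is NULL without any special casing. */
static struct res_counter *
common_ancestor_model(struct res_counter *a, struct res_counter *b)
{
	struct res_counter *i, *j;

	for (i = a; i; i = i->parent)
		for (j = b; j; j = j->parent)
			if (i == j)
				return i;
	return NULL;
}

/* Charges val on res and its ancestors, stopping before 'until'.
 * until == NULL means "charge the whole chain"; res == NULL means
 * "nothing to charge". On failure, earlier charges are rolled back. */
static int charge_until_model(struct res_counter *res,
			      struct res_counter *until, long val,
			      struct res_counter **limit_fail_at)
{
	struct res_counter *c, *u;

	assert(val > 0);
	*limit_fail_at = NULL;
	for (c = res; c && c != until; c = c->parent) {
		if (c->usage + val > c->limit) {
			*limit_fail_at = c;
			for (u = res; u != c; u = u->parent)
				u->usage -= val;
			return -ENOMEM;
		}
		c->usage += val;
	}
	return 0;
}
```

Moving a task between two siblings then charges only the destination
counter, since their shared parent already accounts for the task.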

>
> > + return res_counter_charge_until(res, common_ancestor, 1, &limit_fail_at);
> > +}
> > +
> >
> > ...
> >
> > +int cgroup_task_counter_fork(struct task_struct *child)
> > +{
> > + struct cgroup_subsys_state *css = child->cgroups->subsys[tasks_subsys_id];
> > + struct cgroup *cgrp = css->cgroup;
> > + struct res_counter *limit_fail_at;
> > +
> > + /* Optimize for the root cgroup case, which doesn't have a limit */
> > + if (!cgrp->parent)
> > + return 0;
> > +
> > + return res_counter_charge(cgroup_task_counter_res(cgrp), 1, &limit_fail_at);
> > +}
>
> It took a while for me to work out the meaning of the return value from
> this function. Some documentation would be nice?

Yes, and moreover I'm not at all sure about the default return value in
case of failure. -ENOMEM probably matches the needs of the memory limit
subsystem, but not those of the task counter subsystem.

The res_counter API should probably return -1 when the limit is reached
and let the calling subsystem decide which error to return. -ENOMEM
is already too specific to one subsystem.

I guess we should return -EINVAL in case of task counter limit reached?

Once we agree on this I'll document it.
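
To make the proposal concrete, here is a small userspace sketch of the
split: a generic charge that only reports "limit reached", and per-subsystem
wrappers that pick the errno. All names are hypothetical, and -EINVAL for
the task counter is only the value floated above, not something that has
been agreed on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, non-hierarchical stand-in for struct res_counter. */
struct res_counter {
	long usage;
	long limit;
};

/* Generic charge: 0 on success, -1 when the limit would be exceeded.
 * It deliberately does not choose an errno value. */
static int res_counter_charge_model(struct res_counter *c, long val)
{
	assert(val > 0);
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/* Task counter wrapper: returns 0 if the fork may proceed, or a
 * negative errno if the cgroup's task limit is reached. */
static int task_counter_fork_model(struct res_counter *c)
{
	if (!c)		/* root cgroup: no counter, no limit */
		return 0;
	return res_counter_charge_model(c, 1) ? -EINVAL : 0;
}

/* Memory-style wrapper: keeps -ENOMEM, which fits that subsystem. */
static int mem_charge_model(struct res_counter *c, long bytes)
{
	return res_counter_charge_model(c, bytes) ? -ENOMEM : 0;
}
```

Each subsystem keeps an error code that makes sense to its users, and the
return value of the fork hook becomes easy to document: 0 to allow the
fork, a negative errno to reject it.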

>
