Re: [PATCH 2/2] Ensures correct concurrent opening/reading ofpidlists across pid namespaces

From: Andrew Morton
Date: Thu Jul 02 2009 - 19:54:33 EST


On Thu, 02 Jul 2009 16:26:25 -0700
Paul Menage <menage@xxxxxxxxxx> wrote:

> Ensures correct concurrent opening/reading of pidlists across pid namespaces
>
> Previously there was the problem in which two processes from different pid
> namespaces reading the tasks or procs file could result in one process seeing
> results from the other's namespace. Rather than one pidlist for each file in a
> cgroup, we now keep a list of pidlists keyed by namespace and file type (tasks
> versus procs) in which entries are placed on demand. Each pidlist has its own
> lock, and that the pidlists themselves are passed around in the seq_file's
> private pointer means we don't have to touch the cgroup or its master list
> except when creating and destroying entries.
>
> Signed-off-by: Ben Blum <bblum@xxxxxxxxxx>
> Reviewed-by: Paul Menage <menage@xxxxxxxxxx>
> Signed-off-by: Paul Menage <menage@xxxxxxxxxx>

The way these patches were sent states that you were their primary
author. Is that accurate? If not, they should have had

From: Ben Blum <bblum@xxxxxxxxxx>

at the very top of the changelog.

>
> ...
>
> /**
> + * find the appropriate pidlist for our purpose (given procs vs tasks)
> + * returns with the lock on that pidlist already held, and takes care
> + * of the use count, or returns NULL with no locks held if we're out of
> + * memory.
> + */

Comment purports to be kerneldoc, but isn't.

> +static struct cgroup_pidlist *cgroup_pidlist_find(struct cgroup *cgrp,
> + enum cgroup_filetype type)
> +{
> + struct cgroup_pidlist *l;
> + /* don't need task_nsproxy() if we're looking at ourself */
> + struct pid_namespace *ns = get_pid_ns(current->nsproxy->pid_ns);
> + mutex_lock(&cgrp->pidlist_mutex);
> + list_for_each_entry(l, &cgrp->pidlists, links) {
> + if (l->key.type == type && l->key.ns == ns) {
> + /* found a matching list - drop the extra refcount */
> + put_pid_ns(ns);
> + /* make sure l doesn't vanish out from under us */

This looks fishy.

> + down_write(&l->mutex);
> + mutex_unlock(&cgrp->pidlist_mutex);
> + l->use_count++;
> + return l;

The caller of cgroup_pidlist_find() must ensure that l->use_count > 0,
otherwise cgroup_pidlist_find() cannot safely use `l' - it could be
freed at any time. But if l->use_count > 0, there is no risk of `l'
"vanishing out from under us".

I'm probably wrong there, but that's the usual pattern and this code
looks like it's doing something different. Please check?

> + }
> + }
> + /* entry not found; create a new one */
> + l = kmalloc(sizeof(struct cgroup_pidlist), GFP_KERNEL);
> + if (!l) {
> + mutex_unlock(&cgrp->pidlist_mutex);
> + return l;
> + }
> + init_rwsem(&l->mutex);
> + down_write(&l->mutex);
> + l->key.type = type;
> + l->key.ns = ns;
> + l->use_count = 0; /* don't increment here */
> + l->list = NULL;
> + l->owner = cgrp;
> + list_add(&l->links, &cgrp->pidlists);
> + mutex_unlock(&cgrp->pidlist_mutex);
> + return l;
> +}
> +
>
> ...
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/