Re: [PATCHv2 5/7] cgroup: introduce cgroup namespaces

From: Eric W. Biederman
Date: Fri Oct 31 2014 - 20:59:18 EST


Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

> On Fri, Oct 31, 2014 at 12:18 PM, Aditya Kali <adityakali@xxxxxxxxxx> wrote:

<snip>

>> +static void *cgroupns_get(struct task_struct *task)
>> +{
>> + struct cgroup_namespace *ns = NULL;
>> + struct nsproxy *nsproxy;
>> +
>> + rcu_read_lock();
>> + nsproxy = task->nsproxy;
>> + if (nsproxy) {
>> + ns = nsproxy->cgroup_ns;
>> + get_cgroup_ns(ns);
>> + }
>> + rcu_read_unlock();
>
> How is this correct? Other namespaces do it too, so it Must Be
> Correct (tm), but I don't understand. What is RCU protecting?

The code is not correct. The code needs to use task_lock.

RCU used to protect nsproxy, and now task_lock protects nsproxy.
For the reasons of of all of this I refer you to the commit
that changed this, and the comment in nsproxy.h

commit 728dba3a39c66b3d8ac889ddbe38b5b1c264aec3
Author: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
Date: Mon Feb 3 19:13:49 2014 -0800

namespaces: Use task_lock and not rcu to protect nsproxy

The synchronous syncrhonize_rcu in switch_task_namespaces makes setns
a sufficiently expensive system call that people have complained.

Upon inspect nsproxy no longer needs rcu protection for remote reads.
remote reads are rare. So optimize for same process reads and write
by switching using rask_lock instead.

This yields a simpler to understand lock, and a faster setns system call.

In particular this fixes a performance regression observed
by Rafael David Tinoco <rafael.tinoco@xxxxxxxxxxxxx>.

This is effectively a revert of Pavel Emelyanov's commit
cf7b708c8d1d7a27736771bcf4c457b332b0f818 Make access to task's nsproxy lighter
from 2007. The race this originialy fixed no longer exists as
do_notify_parent uses task_active_pid_ns(parent) instead of
parent->nsproxy.

Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/