Re: [PATCH 3/4] threadgroup: extend threadgroup_lock() to cover exit and exec
From: Oleg Nesterov
Date: Sun Sep 18 2011 - 13:41:55 EST
Hello,
Sorry for the late reply.
Of course I am in no position to ack the changes in this code, I do not
feel I understand it well enough. But afaics this series is fine.
A couple of questions.
On 09/05, Tejun Heo wrote:
>
> For exec, threadgroup_[un]lock() are updated to also grab and release
> cred_guard_mutex.
OK, this means that we do not need
cgroups-more-safe-tasklist-locking-in-cgroup_attach_proc.patch
http://marc.info/?l=linux-mm-commits&m=131491135428326&w=2
Ben, what do you think?
> With this change, threadgroup_lock() guarantees that the target
> threadgroup will remain stable - no new task will be added, no new
> PF_EXITING will be set and exec won't happen.
To me, this is the only "contradictory" change,
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -936,6 +936,12 @@ NORET_TYPE void do_exit(long code)
> schedule();
> }
>
> + /*
> + * @tsk's threadgroup is going through changes - lock out users
> + * which expect stable threadgroup.
> + */
> + threadgroup_change_begin(tsk);
> +
> exit_irq_thread();
>
> exit_signals(tsk); /* sets PF_EXITING */
> @@ -1018,10 +1024,6 @@ NORET_TYPE void do_exit(long code)
> kfree(current->pi_state_cache);
> #endif
> /*
> - * Make sure we are holding no locks:
> - */
> - debug_check_no_locks_held(tsk);
> - /*
> * We can do this unlocked here. The futex code uses this flag
> * just to verify whether the pi state cleanup has been done
> * or not. In the worst case it loops once more.
> @@ -1039,6 +1041,12 @@ NORET_TYPE void do_exit(long code)
> preempt_disable();
> exit_rcu();
>
> + /*
> + * Release threadgroup and make sure we are holding no locks.
> + */
> + threadgroup_change_done(tsk);
I am wondering, can't we narrow the scope of threadgroup_change_begin/done
in do_exit() path?
The code after 4/4 still has to check PF_EXITING, this is correct. And yes,
with this patch PF_EXITING becomes stable under ->group_rwsem. But, it seems,
we do not really need this?
I mean, can't we change cgroup_exit() to do threadgroup_change_begin/done
instead? We do not really care about PF_EXITING, we only need to ensure that
we can't race with cgroup_exit(), right?
Say, cgroup_attach_proc() does
	do {
		if (tsk->flags & PF_EXITING)
			continue;
		flex_array_put_ptr(group, tsk);
	} while_each_thread();
Yes, this tsk can call do_exit() and set PF_EXITING right after the check
but this is fine. The only guarantee we need is: if it has already called
cgroup_exit() we can not miss PF_EXITING, and if cgroup_exit() takes the
same sem this should be true. And, otoh, if we do not see PF_EXITING then
we can not race with cgroup_exit(), it should block on ->group_rwsem held
by us.
If I am right, afaics the only change 4/4 needs is that it should not add
WARN_ON_ONCE(tsk->flags & PF_EXITING) into cgroup_task_migrate().
What do you think?
Oleg.