Re: [PATCH 1/3] proc: first_tid: fix the potential use-after-free

From: Eric W. Biederman
Date: Wed May 29 2013 - 00:08:52 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> proc_task_readdir() verifies that the result of get_proc_task()
> is pid_alive() and thus its ->group_leader is fine too. However
> this is not necessarily true after rcu_read_unlock(), we need
> to recheck this after first_tid() does rcu_read_lock() again.

I agree with you, but you are missing something critical from your
explanation. If a process has been passed through __unhash_process(),
then task->thread_group.next (aka next_thread()) still returns a pointer
to the process that was its next thread in the thread group. Importantly,
that pointer is only guaranteed to point to valid memory until the rcu
grace period expires.

Which means that starting a walk of a thread list with a task that
could have been unhashed before the current rcu critical section
began is invalid, and can lead to following an invalid pointer.
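
To make the hazard concrete, here is a minimal userspace sketch, not
kernel code. The struct and the fake_* helpers are hypothetical names
invented for illustration, and fake_unhash() assumes the unlink behaves
like list_del_rcu(): the neighbours are repaired but the removed entry
keeps its own next pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task on a circular thread list. */
struct fake_task {
	int tid;
	bool hashed;             /* false once the task has been unhashed */
	struct fake_task *next;  /* stand-in for thread_group.next */
	struct fake_task *prev;
};

/* A freshly initialised task is a one-entry circular list. */
static void fake_init(struct fake_task *t, int tid)
{
	t->tid = tid;
	t->hashed = true;
	t->next = t;
	t->prev = t;
}

/* Link t in front of leader, i.e. at the tail of leader's list. */
static void fake_add_tail(struct fake_task *leader, struct fake_task *t)
{
	t->prev = leader->prev;
	t->next = leader;
	leader->prev->next = t;
	leader->prev = t;
}

/*
 * Assumed to mirror list_del_rcu(): the neighbours stop pointing at t,
 * but t->next is deliberately left alone so that concurrent readers
 * already standing on t can still step forward.
 */
static void fake_unhash(struct fake_task *t)
{
	assert(t->hashed);
	t->prev->next = t->next;
	t->next->prev = t->prev;
	t->prev = NULL;
	t->hashed = false;
}

/* Stand-in for next_thread(): blindly follows the next pointer. */
static struct fake_task *fake_next_thread(const struct fake_task *t)
{
	return t->next;
}
```

After fake_unhash(&m), fake_next_thread(&m) still hands back the old
neighbour even though nothing on the live list points at m any more. If
that neighbour is later unhashed and freed, m->next is the only thing
still referring to it, and nothing keeps it valid once the grace period
has passed.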

> The race is subtle and unlikely, but still it is possible afaics.
> To simplify lets ignore the "likely" case when tid != 0, f_version
> can be cleared by proc_task_operations->llseek().
>
> Suppose we have a main thread M and its subthread T. Suppose that
> f_pos == 3, iow first_tid() should return T. Now suppose that the
> following happens between rcu_read_unlock() and rcu_read_lock():
>
> 1. T execs and becomes the new leader. This removes M from
> ->thread_group but next_thread(M) is still T.
>
> 2. T creates another thread X which does exec as well, T
> goes away.
>
> 3. X creates another subthread, this increments nr_threads.
>
> 4. first_tid() does next_thread(M) and returns the already
> dead T.
>
> Note that we need 2. and 3. only because of get_nr_threads() check,
> and this check was supposed to be optimization only.

An optimization and denial of service attack prevention. It keeps us
from spinning for nearly unbounded amounts of time in the rcu critical
section. But I agree it should not be needed for this part of
correctness.
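
For illustration, the two checks can be sketched in userspace roughly as
follows. This is a simplified model and not the code in fs/proc/base.c:
struct sk_task and sk_first_tid() are hypothetical names, the alive flag
stands in for pid_alive(), and nr_threads is stored directly on the
leader instead of being read via get_nr_threads().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a thread on a circular list. */
struct sk_task {
	int tid;
	bool alive;            /* stand-in for pid_alive() */
	int nr_threads;        /* only meaningful on the leader */
	struct sk_task *next;  /* stand-in for next_thread() */
};

/*
 * Return the thread nr steps after leader, or NULL.
 *
 * The nr_threads test bounds how long we can walk; the alive test is
 * the correctness check: never start a walk from a leader that may
 * already have been unhashed, because its next pointer may be stale.
 */
static struct sk_task *sk_first_tid(struct sk_task *leader, int nr)
{
	struct sk_task *pos;

	assert(leader != NULL);

	/* If nr exceeds the number of threads there is nothing to do. */
	if (nr && nr >= leader->nr_threads)
		return NULL;

	/* The leader could have been unhashed before we got here. */
	if (!leader->alive)
		return NULL;

	/* Start with the leader and walk nr threads forward. */
	for (pos = leader; nr > 0; --nr) {
		pos = pos->next;
		if (pos == leader)
			return NULL;  /* wrapped around: nr was too large */
	}
	return pos;
}
```

In this model the walk only starts once both tests pass, so a stale
nr_threads value can at worst cost time, while a dead leader is never
used as a starting point.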

> Note: I think that proc_task_readdir/first_tid interaction can be
> simplified, but this needs another patch. proc_task_readdir() should
> not play with ->group_leader at all. See the next patches.

That sounds right. I seem to recall that there was a purpose in keeping
the leader pinned, but it looks like that purpose is long since gone.

> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> ---
> fs/proc/base.c | 5 ++++-
> 1 files changed, 4 insertions(+), 1 deletions(-)
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index dd51e50..c939c9f 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -3186,10 +3186,13 @@ static struct task_struct *first_tid(struct task_struct *leader,
> goto found;
> }
>
> - /* If nr exceeds the number of threads there is nothing todo */
> pos = NULL;
> + /* If nr exceeds the number of threads there is nothing todo */

Moving the comment is just noise and makes for confusing reading of your
patch.

> if (nr && nr >= get_nr_threads(leader))
> goto out;
> + /* It could be unhashed before we take rcu lock */
> + if (!pid_alive(leader))
> + goto out;
>
> /* If we haven't found our starting place yet start
> * with the leader and walk nr threads forward.