Re: [PATCH] sched/pid fix use-after free in task_tgid_vnr
From: Eric W. Biederman
Date: Mon Dec 12 2016 - 14:13:43 EST
Oleg Nesterov <oleg@xxxxxxxxxx> writes:
> On 12/10, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>>
>> > On 12/09, EunTaik Lee wrote:
>> >>
>> >> There is a use-after-free case with below call stack.
>> >>
>> >> pid_nr_ns+0x10/0x38
>> >> cgroup_pidlist_start+0x144/0x400
>> >> cgroup_seqfile_start+0x1c/0x24
>> >> kernfs_seq_start+0x54/0x90
>> >> seq_read+0x15c/0x3a8
>> >> kernfs_fop_read+0x38/0x160
>> >> __vfs_read+0x28/0xc8
>> >> vfs_read+0x84/0xfc
>>
>> How is this a use after free. The function pid_nr_ns should take a NULL pointer
>> as input and return 0?
>
> No, the task (task_struct) itself can't go away, but task->group_leader
> can point to nowhere.
>
>> Certainly if the addtion of pid_alive fixes it pid_vnr(task_tgid(tsk))
>> is fine. Are we perhaps missing rcu locking?
>
> rcu_read_lock() is not enough in this case, see below.
>
>> Or is the problem simply that in task_tgid we are accessing
>> task->group_leader which may already be dead?
>
> Yes. Lets forget about the callchain above, I didn't even bother to verify
> that it can actually hit the problem. Although I think EunTaik is very right,
> css_task_iter_next() does get_task_struct() and drops css_set_lock, the task
> can exit after that. Forget.
>
> Just suppose that a task simply does
>
> pid = task_tgid_vnr(current);
>
> after it has already called exit_notify(). And this is what perf_event_pid()
> does, perhaps we have more buggy users.
>
> In this case current->group_leader or parent/real_parent can point to the
> exited/freed tasks. I already said this many times, ee really need to nullify
> them in __unhash_process() but this needs a lot of (mostly simple)
> cleanups.
Is there anything wrong with starting with the patch below?
diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45ebbe3..03daeecc335d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -200,6 +200,7 @@ void release_task(struct task_struct *p)
if (zap_leader)
leader->exit_state = EXIT_DEAD;
}
+ p->group_leader = NULL;
write_unlock_irq(&tasklist_lock);
release_thread(p);
That seems to cut to the heart of the matter. Failures will be clearer,
as will be code that is introduced to handle the situation. Then we
don't need pid_alive or any other magic just a simple:
rcu_read_lock();
leader = READ_ONCE(task->group_leader);
if (leader) {
/* Do stuff */
}
rcu_read_unlock();
>> If so the fix needs to be
>> in task_tgid.
>
> Yes, task_tgid() should probably return NULL in this case, but this connects
> to "a lot of cleanups" above.
But that is important because that is where things go wrong in the
specific case under discussion. pid_nr_ns handles all of the other
cases, it is task_tgid that went wrong.
Eric