Re: [PATCH] exit: move exit_task_namespaces() after exit_task_work()
From: Dmitry Vyukov
Date: Fri Dec 15 2017 - 03:01:07 EST
On Fri, Dec 15, 2017 at 8:35 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Fri, Dec 15, 2017 at 7:56 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Cong Wang <xiyou.wangcong@xxxxxxxxx> writes:
>>
>>> syzbot reported we have a use-after-free when mqueue_evict_inode()
>>> is called on __cleanup_mnt() path, where the ipc ns is already
>>> freed by the previous exit_task_namespaces(). We can just move
>>> it after exit_task_work() to avoid this use-after-free.
>>
>> How does that possibly work? (I haven't seen this syzbot report.)
>>
>> Looking at the code we have get_ns_from_inode. Which takes the mq_lock,
>> sees if the pointer is NULL and takes a reference if it is non-NULL.
>>
>> Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held
>> when the count drops to zero.
>>
>> Where is the race in that?
>>
>> The rest of mqueue_evict_inode uses the returned pointer and
>> tests that the pointer is non-NULL before using it.
>>
>> So either syzbot is giving you a bad report or there is a subtle race
>> there I am not seeing. The change below is not at all the proper way to
>> fix a subtle race.
>>
>> Eric
>
> Cong, what was that report? Searching by
> "exit_task_work|exit_task_namespaces" there are too many of them:
> https://groups.google.com/forum/#!searchin/syzkaller-bugs/%22exit_task_work%7Cexit_task_namespaces%22%7Csort:date
>
> I can only say that syzbot does not make up reports. That's something
> that actually happened and was provoked by userspace.

Ah, found that bug:
https://groups.google.com/d/msg/syzkaller-bugs/1XBaqnPSXzs/VF-eCSPuCQAJ
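
For anyone following along, the lookup/release pattern Eric describes
above can be modeled in userspace roughly as below. This is only a
sketch under stated assumptions: struct ns, struct sb, model_get_ns(),
model_put_ns() and model_demo() are made-up names, a pthread mutex
stands in for mq_lock, and the refcount is a plain int guarded by that
mutex. It is not the code in ipc/mqueue.c, and it says nothing about
where the reported use-after-free actually comes from.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the namespace and the object pointing at it. */
struct ns { int count; };	/* reference count, guarded by model_lock */
struct sb { struct ns *ns; };	/* back-pointer that lookups read */

/* Plays the role of mq_lock in this model. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup: under the lock, take a reference only if the pointer is non-NULL. */
static struct ns *model_get_ns(struct sb *sb)
{
	struct ns *ns;

	pthread_mutex_lock(&model_lock);
	ns = sb->ns;
	if (ns)
		ns->count++;
	pthread_mutex_unlock(&model_lock);
	return ns;
}

/*
 * Release: when the count drops to zero, clear the back-pointer while
 * still holding the lock, so a later lookup sees NULL instead of a
 * pointer to freed memory. The free happens after the lock is dropped.
 */
static void model_put_ns(struct sb *sb, struct ns *ns)
{
	int last;

	pthread_mutex_lock(&model_lock);
	last = (--ns->count == 0);
	if (last)
		sb->ns = NULL;
	pthread_mutex_unlock(&model_lock);
	if (last)
		free(ns);
}

/* Single-threaded walk through the lifecycle; returns 0 if it behaves. */
static int model_demo(void)
{
	struct sb sb;
	struct ns *held;

	sb.ns = malloc(sizeof(*sb.ns));
	if (!sb.ns)
		return -1;
	sb.ns->count = 1;		/* the owner's initial reference */

	held = model_get_ns(&sb);	/* lookup succeeds, count goes 1 -> 2 */
	if (!held)
		return -1;
	model_put_ns(&sb, held);	/* count goes 2 -> 1, pointer stays set */
	if (!sb.ns)
		return -1;
	model_put_ns(&sb, sb.ns);	/* last reference: pointer cleared, ns freed */
	if (sb.ns)
		return -1;
	return model_get_ns(&sb) ? -1 : 0;	/* lookup now returns NULL */
}
```

In this model a lookup can never return a pointer to a freed object,
because both the NULL check and the clearing of the back-pointer happen
under the same lock. That is the property Eric is asking about.
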
>>> Reported-by: syzbot <syzkaller@xxxxxxxxxxxxxxxx>
>>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx>
>>> ---
>>> kernel/exit.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/exit.c b/kernel/exit.c
>>> index 6b4298a41167..909e43c45158 100644
>>> --- a/kernel/exit.c
>>> +++ b/kernel/exit.c
>>> @@ -861,8 +861,8 @@ void __noreturn do_exit(long code)
>>>  	exit_fs(tsk);
>>>  	if (group_dead)
>>>  		disassociate_ctty(1);
>>> -	exit_task_namespaces(tsk);
>>>  	exit_task_work(tsk);
>>> +	exit_task_namespaces(tsk);
>>>  	exit_thread(tsk);
>>>
>>>  	/*