Re: [PATCH] 9p: fix Use-After-Free in p9_write_work()

From: Tomas Bortoli
Date: Mon Jul 30 2018 - 05:46:27 EST


On 07/30/2018 01:33 AM, Dominique Martinet wrote:
> Tomas Bortoli wrote on Sun, Jul 29, 2018:
>> There is a race condition between p9_free_req() and p9_write_work().
>> A request might still need to be processed while p9_free_req() is called.
>>
>> To fix it, flush the read/write work before freeing any request.
>>
>> Signed-off-by: Tomas Bortoli <tomasbortoli@xxxxxxxxx>
>> Reported-by: syzbot+467050c1ce275af2a5b8@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> It looks like I have not received this report, I found it through google
> in the lkml archives but Dmitry do you have a convenient-ish way of
> finding the report on the syzkaller website with that reported-by tag?
>
>> ---
>>
>> To be able to flush the r/w work from client.c we need the p9_conn and
>> p9_trans_fd definitions. Therefore this commit moves most of the declarations in
>> trans_fd.c to trans_fd.h and import such file in client.c
>
> This cannot work as it is, because you're not just intorudcing the
> trans_fd types but you're really depending on the transport used being
> fd.
> 'conn.wq' won't even be valid memory in other transports so I don't want
> to know what trying to flush_work on this will do... :)
>

Yep, Oops

>
> Other transports also have the same issue see discussion in
> https://lkml.org/lkml/2018/7/19/727
> (that is another syzbot report, slightly different but I believe it
> points to the same issue)
>
> Basically, a more global view of the problem is a race between
> p9_tag_lookup returning a p9_req_t and another thread freeing it.
>
> Matthew wrote the problem himself in a comment in p9_tag_lookup in his new
> version that used to be in linux-next at the time (I took the commit out
> temporarily until I've had time to benchmark it, but it will come back in,
> just you're working on thin air right now because the bug was only found
> thanks to this commit):
> + /* There's no refcount on the req; a malicious server could
> cause
> + * us to dereference a NULL pointer
> + */
>
> So a more proper solution would be to had a refcount to req, have
> p9_tag_lookup increment the refcount within rcu_read_lock, and have a
> deref function free the req when the count hits 0.
>
>

Which commit ? that's a comment.

That sound like the proper solution. Let's do it that way then.

> Now we're here though I'm not sure what to suggest, I had promised to
> get some performance benchmark out by this past weekend and I obviously
> failed, so this patch might be delayed to 4.20 and the refcount approach
> would not work with the current req cache/reuse system we have right
> now.
> If you want to finish this anyway you can work on my 9p-test branch
> (I've kept the commit there), and I'll take that patch in at the same
> time as the other.
>
>
>> Moreover, a couple of identifiers were altered to avoid name conflicts with the
>> new import.
>
> If we were to stick to this approach, two suggestions:
> - headers aren't all-or-nothing, I think it's better to only expose
> what you need (and e.g. keep the Opt_* enum in the .c)

No we don't have to stick with this patch.

> - instead of exposing trans_fd specific stuff, it's cleaner to add a
> new op vector '.flush' to the p9_trans_module struct and call that if it
> exists.
>

I didn't thought at this way, it's smarter.

>
> Thanks,
>