Re: [PATCH] 9p: fix Use-After-Free in p9_write_work()

From: Dominique Martinet
Date: Sun Jul 29 2018 - 19:34:02 EST


Tomas Bortoli wrote on Sun, Jul 29, 2018:
> There is a race condition between p9_free_req() and p9_write_work().
> A request might still need to be processed while p9_free_req() is called.
>
> To fix it, flush the read/write work before freeing any request.
>
> Signed-off-by: Tomas Bortoli <tomasbortoli@xxxxxxxxx>
> Reported-by: syzbot+467050c1ce275af2a5b8@xxxxxxxxxxxxxxxxxxxxxxxxx

It looks like I have not received this report, I found it through google
in the lkml archives but Dmitry do you have a convenient-ish way of
finding the report on the syzkaller website with that reported-by tag?

> ---
>
> To be able to flush the r/w work from client.c we need the p9_conn and
> p9_trans_fd definitions. Therefore this commit moves most of the declarations in
> trans_fd.c to trans_fd.h and import such file in client.c

This cannot work as it is, because you're not just intorudcing the
trans_fd types but you're really depending on the transport used being
fd.
'conn.wq' won't even be valid memory in other transports so I don't want
to know what trying to flush_work on this will do... :)


Other transports also have the same issue see discussion in
https://lkml.org/lkml/2018/7/19/727
(that is another syzbot report, slightly different but I believe it
points to the same issue)

Basically, a more global view of the problem is a race between
p9_tag_lookup returning a p9_req_t and another thread freeing it.

Matthew wrote the problem himself in a comment in p9_tag_lookup in his new
version that used to be in linux-next at the time (I took the commit out
temporarily until I've had time to benchmark it, but it will come back in,
just you're working on thin air right now because the bug was only found
thanks to this commit):
+ /* There's no refcount on the req; a malicious server could
cause
+ * us to dereference a NULL pointer
+ */

So a more proper solution would be to had a refcount to req, have
p9_tag_lookup increment the refcount within rcu_read_lock, and have a
deref function free the req when the count hits 0.


Now we're here though I'm not sure what to suggest, I had promised to
get some performance benchmark out by this past weekend and I obviously
failed, so this patch might be delayed to 4.20 and the refcount approach
would not work with the current req cache/reuse system we have right
now.
If you want to finish this anyway you can work on my 9p-test branch
(I've kept the commit there), and I'll take that patch in at the same
time as the other.


> Moreover, a couple of identifiers were altered to avoid name conflicts with the
> new import.

If we were to stick to this approach, two suggestions:
- headers aren't all-or-nothing, I think it's better to only expose
what you need (and e.g. keep the Opt_* enum in the .c)
- instead of exposing trans_fd specific stuff, it's cleaner to add a
new op vector '.flush' to the p9_trans_module struct and call that if it
exists.


Thanks,
--
Dominique