Re: [PATCH] net/9p: Fix uaf / refcnt underflow for req object in virtio

From: asmadeus
Date: Mon Jul 22 2024 - 05:06:58 EST


Christian Schoenebeck wrote on Mon, Jul 22, 2024 at 10:03:11AM +0200:
>> diff --git a/net/9p/client.c b/net/9p/client.c
>> index 5cd94721d974..51ce3bd4aece 100644
>> --- a/net/9p/client.c
>> +++ b/net/9p/client.c
>> @@ -616,9 +616,10 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
>> if (READ_ONCE(oldreq->status) == REQ_STATUS_SENT) {
>> if (c->trans_mod->cancelled)
>> c->trans_mod->cancelled(c, oldreq);
>> + /* reply won't come anymore, drop the "receive" ref */
>> + p9_req_put(client, req);

That was meant to be oldreq, not req (factoring out the dropping ref
part out of the callback)

>> }
>>
>> - p9_req_put(c, req);

That one is a different req that needs to be dropped alright; this
should stay.

(note to self: don't try to think about 9p refcounting during lunch
break)

>> return 0;
>> }
>
> [...]
>
> So a Tflush request by client is immediately answered by a Rflush response and
> in this case no answer is sent to the original request being flushed.
>
> There are also QEMU test cases guarding the expected Tflush behaviour:
> https://github.com/qemu/qemu/blob/a7ddb48b/tests/qtest/virtio-9p-test.c#L403
> and
> https://github.com/qemu/qemu/blob/a7ddb48b/tests/qtest/virtio-9p-test.c#L444
>
> The 2nd test case handles the behaviour when the Tflush request arrived too
> late, after the original request already completed successfully that is. So in
> this case client first receives a success response to the original request,
> then followed by Rflush response.

Oh, I was convinced the flush was just queued after the original
response to follow the spec, but that there was no mechanism to cancel
the IO in flight so the 'cancelled' path was never exerced.

If you're doing cancels (not sending the original response if the flush
timing was early enough), then the third drop actually probably was that
callback:
- one for the error code in p9_virtio_zc_request
- one for cancelled
- one at the end of p9_client_zc_rpc

I agree p9_virtio_zc_request shouldn't drop the ref if we're going to
cancel it, so dropping that put makes sense.
We should actually also just drop that put altogether and make the code
closer to what we do in p9_client_rpc (non-zc version) where we put the
ref on error immediately after ->request(), because that code knows
about the ERESTARTSYS logic, and it's not something each transport
should have to worry about.

(At which point we can also drop the ref in ERESTARTSYS for the flush
request itself then as pointed out..)

So I'll update the patch to drop p9_req_put() altogether from
p9_virtio_zc_request and call it in p9_client_zc_rpc() more
appropriately, and send the cancelled thing as follow up instead.

Hopefully didn't get it wrong too much this time, tests will tell...
--
Dominique Martinet | Asmadeus