Re: [PATCH] rxrpc: struct mutex cannot be used for rxrpc_call::user_mutex

From: Peter Zijlstra
Date: Wed Dec 18 2019 - 08:51:06 EST


On Tue, Dec 17, 2019 at 03:32:00PM +0000, David Howells wrote:
> Standard kernel mutexes cannot be used in any way from interrupt or softirq
> context, so the user_mutex which manages access to a call cannot be a mutex
> since on a new call the mutex must start off locked and be unlocked within
> the softirq handler to prevent userspace interfering with a call we're
> setting up.
>
> Commit a0855d24fc22d49cdc25664fb224caee16998683 ("locking/mutex: Complain
> upon mutex API misuse in IRQ contexts") causes big warnings to be splashed
> in dmesg for each a new call that comes in from the server.

FYI that patch has currently been reverted.

commit c571b72e2b845ca0519670cb7c4b5fe5f56498a5 (tip/locking/urgent, tip/locking-urgent-for-linus)

> Whilst it *seems* like it should be okay, since the accept path
> trylocks the mutex when no one else can see it and drops the mutex
> before it leaves softirq context, unlocking the mutex causes scheduler
> magic to happen.

Not quite. The problem is that trylock still sets the owner task of the
mutex, and a contending mutex_lock() could end up PI boosting that
owner.

The problem happens when that owner is the idle task, this can happen
when the irq/softirq hits the idle task, in that case the contending
mutex_lock() will try and PI boost the idle task, and that is a big
no-no.

> Fix this by switching to using a locking bit and a waitqueue instead.

So the problem with this approach is that it will create priority
inversions between the sites that had mutex_lock().

Suppose some FIFO-99 task does rxrpc_user_lock_call() and gets to block
because some FIFO-1 task has it. Then an unrelated FIFO-50 task comes
and preempts our FIFO-1 owner, now the FIFO-99 task has unbounded
response time (classic priority inversion).

This is what the PI magic is supposed to fix, it would boost the FIFO-1
owner to FIFO-99 for the duration of the lock, which avoids the FIFO-50
task from being elegible to run and ruin the day.

Anyway, regarding your question on IRC, the lockdep bits look OK.

But looking at this code, reminded me of our earlier discussion, where
you said:

"I could actually mvoe the rxrpc_send_ping() and the mutex_unlock()
into rxrpc_new_incoming_call() which would make it clearer to see as
the lock and unlock would then be in the same function."

Which I think still has merrit; it would make reading this code a whole
lot easier.

Back to the actual problem; how come the rxrpc_call thing can be
accessed before it is 'complete'? Is there something we can possibly do
there?

The comment with the mutex_trylock() in rxrpc_new_incoming_call() has:

/* Lock the call to prevent rxrpc_kernel_send/recv_data() and
* sendmsg()/recvmsg() inconveniently stealing the mutex once the
* notification is generated.
*
* The BUG should never happen because the kernel should be well
* behaved enough not to access the call before the first notification
* event and userspace is prevented from doing so until the state is
* appropriate.
*/

Why do we have to send this notification so early? Is there anything
else we can do avoid these other users from wanting to prod at our
object for a little while? I'm still properly lost in this code.


---
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 135bf5cd8dd5..3fec99cc2e4d 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -435,6 +435,10 @@ struct rxrpc_call *rxrpc_new_incoming_call(struct rxrpc_local *local,
_leave(" = %p{%d}", call, call->debug_id);
out:
spin_unlock(&rx->incoming_lock);
+ if (call) {
+ rxrpc_send_ping(call, skb);
+ mutex_unlock(&call->user_mutex);
+ }
return call;
}

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 157be1ff8697..665a0532a221 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -1396,8 +1396,6 @@ int rxrpc_input_packet(struct sock *udp_sk, struct sk_buff *skb)
call = rxrpc_new_incoming_call(local, rx, skb);
if (!call)
goto reject_packet;
- rxrpc_send_ping(call, skb);
- mutex_unlock(&call->user_mutex);
}

/* Process a call packet; this either discards or passes on the ref