Re: Bug 4.1.16: self-detected stall in net/unix/?

From: Rainer Weikusat
Date: Thu Feb 11 2016 - 12:41:18 EST


Ben Hutchings <ben@xxxxxxxxxxxxxxx> writes:
> On Thu, 2016-02-11 at 15:55 +0000, Rainer Weikusat wrote:
>> Philipp Hahn <pmhahn@xxxxxxxxx> writes:
>>
>> [...]
>>
>> > Probably the same bug was also reported to samba-technical by Karolin
>> > Seeger; she filed the bug for 3.19-ckt with Ubuntu:
>> >
>> >
>> >
>> > Running the Samba test suite reproduces the problem; see bug for
>> > details.
>>
>>
>> JFTR: The oops in this bug report is for 3.13.0-77 and the patch you
>> reverted for 4.1 is not part of that (at least not of the upstream 3.13).
> [...]
>
> It is in 3.13-ckt and basically all the stable branches.
>
> Does the patch below fix this bug?
>
> Ben.
>
> ---
> unix: Fix potential double-unlock in unix_dgram_sendmsg()
>
> A datagram socket may be peered with itself, so that sk == other.ÂÂWe
> use unix_state_double_lock() to lock sk and other in the right order,
> which also guards against this and only locks the socket once, but we
> then end up trying to unlock it twice.ÂÂAdd the check for sk != other.

That's a good observation but I think this happens in another way. The
code setting sk_logged is

if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
if (timeo) {
timeo = unix_wait_for_peer(other, timeo);

err = sock_intr_errno(timeo);
if (signal_pending(current))
goto out_free;

goto restart;
}

if (!sk_locked) {
unix_state_unlock(other);
unix_state_double_lock(sk, other);
}

if (unix_peer(sk) != other ||
unix_dgram_peer_wake_me(sk, other)) {
err = -EAGAIN;
sk_locked = 1;
goto out_unlock;
}

if (!sk_locked) {
sk_locked = 1;
goto restart_locked;
}
}

This means it only gets locked if unix_peer(other) != sk and this cannot
happen if other == sk and unix_peer(sk) == other, however, the 2nd
condition isn't guaranteed: other might indeed be == sk and not the peer
of it because someone could be using _sendmsg to send a message via a
socket to an address bound to the same socket. In this case, other was
found via

if (!other) {
err = -ECONNRESET;
if (sunaddr == NULL)
goto out_free;

other = unix_find_other(net, sunaddr, namelen, sk->sk_type,
hash, &err);
if (other == NULL)
goto out_free;
}

and the if-block leading to the double lock should never have been
executed as it's supposed to deal with the case where sk is connect to
other but other not to sk (eg, /dev/log).

If the description correct, the patch below should fix it (as sunaddr
gets cleared if and only if unix_peer(sk) == other.

This is a 'preview', ie, not even compiled. It's provided for
discussion/ testing. I'll try to create a test program for this and a
more formal patch.

---

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 1975fd8..06259ac 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1698,7 +1698,7 @@ restart_locked:
goto out_unlock;
}

- if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
+ if (!sunaddr && unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
if (timeo) {
timeo = unix_wait_for_peer(other, timeo);