Re: [syzbot] [net?] KASAN: slab-use-after-free Read in unix_stream_read_actor (2)

From: Shoaib Rao
Date: Tue Sep 10 2024 - 18:30:38 EST


My fellow engineer let's first take a breath and calm down. We both are trying to do the right thing. Now read my comments below and if I still don't get it, please be patient, maybe I am not as smart as you are.

On 9/10/2024 2:53 PM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Tue, 10 Sep 2024 13:57:04 -0700
The commit Message:

syzbot reported use-after-free in unix_stream_recv_urg(). [0]

The scenario is

1. send(MSG_OOB)
2. recv(MSG_OOB)
-> The consumed OOB remains in recv queue
3. send(MSG_OOB)
4. recv()
-> manage_oob() returns the next skb of the consumed OOB
-> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
5. recv(MSG_OOB)
-> unix_sk(sk)->oob_skb is used but already freed

How did you miss this ?

Again, please read my patch and mails **carefully**.

unix_sk(sk)->oob_sk wasn't cleared properly and illegal access happens
in unix_stream_recv_urg(), where ->oob_skb is dereferenced.

Here's _technical_ thing that you want.

This is exactly what I am trying to point out to you.
The skb has proper references and is NOT de-referenced because __skb_datagram_iter() detects that the length is zero and returns EFAULT.

See more below

---8<---
# ./oob
executing program
[ 25.773750] queued OOB: ff1100000947ba40
[ 25.774110] reading OOB: ff1100000947ba40
[ 25.774401] queued OOB: ff1100000947bb80
[ 25.774669] free skb: ff1100000947bb80
[ 25.774919] reading OOB: ff1100000947bb80
[ 25.775172] ==================================================================
[ 25.775654] BUG: KASAN: slab-use-after-free in unix_stream_read_actor+0x86/0xb0
---8<---

---8<---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a1894019ebd5..ccd9c47160a5 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2230,6 +2230,7 @@ static int queue_oob(struct socket *sock, struct msghdr *msg, struct sock *other
__skb_queue_tail(&other->sk_receive_queue, skb);
spin_unlock(&other->sk_receive_queue.lock);
+ printk(KERN_ERR "queued OOB: %px\n", skb);
sk_send_sigurg(other);
unix_state_unlock(other);
other->sk_data_ready(other);
@@ -2637,6 +2638,7 @@ static int unix_stream_recv_urg(struct unix_stream_read_state *state)
spin_unlock(&sk->sk_receive_queue.lock);
unix_state_unlock(sk);
+ printk(KERN_ERR "reading OOB: %px\n", oob_skb);
chunk = state->recv_actor(oob_skb, 0, chunk, state);
if (!(state->flags & MSG_PEEK))
@@ -2915,7 +2917,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
skb_unlink(skb, &sk->sk_receive_queue);
consume_skb(skb);
-
+ printk(KERN_ERR "free skb: %px\n", skb);

This printk is wrongly placed because the skb has been freed above, but since it is just printing the pointer it should be OK, access to any skb field will be an issue. You should move this printk to before unix_stream_read_generic and print the refcnt on the skb and the length of the data and verify what I am saying, that the skb has one refcnt and zero length.

This is the kind of interaction I was looking for. If I have missed something please be patient and let me know.

Regards,

Shoaib


if (scm.fp)
break;
} else {
---8<---

[...]
None of this points to where the skb is "dereferenced" The test case
added will never panic the kernel.

In the organizations that I have worked with, just because a change
stops a panic does not mean the issue is fixed. You have to explicitly
show where and how. That is what i am asking, Yes there is a bug, but it
will not cause the panic, so if you are just high and might engiineer,
show where and how?

This will be the last mail from me in this thread. I don't want to
waste time on someone who doesn't read mails.
Yes please don't if you do not have anything technical to say, all your
comments are "smart comments". This email thread would end if you could
just say, here is line XXXX where the skb is de referenced, but you have
not because you have no idea.

BTTW Your comment about holding the extra refcnt without any reason just
shows that you DO NOT understand the code. And the confusion to refcnts
has been caused by your changes, I am concerned your changes have broken
the code.

I am willing to accept that I may be wrong but only if I am shown the
location of de-reference. Please do not respond if you can not point to
the exact location.

Please do not respond if you just ask without thinking.