Re: [syzbot] [net?] KASAN: slab-use-after-free Read in unix_stream_read_actor (2)

From: Shoaib Rao
Date: Tue Sep 10 2024 - 16:57:42 EST




On 9/10/2024 12:49 PM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Tue, 10 Sep 2024 11:49:20 -0700
On 9/10/2024 11:33 AM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Tue, 10 Sep 2024 11:16:59 -0700
On 9/10/2024 10:57 AM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Tue, 10 Sep 2024 09:55:03 -0700
On 9/9/2024 5:48 PM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Mon, 9 Sep 2024 17:29:04 -0700
I have some more time investigating the issue. The sequence of packet
arrival and consumption definitely points to an issue with OOB handling
and I will be submitting a patch for that.

It seems a bit late.
My patches were applied few minutes before this mail was sent.
https://urldefense.com/v3/__https://lore.kernel.org/netdev/172592764315.3964840.16480083161244716649.git-patchwork-notify@xxxxxxxxxx/__;!!ACWV5N9M2RV99hQ!M806VrqNEGFgGXEoWG85msKAdFPXup7RzHy9Kt4q_HOfpPWsjNHn75KyFK3a3jWvOb9EEQuFGOjpqgk$


That is a subpar fix. I am not sure why the maintainers accepted the fix
when it was clear that I was still looking into the issue.

Just because it's not a subpar fix and you were slow and wrong,
clining to triggering the KASAN splat without thinking much.


Plus the
claim that it fixes the panic is absolutely wrong.

The _root_ cause of the splat is mishandling of OOB in manage_oob()
which causes UAF later in another recvmsg().

Honestly your patch is rather a subpar fix to me, few points:

1. The change conflicts with net-next as we have already removed
the additional unnecessary refcnt for OOB skb that has caused
so many issue reported by syzkaller

2. Removing OOB skb in queue_oob() relies on the unneeded refcnt
but it's not mentioned; if merge was done wrongly, another UAF
will be introduced in recvmsg()

3. Even the removing logic is completely unnecessary if manage_oob()
is changed

4. The scan_again: label is misplaced; two consecutive empty OOB skbs
never exist at the head of recvq

5. ioctl() is not fixed

6. No test added

7. Fixes: tag is bogus

8. Subject lacks target tree and af_unix prefix

If you want to nit pick, nit pick away, Just because the patch email
lacks proper formatting does not make the patch technically inferior.

Ironically you just nit picked 8.



I have no idea what you mean. I am more worried about technical
correctness than formatting -- That does not mean formatting is not
necessary.

I started pointing out technical stuff and ended with nit-pick because
"I am more worried about technical correctness", but you started nit
picking from the last point. That's unfortunate.



My
fix is a proper fix not a hack. The change in queue_oob is sufficient to
fix all issues including SIOCATMARK. The fix in manage_oob is just for
correctness.

Then, it should be WARN_ON_ONCE() not to confuse future readers.


In your fix I specifically did not like the change made to
fix SIOCATMARK.

I don't like that part too, but it's needed to avoid the additional refcnt
that is much worse as syzbot has been demonstrating.


syzbot has nothing to do with doing a proper fix.

You don't understand my point. syzbot has been finding many real issues
that were caused by poor handling of the additional refcount.

Also, removing it discovered another bug in manage_oob(). That's a enough
reason to explain why we should remove the unnecessary refcnt.


One has to understand
the code though to do the fix at the proper location.

I'm not saying that the patch is correct if it silences syzbot.

Actually, I said KASAN is handy but you need not rely on it.

Rather it's you who argued the splat was needed even without trying
to understand the code.

I really don't understand why you are saying this to me now.





What is most worrying is claim to fixing a panic when it can not even
happen with the bug.

It's only on your setup. syzbot and I were able to trigger that with
the bug.


Really, what is so special about my setup that kasan does not like? Can
you point me to the exact location where the access is made?

I don't know, it's your job.


I am at least glad that you have backed off your assertion that my
change does not fix the ioctl.

Okay, I was wrong about that, and what about other points, fragile
refcnt, non-WARN_ON_ONCE(), misplaced label, no test, bogus tag ?


I am sure if I keep pressing you, you
will back off the panic claim as well.

I also don't understand what you are saying and why you still can't
correlate the splat and the sequences of syscalls in the repro.


You yourself admitted you did not
know why kasan was not panicing, Has anyone else hit the same panic?

If you can pin point the exact location where the illegal access is
made, please do so and I will accept that I am wrong, other than that I
am not interested in this constant back and forth with no technical
details just fluff.

Please read my changelog (and mails) carefully that pin-point the
exact location and reason where/why the illegal access happens.

This is the explanation in the email:

>
> No it did not work. As soon as KASAN detected read after free it should
> have paniced as it did in the report and I have been running the
> syzbot's C program in a continuous loop. I would like to reproduce the
> issue before we can accept the fix -- If that is alright with you. I
> will try your new test case later and report back. Thanks for the patch
> though.

The commit Message:

syzbot reported use-after-free in unix_stream_recv_urg(). [0]

The scenario is

1. send(MSG_OOB)
2. recv(MSG_OOB)
-> The consumed OOB remains in recv queue
3. send(MSG_OOB)
4. recv()
-> manage_oob() returns the next skb of the consumed OOB
-> This is also OOB, but unix_sk(sk)->oob_skb is not cleared
5. recv(MSG_OOB)
-> unix_sk(sk)->oob_skb is used but already freed

The recent commit 8594d9b85c07 ("af_unix: Don't call skb_get() for OOB
skb.") uncovered the issue.

If the OOB skb is consumed and the next skb is peeked in manage_oob(),
we still need to check if the skb is OOB.

Let's do so by falling back to the following checks in manage_oob()
and add the test case in selftest.

Note that we need to add a similar check for SIOCATMARK.

And this

* [PATCH v1 net-next] af_unix: Don't call skb_get() for OOB skb.
@ 2024-08-16 23:39 Kuniyuki Iwashima
2024-08-20 23:00 ` patchwork-bot+netdevbpf
0 siblings, 1 reply; 2+ messages in thread
From: Kuniyuki Iwashima @ 2024-08-16 23:39 UTC (permalink / raw)
To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

Since introduced, OOB skb holds an additional reference count with no
special reason and caused many issues.

Also, kfree_skb() and consume_skb() are used to decrement the count,
which is confusing.

Let's drop the unnecessary skb_get() in queue_oob() and corresponding
kfree_skb(), consume_skb(), and skb_unref().

Now unix_sk(sk)->oob_skb is just a pointer to skb in the receive queue,
so special handing is no longer needed in GC.


And more:

The splat is handy, you may want to check the returned value of recvfrom()
with KASAN on and off and then focus on the root cause.  When UAF happens,
the real bug always happens before that.

FWIW, I was able to see the splat on my setup and it disappeared with
my patch.  Also, syzbot has already tested the equivalent change.
https://urldefense.com/v3/__https://lore.kernel.org/netdev/00000000000064fbcb06215a7bbc@xxxxxxxxxx/__;!!ACWV5N9M2RV99hQ!MiZQCdyNumVmB3YeKfZxVbJFPsouXDCN7ie5DBfjBiurJib1D7k4bO_Jg1cAie9KIkAgXezFUaGOFhg$

None of this points to where the skb is "dereferenced" The test case added will never panic the kernel.

In the organizations that I have worked with, just because a change stops a panic does not mean the issue is fixed. You have to explicitly show where and how. That is what i am asking, Yes there is a bug, but it will not cause the panic, so if you are just high and might engiineer, show where and how?

This will be the last mail from me in this thread. I don't want to
waste time on someone who doesn't read mails.
Yes please don't if you do not have anything technical to say, all your comments are "smart comments". This email thread would end if you could just say, here is line XXXX where the skb is de referenced, but you have not because you have no idea.

BTTW Your comment about holding the extra refcnt without any reason just shows that you DO NOT understand the code. And the confusion to refcnts has been caused by your changes, I am concerned your changes have broken the code.

I am willing to accept that I may be wrong but only if I am shown the location of de-reference. Please do not respond if you can not point to the exact location.

Shoaib