Re: [syzbot] [net?] KASAN: slab-use-after-free Read in unix_stream_read_actor (2)

From: Shoaib Rao
Date: Mon Sep 09 2024 - 20:29:38 EST




On 9/6/2024 10:06 PM, Shoaib Rao wrote:

On 9/6/2024 9:48 AM, Shoaib Rao wrote:

On 9/6/2024 5:37 AM, Eric Dumazet wrote:
On Thu, Sep 5, 2024 at 10:48 PM Shoaib Rao <rao.shoaib@xxxxxxxxxx> wrote:

On 9/5/2024 1:35 PM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Thu, 5 Sep 2024 13:15:18 -0700
On 9/5/2024 12:46 PM, Kuniyuki Iwashima wrote:
From: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Date: Thu, 5 Sep 2024 00:35:35 -0700
Hi All,

I am not able to reproduce the issue. I have run the C program at least
100 times in a loop. In the end I do get an EFAULT, not sure if that is
intentional or not but no panic. Should I be doing something
differently? The kernel version I am using is
v6.11-rc6-70-gc763c4339688. Later I can try with the exact version.
The -EFAULT is the bug, meaning that we were trying to read a consumed skb.

But the first bug is in recvfrom() that shouldn't be able to read OOB skb
without MSG_OOB, which doesn't clear unix_sk(sk)->oob_skb, and later
something bad happens.

     socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
     sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\333", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_OOB|MSG_DONTWAIT) = 1
     recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0, msg_controllen=0, msg_flags=MSG_OOB}, MSG_OOB|MSG_WAITFORONE) = 1
     sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\21", iov_len=1}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_OOB|MSG_NOSIGNAL|MSG_MORE) = 1
     recvfrom(3, "\21", 125, MSG_DONTROUTE|MSG_TRUNC|MSG_DONTWAIT, NULL, NULL) = 1
     recvmsg(3, {msg_namelen=0}, MSG_OOB|MSG_ERRQUEUE) = -1 EFAULT (Bad address)
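
For anyone who wants to replay the trace above by hand, here is a minimal user-space sketch of the same five calls. It is a simplification and not the syzbot reproducer: it uses send()/recv() with real one-byte buffers, drops the fuzzer-chosen extra flags, and adds MSG_DONTWAIT everywhere so that nothing blocks. The names struct oob_replay and replay_oob_sequence() are made up for illustration. Results depend on the kernel (AF_UNIX OOB support, patched or not), so the sketch only records each return value and does not assume any particular outcome.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Per-step result: bytes transferred, or -errno on failure. */
struct oob_replay {
    int sp;          /* socketpair() */
    long send1;      /* first send with MSG_OOB */
    long recv_oob1;  /* recv with MSG_OOB (consumes the first OOB byte) */
    long send2;      /* second send with MSG_OOB */
    long recv_plain; /* recv WITHOUT MSG_OOB */
    long recv_oob2;  /* final recv with MSG_OOB */
};

static long ret_or_neg_errno(long r)
{
    return r < 0 ? -(long)errno : r;
}

/* Hypothetical helper: replays the traced sequence, simplified. */
static struct oob_replay replay_oob_sequence(void)
{
    struct oob_replay r;
    int fds[2];
    char buf[125];

    memset(&r, 0, sizeof(r));
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        r.sp = -errno;
        return r;
    }

    r.send1 = ret_or_neg_errno(send(fds[1], "\333", 1,
                                    MSG_OOB | MSG_DONTWAIT | MSG_NOSIGNAL));
    r.recv_oob1 = ret_or_neg_errno(recv(fds[0], buf, 1,
                                        MSG_OOB | MSG_DONTWAIT));
    r.send2 = ret_or_neg_errno(send(fds[1], "\21", 1,
                                    MSG_OOB | MSG_DONTWAIT | MSG_NOSIGNAL));
    /* A plain read should not be able to return the OOB byte. */
    r.recv_plain = ret_or_neg_errno(recv(fds[0], buf, sizeof(buf),
                                         MSG_DONTWAIT));
    r.recv_oob2 = ret_or_neg_errno(recv(fds[0], buf, 1,
                                        MSG_OOB | MSG_DONTWAIT));

    close(fds[0]);
    close(fds[1]);
    return r;
}
```

Calling replay_oob_sequence() from main() and printing each field gives something directly comparable to the strace output. In the trace above the plain read returned 1 and the final MSG_OOB read returned -EFAULT; other kernels may well behave differently at both steps.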

I posted a fix officially:
https://lore.kernel.org/netdev/20240905193240.17565-5-kuniyu@xxxxxxxxxx/
Thanks, that is great. Isn't EFAULT normally indicative of an issue
with the user-provided address of the buffer, not the kernel buffer?
Normally, it's used when copy_to_user() or copy_from_user() or
something similar failed.

But this time, if you turn KASAN off, you'll see the last recvmsg()
returns 1-byte garbage instead of -EFAULT, so actually KASAN worked
on your host, I guess.
No, it did not work. As soon as KASAN detected the read after free it should
have panicked, as it did in the report, and I have been running the
syzbot's C program in a continuous loop. I would like to reproduce the
issue before we can accept the fix -- If that is alright with you. I
will try your new test case later and report back. Thanks for the patch
though.
KASAN does not panic unless you request it.

Documentation/dev-tools/kasan.rst

KASAN is affected by the generic ``panic_on_warn`` command line parameter.
When it is enabled, KASAN panics the kernel after printing a bug report.

By default, KASAN prints a bug report only for the first invalid memory access.
With ``kasan_multi_shot``, KASAN prints a report on every invalid access. This
effectively disables ``panic_on_warn`` for KASAN reports.

Alternatively, independent of ``panic_on_warn``, the ``kasan.fault=`` boot
parameter can be used to control panic and reporting behaviour:

- ``kasan.fault=report``, ``=panic``, or ``=panic_on_write`` controls whether
   to only print a KASAN report, panic the kernel, or panic the kernel on
   invalid writes only (default: ``report``). The panic happens even if
   ``kasan_multi_shot`` is enabled. Note that when using asynchronous mode of
   Hardware Tag-Based KASAN, ``kasan.fault=panic_on_write`` always panics on
   asynchronously checked accesses (including reads).
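
To make the rules quoted above concrete, here is a small sketch that decides whether a KASAN report for an invalid read would panic the machine. It takes the kernel command line (for example the contents of /proc/cmdline) and the value of /proc/sys/kernel/panic_on_warn. The helper names (cmdline_has_param, read_int_file, kasan_read_report_panics) are invented for this example, and the logic is only my reading of the documentation text above, not kernel code. It ignores the asynchronous Hardware Tag-Based special case.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `param` appears in `cmdline` as a whole whitespace-separated token. */
static int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t n = strlen(param);
    const char *p = cmdline;

    while ((p = strstr(p, param)) != NULL) {
        int at_start = (p == cmdline) || p[-1] == ' ' || p[-1] == '\n';
        char end = p[n];
        int at_end = (end == '\0' || end == ' ' || end == '\n');

        if (at_start && at_end)
            return 1;
        p += n;
    }
    return 0;
}

/* Read a small integer file such as /proc/sys/kernel/panic_on_warn; -1 on error. */
static int read_int_file(const char *path)
{
    FILE *f = fopen(path, "r");
    int v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* 1 if a KASAN report for an invalid READ would panic, per the text above. */
static int kasan_read_report_panics(const char *cmdline, int panic_on_warn)
{
    /* kasan.fault=panic panics even if kasan_multi_shot is enabled. */
    if (cmdline_has_param(cmdline, "kasan.fault=panic"))
        return 1;
    /* kasan_multi_shot effectively disables panic_on_warn for KASAN. */
    if (panic_on_warn == 1 && !cmdline_has_param(cmdline, "kasan_multi_shot"))
        return 1;
    /* kasan.fault=report (the default) and =panic_on_write do not panic on reads. */
    return 0;
}
```

On a test machine one would pass in the contents of /proc/cmdline and read_int_file("/proc/sys/kernel/panic_on_warn"). A result of 0 means a use-after-free read would only be logged, so the place to look is dmesg, not a crash.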

Hi Eric,

Thanks for the update. I forgot to mention that I did set /proc/sys/kernel/panic_on_warn to 1. I ran the program overnight in two separate windows; there were no reports and no panic. I want to reproduce the issue first, because if I cannot, how can I be sure that I have fixed that bug? I might find and fix another issue, but not the one I was trying to fix. Please be assured that I am not done; I continue to investigate the issue.

If someone has a way of reproducing the failure please kindly let me know.

Kind regards,

Shoaib

I have tried reproducing using the newly added tests but no luck. I will keep trying but if there is another occurrence please let me know. I am using an AMD system but that should not have any impact.

Shoaib


I have spent some more time investigating the issue. The sequence of packet arrival and consumption definitely points to an issue with OOB handling, and I will be submitting a patch for that.

KASAN does not report any issue because there is none. While the handling is incorrect, at no point is freed memory accessed. The EFAULT error code is returned from __skb_datagram_iter():

/* This is not really a user copy fault, but rather someone
 * gave us a bogus length on the skb.  We should probably
 * print a warning here as it may indicate a kernel bug.
 */

fault:
    iov_iter_revert(to, offset - start_off);
    return -EFAULT;

As the comment says, the issue is that the skb in question has a bogus length. Due to the bug in handling, the OOB byte has already been read as a regular byte, but the oob pointer is not cleared. So when a read with the OOB flag is issued, the code calls __skb_datagram_iter() with the skb pointer, which has a length of zero. The code detects this and returns the error. Any doubts can be verified by checking the refcnt on the skb.
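
The length check described above can be illustrated with a toy user-space model. This is not the kernel function: struct toy_skb and toy_datagram_iter() are invented names, and the model has only a linear buffer (no frags, no frag_list). It keeps just the one property that matters here: if the caller asks for more bytes than the buffer still holds, the copy falls through to the fault path and returns -EFAULT, even though no user address was bad.

```c
#include <errno.h>
#include <string.h>

/* Toy stand-in for an skb: a linear buffer and its length. */
struct toy_skb {
    const char *data;
    int len;
};

/*
 * Copy `len` bytes starting at `offset` into `to`.
 * Returns 0 on success, -EFAULT if the request runs past the data,
 * which models the "bogus length" case and not a user copy fault.
 */
static int toy_datagram_iter(const struct toy_skb *skb, int offset,
                             char *to, int len)
{
    int avail = skb->len - offset;

    if (avail > 0) {
        int copy = avail < len ? avail : len;

        memcpy(to, skb->data + offset, (size_t)copy);
        len -= copy;
    }

    if (!len)
        return 0;

    /* Nothing left to copy from, but the caller still wants bytes. */
    return -EFAULT;
}
```

With a one-byte buffer, a read at offset 0 for one byte returns 0. Once that byte counts as consumed (modelled either as an offset of 1 or as a length of 0), the same one-byte request returns -EFAULT, which matches the final recvmsg() in the trace.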

My conclusion is that the bug report by syzbot is not caused by the mishandling of OOB, unless there was code added to disregard the skb length and read a byte.

The error being returned is confusing. The callers should not pass this error to the application. They should process the error.

Shoaib