RE: [PATCH v2] kasan: fix deadlock in start_report()

From: 袁帅(Shuai Yuan)
Date: Fri Mar 03 2023 - 05:42:11 EST


> On Tue, Feb 28, 2023 at 10:50:46PM +0100, Andrey Konovalov wrote:
> > On Tue, Feb 28, 2023 at 5:09 PM Catalin Marinas <catalin.marinas@xxxxxxx>
> >
> > Right, but here we don't want to re-enable MTE after a fault, we want
> > to suppress faults when printing an error report.
> >
> > > IIUC, the problem is that the kernel already got an MTE fault, so at
> > > that point the error is not really recoverable.
> >
> > No, the problem is with the following sequence of events:
> >
> > 1. KASAN detects a memory corruption and starts printing a report
> > _without getting an MTE fault_. This happens when e.g. KASAN sees a
> > free of an invalid address.
> >
> > 2. During error reporting, an MTE fault is triggered by the error
> > reporting code. E.g. while collecting information about the accessed
> > slab object.
> >
> > 3. KASAN tries to print another report while printing a report and
> > goes into a deadlock.
> >
> > If we could avoid MTE faults being triggered during error reporting,
> > this would solve the problem.
>
> Ah, I get it now. So we just want to avoid triggering a benign MTE fault.
>
> > > If we want to avoid a
> > > fault in the first place, we could do something like
> > > __uaccess_enable_tco() (Vincenzo has some patches to generalise
> > > these
> > > routines)
> >
> > Ah, this looks exactly like what we need. Adding
> > __uaccess_en/disable_tco to kasan_report_invalid_free solves the
> > problem.
> >
> > Do you think it would be possible to expose these routines to KASAN?
>
> Yes. I'm including Vincenzo's patch below (part of fixing some potential
> strscpy() faults with its unaligned accesses eager reading; we'll get to posting
> that eventually). You can add some arch_kasan_enable/disable() macros on
> top and feel free to include the patch below.

I have initially verified the following code on kernel version 5.15 and it is valid.
Although not using the latest interface, there is no fundamental difference.
I think this change should also apply to the latest kernel code.

diff --git a/mm/kasan/report.c b/mm/kasan/report.c
index 1f96a72c7edd..73b7fc532d81 100644
--- a/mm/kasan/report.c
+++ b/mm/kasan/report.c
@@ -28,7 +28,9 @@
#include <trace/events/error_report.h>

#include <asm/sections.h>
-
+#ifdef CONFIG_KASAN_HW_TAGS
+#include <asm/uaccess.h>
+#endif
#include <kunit/test.h>

#include "kasan.h"
@@ -107,6 +109,10 @@ static void start_report(unsigned long *flags)
*/
kasan_disable_current();
spin_lock_irqsave(&report_lock, *flags);
+#ifdef CONFIG_KASAN_HW_TAGS
+if (kasan_hw_tags_enabled())
+__uaccess_enable_tco();
+#endif
pr_err("==================================================================\n");
}

@@ -116,6 +122,10 @@ static void end_report(unsigned long *flags, unsigned long addr)
trace_error_report_end(ERROR_DETECTOR_KASAN, addr);
pr_err("==================================================================\n");
add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+#ifdef CONFIG_KASAN_HW_TAGS
+if (kasan_hw_tags_enabled())
+__uaccess_disable_tco();
+#endif
spin_unlock_irqrestore(&report_lock, *flags);
if (panic_on_warn && !test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags)) {
/*

> Now, I wonder whether we should link those into kasan_disable_current().
> These functions only deal with the depth for KASAN_SW_TAGS but it would
> make sense for KASAN_HW_TAGS to enable tag-check-override so that we
> don't need to bother with a match-all tags on pointer dereferencing.

ZEKU
信息安全声明:本邮件包含信息归发件人所在组织ZEKU所有。 禁止任何人在未经授权的情况下以任何形式(包括但不限于全部或部分披露、复制或传播)使用包含的信息。若您错收了本邮件,请立即电话或邮件通知发件人,并删除本邮件及附件。
Information Security Notice: The information contained in this mail is solely property of the sender's organization ZEKU. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it.