Re: [PATCH v2] kasan: fix deadlock in start_report()

From: Catalin Marinas
Date: Wed Mar 01 2023 - 12:00:09 EST


On Tue, Feb 28, 2023 at 10:50:46PM +0100, Andrey Konovalov wrote:
> On Tue, Feb 28, 2023 at 5:09 PM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > On Mon, Feb 27, 2023 at 03:13:45AM +0100, Andrey Konovalov wrote:
> > > +Catalin, would it be acceptable to implement a routine that disables
> > > in-kernel MTE tag checking (until the next
> > > mte_enable_kernel_sync/async/asymm call)? In a similar way an MTE
> > > fault does this, but without the fault itself. I.e., expose the part
> > > of do_tag_recovery functionality without report_tag_fault?
> >
> > I don't think we ever re-enable MTE after do_tag_recovery(). The
> > mte_enable_kernel_*() are called at boot. We do call
> > kasan_enable_tagging() explicitly in the kunit tests but that's a
> > controlled fault environment.
>
> Right, but here we don't want to re-enable MTE after a fault, we want
> to suppress faults when printing an error report.
>
> > IIUC, the problem is that the kernel already got an MTE fault, so at
> > that point the error is not really recoverable.
>
> No, the problem is with the following sequence of events:
>
> 1. KASAN detects a memory corruption and starts printing a report
> _without getting an MTE fault_. This happens when e.g. KASAN sees a
> free of an invalid address.
>
> 2. During error reporting, an MTE fault is triggered by the error
> reporting code. E.g. while collecting information about the accessed
> slab object.
>
> 3. KASAN tries to print another report while printing a report and
> goes into a deadlock.
>
> If we could avoid MTE faults being triggered during error reporting,
> this would solve the problem.

Ah, I get it now. So we just want to avoid triggering a benign MTE
fault.

> > If we want to avoid a
> > fault in the first place, we could do something like
> > __uaccess_enable_tco() (Vincenzo has some patches to generalise these
> > routines)
>
> Ah, this looks exactly like what we need. Adding
> __uaccess_en/disable_tco to kasan_report_invalid_free solves the
> problem.
>
> Do you think it would be possible to expose these routines to KASAN?

Yes. I'm including Vincenzo's patch below (part of fixing some potential
strscpy() faults with its unaligned accesses eager reading; we'll get to
posting that eventually). You can add some arch_kasan_enable/disable()
macros on top and feel free to include the patch below.

Now, I wonder whether we should link those into kasan_disable_current().
These functions only deal with the depth for KASAN_SW_TAGS but it would
make sense for KASAN_HW_TAGS to enable tag-check-override so that we
don't need to bother with a match-all tags on pointer dereferencing.

----8<----------------------------