Re: [syzbot] KASAN: invalid-access Read in copy_page

From: Dmitry Vyukov
Date: Tue Sep 06 2022 - 09:56:59 EST


On Tue, 6 Sept 2022 at 15:24, Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> Hi Andrey,
>
> On Mon, Sep 05, 2022 at 11:39:24PM +0200, Andrey Konovalov wrote:
> > Syzbot reported an issue with MTE tagging of user pages, see the report below.
> >
> > Possibly, it's related to your "mm: kasan: Skip unpoisoning of user
> > pages" series. However, I'm not sure what the issue is.
> [...]
> > On Sat, Aug 6, 2022 at 3:31 AM syzbot
> > <syzbot+c2c79c6d6eddc5262b77@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > BUG: KASAN: invalid-access in copy_page+0x10/0xd0 arch/arm64/lib/copy_page.S:26
> > > Read at addr f5ff000017f2e000 by task syz-executor.1/2218
> > > Pointer tag: [f5], memory tag: [f2]
> [...]
> > > The buggy address belongs to the physical page:
> > > page:000000003e6672be refcount:3 mapcount:2 mapping:0000000000000000 index:0xffffffffe pfn:0x57f2e
> > > memcg:fbff00001ded8000
> > > anon flags: 0x1ffc2800208001c(uptodate|dirty|lru|swapbacked|arch_2|node=0|zone=0|lastcpupid=0x7ff|kasantag=0xa)
>
> It looks like a copy-on-write where the source page is tagged
> (PG_mte_tagged set) but page_kasan_tag() != 0xff (kasantag == 0xa). The
> page is also swap-backed. Our current assumption is that
> page_kasan_tag_reset() should be called on page allocation and we should
> never end up with a user page without the kasan tag reset.
>
> I was hoping we could catch such a condition with the diff below, but it
> never triggered for me, even when swapping tagged pages in and out:
>
> -------------8<-------------------------------------------
> diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
> index b2b730233274..241c616e3685 100644
> --- a/arch/arm64/kernel/mte.c
> +++ b/arch/arm64/kernel/mte.c
> @@ -62,6 +62,9 @@ void mte_sync_tags(pte_t old_pte, pte_t pte)
>  	if (!check_swap && !pte_is_tagged)
>  		return;
> 
> +	/* Pages mapped in user space should have had the kasan tag reset */
> +	WARN_ON_ONCE(page_kasan_tag(page) != 0xff);
> +
>  	/* if PG_mte_tagged is set, tags have already been initialised */
>  	for (i = 0; i < nr_pages; i++, page++) {
>  		if (!test_and_set_bit(PG_mte_tagged, &page->flags))
> ------------------------8<-------------------------------
>
> Does it take long to reproduce this kasan warning? If not, it may be
> worth adding the above hunk; hopefully we can identify where that page
> is coming from before it ends up in copy_page().

syzbot finds several such cases every day (200 crashes over the past 35 days):
https://syzkaller.appspot.com/bug?extid=c2c79c6d6eddc5262b77
So once the debugging hunk reaches a tree that syzbot tests, we should have an answer within a day.