Re: [PATCH v2] mm/alloc_tag: fix vm_module_tags_populate's KASAN poisoning logic

From: Hao Ge
Date: Wed Dec 04 2024 - 22:21:09 EST


Hi Suren


On 12/5/24 10:14, Hao Ge wrote:
Hi Suren


On 12/5/24 03:33, Suren Baghdasaryan wrote:
On Wed, Dec 4, 2024 at 7:08 AM Hao Ge <hao.ge@xxxxxxxxx> wrote:
Hi Suren


Thank you for your review.


On 12/4/24 22:39, Suren Baghdasaryan wrote:
On Wed, Dec 4, 2024 at 12:35 AM Hao Ge <hao.ge@xxxxxxxxx> wrote:
From: Hao Ge <gehao@xxxxxxxxxx>

After merge commit 233e89322cbe ("alloc_tag:
fix module allocation tags populated area calculation"),
We still encountered a KASAN bug.

This is because we have only actually performed
page allocation and address mapping here.
we need to unpoisoned portions of underlying memory.

Because we have a change in the size here,we need to
re-annotate poisoned and unpoisoned portions of underlying memory
according to the new size.

Here is the log for KASAN:

[    5.041171][    T1] ==================================================================
[    5.042047][    T1] BUG: KASAN: vmalloc-out-of-bounds in move_module+0x2c0/0x708
[    5.042723][    T1] Write of size 240 at addr ffff80007e510000 by task systemd/1
[    5.043412][    T1]
[    5.043523][   T72] input: QEMU QEMU USB Tablet as /devices/pci0000:00/0000:00:01.1/0000:02:001
[    5.043614][    T1] CPU: 0 UID: 0 PID: 1 Comm: systemd Not tainted 6.13.0-rc1+ #28
[    5.045560][    T1] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[    5.046328][    T1] Call trace:
[    5.046670][    T1]  show_stack+0x20/0x38 (C)
[    5.047127][    T1]  dump_stack_lvl+0x80/0xf8
[    5.047533][    T1] print_address_description.constprop.0+0x58/0x358
[    5.048092][   T72] hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU 0
[    5.048126][    T1]  print_report+0xb0/0x280
[    5.049682][    T1]  kasan_report+0xb8/0x108
[    5.050170][    T1]  kasan_check_range+0xe8/0x190
[    5.050685][    T1]  memcpy+0x58/0xa0
[    5.051135][    T1]  move_module+0x2c0/0x708
[    5.051586][    T1] layout_and_allocate.constprop.0+0x308/0x5b8
[    5.052219][    T1]  load_module+0x134/0x16c8
[    5.052671][    T1]  init_module_from_file+0xdc/0x138
[    5.053193][    T1]  idempotent_init_module+0x344/0x600
[    5.053742][    T1] __arm64_sys_finit_module+0xbc/0x150
[    5.054289][    T1]  invoke_syscall+0xd4/0x258
[    5.054749][    T1] el0_svc_common.constprop.0+0xb4/0x240
[    5.055319][    T1]  do_el0_svc+0x48/0x68
[    5.055743][    T1]  el0_svc+0x40/0xe0
[    5.056142][    T1]  el0t_64_sync_handler+0x10c/0x138
[    5.056658][    T1]  el0t_64_sync+0x1ac/0x1b0

Fixes: 233e89322cbe ("alloc_tag: fix module allocation tags populated area calculation")
Signed-off-by: Hao Ge <gehao@xxxxxxxxxx>
Thanks for the fix!

---
v2: Add comments to kasan_unpoison_vmalloc like other places.

commit 233e89322cbe ("alloc_tag: fix module allocation
tags populated area calculation") is currently in the
mm-hotfixes-unstable branch, so this patch is
developed based on the mm-hotfixes-unstable branch.
---
   lib/alloc_tag.c | 13 +++++++++++++
   1 file changed, 13 insertions(+)

diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 4ee6caa6d2da..f885b3f3af0e 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -421,7 +421,20 @@ static int vm_module_tags_populate(void)
__free_page(next_page[i]);
                          return -ENOMEM;
                  }
+
+               kasan_poison_vmalloc((void *)module_tags.start_addr,
+ vm_module_tags->nr_pages << PAGE_SHIFT);
+
                  vm_module_tags->nr_pages += nr;
+
+               /*
+                * Mark the pages as accessible, now that they are mapped.
+                * With hardware tag-based KASAN, marking is skipped for
+                * non-VM_ALLOC mappings, see __kasan_unpoison_vmalloc().
+                */
+               kasan_unpoison_vmalloc((void *)module_tags.start_addr,
+ vm_module_tags->nr_pages << PAGE_SHIFT,
+ KASAN_VMALLOC_PROT_NORMAL);
Instead of poisoning [module_tags.start_addr,
vm_module_tags->nr_pages], incrementing vm_module_tags->nr_pages and
the unpoisoning [module_tags.start_addr, vm_module_tags->nr_pages]
could we simply poisons the additional area like this:

                  kasan_unpoison_vmalloc((void *)module_tags.start_addr +
(vm_module_tags->nr_pages << PAGE_SHIFT),
                                         nr << PAGE_SHIFT,
KASAN_VMALLOC_PROT_NORMAL);
                 vm_module_tags->nr_pages += nr;
?

I had considered making such modifications earlier.

But considering the following situation,

A module tags spans across the regions of [module_tags.start_addr,
vm_module_tags->nr_pages] and [module_tags.start_addr +
vm_module_tags->nr_pages, ...].

It may result in false positives for out-of-bounds errors.
Sorry, maybe I'm missing something but I don't see why poisoning only
newly mapped area would lead to false positives. Could you please
clarify?


Because KASAN may perceive the two as distinct address spaces, despite their addresses being contiguous.

So, when a module tag spans across these two contiguous address spaces, KASAN may incorrectly consider it as an out-of-bounds access.


Also, if you do need to unpoison and then poison, using phys_end and
new_end would be better, like this:

kasan_poison_vmalloc((void *)module_tags.start_addr,
                                       phys_end - module_tags.start_addr)

kasan_unpoison_vmalloc((void *)module_tags.start_addr,
                                           new_end - module_tags.start_addr,
KASAN_VMALLOC_PROT_NORMAL);

OK, the next version will include.

After verification and consideration, I have found that this modification may still pose problems.

Because we haven't ensured that  new_end is page-aligned,

So, we've only made the region from||module_tags.start_addr tonew_endaccessible.

Using this example, in reality,end equals 0xffff80007e5100f0:

Write of size 240 at addr ffff80007e510000 by task systemd/1

When we access other memory within the same page as0xffff80007e5100f0, KASAN warnings will also be issued due to the lack of unpoisoned portions in that memory.

Based on that, I would suggest sticking with the V2 version.


Thanks

Best Regards

Hao



Thanks

Best regards

Hao



          }

          return 0;
--
2.25.1