Re: [PATCH] vmap(): don't allow invalid pages
From: Yury Norov
Date: Wed Jan 19 2022 - 12:23:19 EST
On Tue, Jan 18, 2022 at 10:17 PM Anshuman Khandual
<anshuman.khandual@xxxxxxx> wrote:
>
>
>
> On 1/19/22 6:21 AM, Matthew Wilcox wrote:
> > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> >> vmap() takes struct page *pages as one of arguments, and user may provide
> >> an invalid pointer which would lead to DABT at address translation later.
> >
> > Could we spell out 'DABT'? Presumably that's an ARM-specific thing.
> > Just like we don't say #PF for Intel page faults, I think this is
> > probably a 'data abort'?
>
> Right, it is data abort.
>
> >
> >> Currently, kernel checks the pages against NULL. In my case, however, the
> >> address was not NULL, and was big enough so that the hardware generated
> >> Address Size Abort on arm64.
>
> Could you please provide the actual abort stack here with details.
[ 665.484101] Unhandled fault at 0xffff8000252cd000
[ 665.488807] Mem abort info:
[ 665.491617] ESR = 0x96000043
[ 665.494675] EC = 0x25: DABT (current EL), IL = 32 bits
[ 665.499985] SET = 0, FnV = 0
[ 665.503039] EA = 0, S1PTW = 0
[ 665.506167] Data abort info:
[ 665.509047] ISV = 0, ISS = 0x00000043
[ 665.512882] CM = 0, WnR = 1
[ 665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000
[ 665.522550] [ffff8000252cd000] pgd=000000affcfff003,
pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713
[ 665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP
[ 665.539936] Modules linked in: [...]
[ 665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P OE
5.4.0-84-generic #94~18.04.1-Ubuntu
[ 665.626806] Hardware name: HPE Apollo 70 /C01_APACHE_MB
, BIOS L50_5.13_1.0.6 07/10/2018
[ 665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO)
[ 665.641407] pc : __memset+0x38/0x188
[ 665.645146] lr : test+0xcc/0x3f8
[ 665.650184] sp : ffff8000359bb840
[ 665.653486] x29: ffff8000359bb840 x28: 0000000000000000
[ 665.658785] x27: 0000000000000000 x26: 0000000000231000
[ 665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800
[ 665.669382] x23: 0000000000000001 x22: ffff00af533e5000
[ 665.674680] x21: 0000000000001000 x20: 0000000000000000
[ 665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff
[ 665.685276] x17: 00000000588636a5 x16: 0000000000000013
[ 665.690574] x15: ffffffffffffffff x14: 000000000007ffff
[ 665.695872] x13: 0000000080000000 x12: 0140000000000000
[ 665.701170] x11: 0000000000000041 x10: ffff8000652cd000
[ 665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000
[ 665.711767] x7 : 0303030303030303 x6 : 0000000000001000
[ 665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000
[ 665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001
[ 665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000
[ 665.732960] Call trace:
[ 665.735395] __memset+0x38/0x188
[...]
TCR_EL1 is 34b5503510, and so TCR_EL1.IPS is 0b100. The pfn that caused
address size abort has bit#47 set, which is far above 16TB that is allowed by
ips == 100.
> >>
> >> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> >> used, which is quite inconvenient for debugging purposes.
> >>
> >> This patch adds a pfn_valid() check into vmap() path, so that invalid
> >> mapping will not be created.
> >>
> >> RFC: https://lkml.org/lkml/2022/1/18/815
> >> v1: use pfn_valid() instead of adding an arch-specific
> >> arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
>
> This should be after the '---' below.
>
> >>
> >> Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> >
> > Suggested-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> >
> >> ---
> >> mm/vmalloc.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >> index d2a00ad4e1dd..a4134ee56b10 100644
> >> --- a/mm/vmalloc.c
> >> +++ b/mm/vmalloc.c
> >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >> return -EBUSY;
> >> if (WARN_ON(!page))
> >> return -ENOMEM;
> >> + if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> >> + return -EINVAL;
> >> set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> >> (*nr)++;
> >> } while (pte++, addr += PAGE_SIZE, addr != end);
> >> --
> >> 2.30.2
> >>
>
> Why should not this just scan over the entire user provided struct page
> array and make sure that all pages there in are valid via above method,
> but in vmap() itself before calling vmap_pages_range(). Because seems
> like a single invalid page detected in vmap_pages_pte_range() will
> anyways abort the entire vmap(). This will also enable us to drop the
> existing NULL check above.
I can do this, but why is it any better than the current approach?
Thanks,
Yury