Re: [PATCH] mm: hugetlb: Only prep and add allocated folios for non-gigantic pages

From: Mike Kravetz
Date: Thu Oct 12 2023 - 10:54:12 EST


On 10/11/23 17:03, Nathan Chancellor wrote:
> On Mon, Oct 09, 2023 at 06:23:45PM -0700, Mike Kravetz wrote:
> > On 10/09/23 15:56, Usama Arif wrote:
>
> I suspect the crash that our continuous integration spotted [1] is the
> same issue that Konrad is seeing, as I have bisected that failure to
> bfb41d6b2fe1 in next-20231009. However, neither the first half of your
> diff (since the second half does not apply at bfb41d6b2fe1) nor the
> original patch in this thread resolves the issue, so maybe it is
> entirely different from Konrad's?
>
> For what it's worth, this issue is only visible for me when building for
> arm64 using LLVM with CONFIG_INIT_STACK_NONE=y, instead of the default
> CONFIG_INIT_STACK_ALL_ZERO=y (which appears to hide the problem?),
> making it seem like it could be something with uninitialized memory... I
> have not been able to reproduce it with GCC, which could also mean
> something.

Thank you Nathan! That is very helpful.

I will use this information to try to reproduce the crash. If I can
reproduce it, I should be able to get to the root cause.
--
Mike Kravetz

> Using LLVM 17.0.2 from kernel.org [2]:
>
> $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 mrproper defconfig
>
> $ scripts/config -d INIT_STACK_ALL_ZERO -e INIT_STACK_NONE
>
> $ make -skj"$(nproc)" ARCH=arm64 LLVM=1 Image.gz
>
> $ qemu-system-aarch64 \
> -display none \
> -nodefaults \
> -cpu max,pauth-impdef=true \
> -machine virt,gic-version=max,virtualization=true \
> -append 'console=ttyAMA0 earlycon' \
> -kernel arch/arm64/boot/Image.gz \
> -initrd arm64-rootfs.cpio \
> -m 512m \
> -serial mon:stdio
> ...
> [ 0.000000] Linux version 6.6.0-rc4-00317-gbfb41d6b2fe1 (nathan@dev-arch.thelio-3990X) (ClangBuiltLinux clang version 17.0.2 (https://github.com/llvm/llvm-project b2417f51dbbd7435eb3aaf203de24de6754da50e), ClangBuiltLinux LLD 17.0.2) #1 SMP PREEMPT Wed Oct 11 16:44:41 MST 2023
> ...
> [ 0.304543] Unable to handle kernel paging request at virtual address ffffff602827f9f4
> [ 0.304899] Mem abort info:
> [ 0.305022] ESR = 0x0000000096000004
> [ 0.305438] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 0.305668] SET = 0, FnV = 0
> [ 0.305804] EA = 0, S1PTW = 0
> [ 0.305949] FSC = 0x04: level 0 translation fault
> [ 0.306156] Data abort info:
> [ 0.306287] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [ 0.306500] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [ 0.306711] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [ 0.306976] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041cc3000
> [ 0.307251] [ffffff602827f9f4] pgd=0000000000000000, p4d=0000000000000000
> [ 0.308086] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [ 0.308428] Modules linked in:
> [ 0.308722] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.0-rc4-00317-gbfb41d6b2fe1 #1
> [ 0.309159] Hardware name: linux,dummy-virt (DT)
> [ 0.309496] pstate: 61400009 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
> [ 0.309987] pc : gather_bootmem_prealloc+0x80/0x1a8
> [ 0.310673] lr : hugetlb_init+0x1c8/0x2ec
> [ 0.310871] sp : ffff80008000ba10
> [ 0.311038] x29: ffff80008000ba30 x28: 0000000000000000 x27: ffffd80a09fe7db8
> [ 0.311417] x26: 0000000000000001 x25: ffffd80a09fe7db8 x24: 0000000000000100
> [ 0.311702] x23: fffffc0000000000 x22: 0001000000000000 x21: ffff80008000ba18
> [ 0.311987] x20: ffffff602827f9c0 x19: ffffd80a0a555b60 x18: 00000000fbf7386f
> [ 0.312272] x17: 00000000bee83943 x16: 000000002ae32058 x15: 0000000000000000
> [ 0.312557] x14: 0000000000000009 x13: ffffd80a0a556d28 x12: ffffffffffffee38
> [ 0.312831] x11: ffffd80a0a556d28 x10: 0000000000000004 x9 : ffffd80a09fe7000
> [ 0.313141] x8 : 0000000d80a09fe7 x7 : 0000000001e1f7fb x6 : 0000000000000008
> [ 0.313425] x5 : ffffd80a09ef1454 x4 : ffff00001fed5630 x3 : 0000000000019e00
> [ 0.313703] x2 : ffff000002407b80 x1 : 0000000000019d00 x0 : 0000000000000000
> [ 0.314054] Call trace:
> [ 0.314259] gather_bootmem_prealloc+0x80/0x1a8
> [ 0.314536] hugetlb_init+0x1c8/0x2ec
> [ 0.314743] do_one_initcall+0xac/0x220
> [ 0.314928] do_initcall_level+0x8c/0xac
> [ 0.315114] do_initcalls+0x54/0x94
> [ 0.315276] do_basic_setup+0x1c/0x28
> [ 0.315450] kernel_init_freeable+0x104/0x170
> [ 0.315648] kernel_init+0x20/0x1a0
> [ 0.315822] ret_from_fork+0x10/0x20
> [ 0.316235] Code: 979e8c0d 8b160328 d34cfd08 8b081af4 (b9403688)
> [ 0.316745] ---[ end trace 0000000000000000 ]---
> [ 0.317463] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [ 0.318093] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
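
For anyone following along, the "Mem abort info" fields and the faulting
address in the oops above can be cross-checked from the raw values. The
following is only a minimal sketch, not kernel code: it assumes the
ESR_EL1 layout from the Arm architecture manual (EC in bits [31:26], IL
in bit 25, ISS in bits [24:0], with the fault status code in ISS bits
[5:0] and WnR in ISS bit 6) and the A64 encoding of LDR (immediate,
unsigned offset, 32-bit). The function names decode_esr and
decode_ldr_w_uimm are made up for illustration.

```python
def decode_esr(esr):
    """Split an ESR_EL1 value into the fields printed in the oops.

    Assumed layout: EC = bits [31:26], IL = bit 25, ISS = bits [24:0].
    For a data abort, FSC = ISS[5:0] and WnR = ISS[6].
    """
    iss = esr & 0x1FFFFFF
    return {
        "ec": (esr >> 26) & 0x3F,
        "il_bits": 32 if (esr >> 25) & 1 else 16,
        "iss": iss,
        "fsc": iss & 0x3F,
        "wnr": (iss >> 6) & 1,
    }


def decode_ldr_w_uimm(insn):
    """Decode a 32-bit 'LDR Wt, [Xn, #imm]' (unsigned offset) instruction.

    Returns (rt, rn, byte_offset). Raises ValueError for any other
    encoding, since this sketch handles only this one instruction form.
    """
    if (insn >> 22) != 0b1011100101:
        raise ValueError("not LDR (immediate, unsigned offset, 32-bit)")
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 4  # imm12 is scaled by the 4-byte access size


# Values copied from the oops above.
esr = decode_esr(0x96000004)
rt, rn, offset = decode_ldr_w_uimm(0xB9403688)  # word in parentheses in "Code:"
x20 = 0xFFFFFF602827F9C0

print(f"EC=0x{esr['ec']:02x} IL={esr['il_bits']} bits FSC=0x{esr['fsc']:02x}")
print(f"ldr w{rt}, [x{rn}, #0x{offset:x}] -> 0x{x20 + offset:x}")
```

Under those assumptions, the faulting instruction is a 32-bit load from
x20 + 0x34, and x20 + 0x34 equals the reported faulting address
ffffff602827f9f4. That suggests the bad pointer is the value in x20 at the
time of the load, but it says nothing about how x20 ended up with that
value.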
>
> The rootfs is available at [3] in case it is relevant. I am more than
> happy to provide any additional information or test any patches as
> necessary.
>
> [1]: https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/6469151768/job/17570882198
> [2]: https://mirrors.edge.kernel.org/pub/tools/llvm/
> [3]: https://github.com/ClangBuiltLinux/boot-utils/releases
>
> Cheers,
> Nathan