Re: [init/initramfs.c] e7cb072eb9: invoked_oom-killer:gfp_mask=0x

From: Oliver Sang
Date: Fri Jun 11 2021 - 04:32:01 EST


hi Rasmus,

On Tue, Jun 08, 2021 at 09:42:58AM +0200, Rasmus Villemoes wrote:
> On 07/06/2021 16.44, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: e7cb072eb988e46295512617c39d004f9e1c26f8 ("init/initramfs.c: do unpacking asynchronously")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> >
> > in testcase: locktorture
> > version:
> > with following parameters:
> >
> > runtime: 300s
> > test: cpuhotplug
> >
> > test-description: This torture test consists of creating a number of kernel threads which acquire the lock and hold it for specific amount of time, thus simulating different critical region behaviors.
> > test-url: https://www.kernel.org/doc/Documentation/locking/locktorture.txt
> >
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> >
> >
> > please be noted that we use 'vmalloc=512M' for both parent and this commit.
> > since it's ok on parent but oom on this commit, we want to send this report
> > to show the potential problem of the commit on some cases.
> >
> > we also tested by changing to use 'vmalloc=128M', it will succeed.
> >
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> >
> >
> > [ 4.443950] e1000: Copyright (c) 1999-2006 Intel Corporation.
> > [ 4.716374] ACPI: _SB_.LNKC: Enabled at IRQ 11
> > [ 5.081518] e1000 0000:00:03.0 eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56
> > [ 5.082999] e1000 0000:00:03.0 eth0: Intel(R) PRO/1000 Network Connection
> > [ 5.085275] VFIO - User Level meta-driver version: 0.3
> > [ 8.029204] kworker/u4:0 invoked oom-killer: gfp_mask=0x100cc0(GFP_USER), order=0, oom_score_adj=0
> > [ 8.031021] CPU: 1 PID: 7 Comm: kworker/u4:0 Not tainted 5.12.0-11533-ge7cb072eb988 #1
> > [ 8.031988] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > [ 8.031988] Workqueue: events_unbound async_run_entry_fn
> > [ 8.031988] Call Trace:
> > [ 8.031988] dump_stack (kbuild/src/consumer/lib/dump_stack.c:122)
> > [ 8.031988] dump_header (kbuild/src/consumer/mm/oom_kill.c:463)
> > [ 8.031988] ? lock_release (kbuild/src/consumer/kernel/locking/lockdep.c:5190 kbuild/src/consumer/kernel/locking/lockdep.c:5532)
> > [ 8.031988] ? out_of_memory (kbuild/src/consumer/include/linux/rcupdate.h:710 kbuild/src/consumer/mm/oom_kill.c:379 kbuild/src/consumer/mm/oom_kill.c:1102 kbuild/src/consumer/mm/oom_kill.c:1048)
> > [ 8.031988] out_of_memory.cold (kbuild/src/consumer/mm/oom_kill.c:1106 kbuild/src/consumer/mm/oom_kill.c:1048)
> >
> >
> > To reproduce:
> >
> > # build kernel
> > cd linux
> > cp config-5.12.0-11533-ge7cb072eb988 .config
> > make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
> > make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
> > cd <mod-install-dir>
> > find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
>
> So I got this far...
>
> > git clone https://github.com/intel/lkp-tests.git
> > cd lkp-tests
> > bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
>
> Is there some way to reproduce which doesn't require adding an lkp user?

no need to run the test by 'lkp' account. lkp-tests will create a .lkp folder
under home path. do you mean this? normally we run 'qemu' by 'root'.

> Also, I don't have 16G to give to a virtual machine. I tried running the
> bzImage with that modules.cgz under qemu with some naive parameters just
> to get some output [1], but other than failing because there's no rootfs
> to mount (as expected), I only managed to make it fail when providing
> too little memory (the .cgz is around 70M, decompressed about 200M -
> giving '-m 1G' to qemu works fine). You mention the vmalloc= argument,
> but I can't make the decompression fail when passing either vmalloc=128M
> or vmalloc=512M or no vmalloc= at all.

sorry about this. we also tried to follow exactly above steps to test on
some local machine (8G memory), but cannot reproduce. we are analyzing
what's the diference in our automaion run in test cluster, which reproduced
the issue consistently. will update you when we have findings.

>
> As an extra data point, what happens if you add initramfs_async=0 to the
> command line?

yes, we tested this before sending out the report. the issue gone
if initramfs_async=0 is added.

>
> [1] qemu-system-x86_64 -kernel arch/i386/boot/bzImage -initrd
> ../../tmp/header-install/modules.cgz -append "console=ttyAMA0
> console=ttyS0 vmalloc=512M" -serial stdio -smp 2 -m 1G
>
> Rasmus