Re: [PATCH v2] initramfs: Expose retained initrd as sysfs file

From: Gowans, James
Date: Thu Dec 07 2023 - 04:53:41 EST


On Wed, 2023-12-06 at 21:33 +0000, Alexander Graf wrote:
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -574,6 +574,16 @@ extern unsigned long __initramfs_size;
>  #include <linux/initrd.h>
>  #include <linux/kexec.h>
>  
> +static ssize_t raw_read(struct file *file, struct kobject *kobj,
> +			struct bin_attribute *attr, char *buf,
> +			loff_t pos, size_t count)
> +{
> +	memcpy(buf, attr->private + pos, count);
> +	return count;
> +}
> +
> +static BIN_ATTR(initrd, 0440, raw_read, NULL, 0);
> +
>  void __init reserve_initrd_mem(void)
>  {
>  	phys_addr_t start;
> @@ -715,8 +725,14 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
>  	 * If the initrd region is overlapped with crashkernel reserved region,
>  	 * free only memory that is not part of crashkernel region.
>  	 */
> -	if (!do_retain_initrd && initrd_start && !kexec_free_initrd())
> +	if (!do_retain_initrd && initrd_start && !kexec_free_initrd()) {
>  		free_initrd_mem(initrd_start, initrd_end);
> +	} else if (do_retain_initrd) {
> +		bin_attr_initrd.size = initrd_end - initrd_start;
> +		bin_attr_initrd.private = (void *)initrd_start;
> +		if (sysfs_create_bin_file(firmware_kobj, &bin_attr_initrd))
> +			pr_err("Failed to create initrd sysfs file");
> +	}
>  	initrd_start = 0;
>  	initrd_end = 0;

When adding this to my dev environment, I forgot to actually give QEMU
an initramfs file but did add the retain_initrd cmdline param. This
resulted in a zero-sized /sys/firmware/initrd.
Reading that zero-sized file triggers a NULL pointer dereference (oops
below) because attr->private is NULL.

Do you want to do some bounds checking or perhaps not expose the file if
there's not actually an initramfs?
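Roughly, the two options could look like this. This is only a userspace
sketch, not kernel code: struct fake_bin_attr, should_expose_initrd() and
guarded_raw_read() are hypothetical stand-ins. In the patch itself the
first check would presumably gate the sysfs_create_bin_file() call (e.g.
"else if (do_retain_initrd && initrd_start)"), and the second would live
in raw_read().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the two bin_attribute fields raw_read() uses. */
struct fake_bin_attr {
	void *private;	/* start of the retained initrd data, or NULL */
	size_t size;	/* length of that data in bytes */
};

/* Option 1: only expose the file when there is actual initrd data. */
static int should_expose_initrd(unsigned long start, unsigned long end)
{
	return start != 0 && end > start;
}

/* Option 2: bounds-check inside the read callback itself. */
static ssize_t guarded_raw_read(const struct fake_bin_attr *attr, char *buf,
				off_t pos, size_t count)
{
	/* No data, or reading at/after the end: report EOF. */
	if (!attr->private || pos < 0 || (size_t)pos >= attr->size)
		return 0;
	/* Never copy past the end of the initrd data. */
	if (count > attr->size - (size_t)pos)
		count = attr->size - (size_t)pos;
	memcpy(buf, (const char *)attr->private + pos, count);
	return (ssize_t)count;
}
```

Option 1 avoids creating a file that can never return data; option 2
would additionally protect raw_read() if the size ever ends up as 0 for
some other reason.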

I was also wondering whether we need to bounds-check pos + count to
prevent reading outside the initrd data in general, but it seems like
the generic sysfs code already does that, at least when the file size is
non-zero.
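To illustrate why the zero-sized case slips through: as I understand the
generic path (sysfs_kf_bin_read() in fs/sysfs/file.c), count is clamped
against the file size only when that size is non-zero, because a size of
0 is treated as "unknown". Below is a simplified userspace model of that
check; clamped_count() is a hypothetical name, not the kernel function.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Simplified model of the clamping applied before a bin_attribute's
 * read() callback runs. Returns the count that would be passed to the
 * callback, or 0 when the callback would not be called at all.
 */
static size_t clamped_count(off_t size, off_t pos, size_t count)
{
	if (count == 0)
		return 0;
	if (size) {	/* size 0 means "unknown": no clamping at all */
		if (pos >= size)
			return 0;
		if (pos + (off_t)count > size)
			count = (size_t)(size - pos);
	}
	return count;
}
```

With a non-zero size, reads are trimmed to the data; with size 0, a
page-sized read reaches raw_read() unchanged, which then memcpy()s from
the NULL attr->private.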

JG

[ 17.942640] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 17.944465] #PF: supervisor read access in kernel mode
[ 17.945753] #PF: error_code(0x0000) - not-present page
[ 17.946901] PGD 0 P4D 0
[ 17.947397] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 17.948384] CPU: 0 PID: 325 Comm: cat Not tainted 6.4.0-rc7-00232-g6290264ae247-dirty #415
[ 17.948676] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 17.948988] RIP: 0010:memcpy_orig+0x1e/0x140
[ 17.949142] Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 89 f8 48 83 fa 20 0f 82 86 00 00 00 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 567
[ 17.949914] RSP: 0018:ffffc90000347e18 EFLAGS: 00010206
[ 17.950103] RAX: ffff888104fc0000 RBX: ffff888101991f00 RCX: ffff888104fc0000
[ 17.950381] RDX: 0000000000000fc0 RSI: 0000000000000000 RDI: ffff888104fc0000
[ 17.950680] RBP: ffffc90000347e98 R08: 0000000000000000 R09: 0000000000001000
[ 17.950963] R10: ffff888103448900 R11: ffff888100140040 R12: 0000000000001000
[ 17.951223] R13: ffffc90000347e70 R14: 0000000000001000 R15: ffff888101991f20
[ 17.951552] FS: 00007f4ce18d7580(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000
[ 17.952021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 17.952345] CR2: 0000000000000000 CR3: 000000010368c001 CR4: 0000000000770ef0
[ 17.952833] PKRU: 55555554
[ 17.953086] Call Trace:
[ 17.953234] <TASK>
[ 17.953345] ? __die+0x1f/0x70
[ 17.953518] ? page_fault_oops+0x156/0x420
[ 17.953693] ? exc_page_fault+0x69/0x150
[ 17.953876] ? asm_exc_page_fault+0x26/0x30
[ 17.954059] ? memcpy_orig+0x1e/0x140
[ 17.954220] raw_read+0x1b/0x30
[ 17.954438] kernfs_fop_read_iter+0xa2/0x1a0
[ 17.954696] vfs_read+0x1b4/0x2d0
[ 17.954844] ksys_read+0x5e/0xe0
[ 17.954985] do_syscall_64+0x3c/0x90
[ 17.955158] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 17.955380] RIP: 0033:0x7f4ce17f1fd2