Re: [PATCH qemu] x86: don't let decompressed kernel image clobber setup_data

From: H. Peter Anvin
Date: Wed Dec 28 2022 - 21:14:08 EST


On 12/28/22 15:58, H. Peter Anvin wrote:
On December 28, 2022 8:57:54 AM PST, "Jason A. Donenfeld" <Jason@xxxxxxxxx> wrote:
HELLO H. PETER ANVIN,
E
L
L
O

On Wed, Dec 28, 2022 at 05:30:30PM +0100, Jason A. Donenfeld wrote:
Fix looks good, glad you figured out the problem.

I mean, kind of. The solution here sucks, especially given that in the
worst case, setup_data just gets dropped. I'm half inclined to consider
this a kernel bug instead, and add some code to relocate setup_data
prior to decompression, and then fix up all the links. It seems like
this would be a lot more robust.

I just wish the people who wrote this stuff would chime in. I've had
x86@xxxxxxxxxx CC'd but so far, no input from them.

Apparently you are the x86 boot guru. What do you want to happen here?
Your input would be very instrumental.

Jason

Hi!

Glad you asked.

So the kernel load addresses are parameterized in the kernel image
setup header. Among the things so parameterized are the size and
possible realignment of the kernel image in memory.

I'm very confused where you are getting the 64 MB number from. There
should not be any such limitation.

In general, setup_data should be able to go anywhere the initrd can
go, and so is subject to the same address cap (896 MB for old
kernels, 4 GB on newer ones; this address too is enumerated in the
header.)

If you want to put setup_data above 4 GB, it *should* be ok if and
only if the kernel supports loading the initrd high, too (again,
enumerated in the header).

TL;DR: put setup_data where you put the initrd (before or after
doesn't matter.)
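
As a rough illustration of the TL;DR above, here is a minimal sketch of a loader picking a spot for a setup_data blob next to the initrd. The function name place_setup_data(), its parameters, and the 16-byte alignment are all assumptions made for this example; none of them come from the boot protocol or from QEMU. min_addr is inclusive and max_addr is exclusive, and overflow of initrd_addr + initrd_size is not checked.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: choose an address for a setup_data blob of
 * `size` bytes next to an initrd occupying
 * [initrd_addr, initrd_addr + initrd_size).
 *
 * [min_addr, max_addr) is the window the initrd itself may live in.
 * Returns true and stores the chosen address in *out on success.
 */
static bool place_setup_data(uint64_t initrd_addr, uint64_t initrd_size,
                             uint64_t size, uint64_t min_addr,
                             uint64_t max_addr, uint64_t *out)
{
    const uint64_t align = 16; /* Assumed alignment, for illustration only */
    uint64_t after = (initrd_addr + initrd_size + align - 1) & ~(align - 1);

    /* First choice: directly after the initrd */
    if (after <= max_addr && size <= max_addr - after) {
        *out = after;
        return true;
    }

    /* Second choice: directly before the initrd, rounded down */
    if (initrd_addr >= size) {
        uint64_t before = (initrd_addr - size) & ~(align - 1);

        if (before >= min_addr) {
            *out = before;
            return true;
        }
    }

    return false;
}
```

Trying "after" first is an arbitrary choice here; as noted above, before or after does not matter to the kernel.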

To be maximally conservative, link the setup_data list in order from
lowest to highest address; currently there is no such item of
relevance, but in the future there may be setup_data items needed by
the BIOS part of the bootstrap in which case they would have to be <
1 MB and precede any items > 1 MB for obvious reasons. That being
said, with BIOS dying it is not all that likely that such entries
will ever be needed.
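
The "lowest to highest address" linking described above could look roughly like the sketch below. struct setup_data_hdr is a local mirror of the fixed part of struct setup_data from <asm/bootparam.h>; struct sd_item and link_setup_data() are hypothetical loader-side bookkeeping invented for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of the fixed part of struct setup_data */
struct setup_data_hdr {
    uint64_t next; /* Physical address of the next item, 0 ends the list */
    uint32_t type;
    uint32_t len;  /* Length of the payload that follows the header */
};

/* Hypothetical bookkeeping record: where each item will live in memory */
struct sd_item {
    uint64_t addr;              /* Physical address chosen by the loader */
    struct setup_data_hdr *hdr; /* Loader-side copy of the item header */
};

static int cmp_addr(const void *a, const void *b)
{
    const struct sd_item *x = a, *y = b;

    return (x->addr > y->addr) - (x->addr < y->addr);
}

/*
 * Sort the items by address and chain them lowest first. Returns the
 * value to store in the setup header's setup_data field (0 if empty).
 */
static uint64_t link_setup_data(struct sd_item *items, size_t n)
{
    if (n == 0)
        return 0;

    qsort(items, n, sizeof *items, cmp_addr);
    for (size_t i = 0; i < n; i++)
        items[i].hdr->next = (i + 1 < n) ? items[i + 1].addr : 0;

    return items[0].addr;
}
```

With this ordering, any item below 1 MB automatically precedes every item above 1 MB.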


So let me try for an algorithm. Attached as a text file to avoid line break damage.

-hpa

Here is an attempted description with pseudo-C code:

First of all, take a 4K page of memory and *initialize it to zero*.
{
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <asm/bootparam.h> /* From the uapi kernel sources */

    /* Allocated somewhere in your code... */
    extern unsigned char *kernel_image;     /* Kernel file */
    extern struct boot_params *boot_params; /* 4K buffer */
    extern uint32_t kernel_image_size;      /* Size of kernel file */

    /* Callbacks into your code */
    extern bool is_bios_boot(void);
    extern uint32_t end_of_low_memory(void); /* For BIOS boot */
    extern uint32_t preferred_low_kernel_address(void); /* For BIOS boot */
    /*
     * This MUST return an aligned address between start_address
     * and max_address...
     */
    extern uint64_t maybe_relocate_kernel(uint64_t start_address,
                                          uint64_t max_address,
                                          uint32_t alignment);

    /*
     * Convenience pointer into the kernel image; modifications
     * done here should be reflected in the loaded kernel image
     */
    struct setup_header * const kernel_setup_header =
        (struct setup_header *)(kernel_image + 0x1f1);

    /* Initialize boot_params to zero!!! */
    memset(boot_params, 0, sizeof *boot_params);
}

Copy the setup header starting at file offset 0x1f1 to offset 0x1f1
into that page:
{
    int setup_length =
        kernel_setup_header->header == 0x53726448
        ? (kernel_setup_header->jump >> 8) + 17 : 15;

    memcpy(&boot_params->hdr, kernel_setup_header, setup_length);
}
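
For readers without the uapi headers at hand, the same length computation can be sketched on the raw file bytes. This is only an illustration: setup_header_length() and the two macros are names invented here, and the bounds checks are an addition that the pseudo-code above does not perform. The byte at 0x201 is the displacement of the short jump at 0x200, which is what (jump >> 8) extracts from the little-endian 16-bit field.

```c
#include <stddef.h>
#include <stdint.h>

#define SETUP_HDR_OFFSET 0x1f1       /* Start of the setup header in the file */
#define SETUP_HDR_MAGIC  0x53726448u /* "HdrS", little-endian, at offset 0x202 */

/*
 * Number of bytes to copy from file offset 0x1f1, or 0 if the image is
 * too short to contain the header it claims to have.
 */
static size_t setup_header_length(const uint8_t *image, size_t image_size)
{
    uint32_t magic;
    size_t len;

    if (image_size < 0x206)
        return 0;

    /* Assemble the magic byte by byte so host endianness does not matter */
    magic = (uint32_t)image[0x202] |
            (uint32_t)image[0x203] << 8 |
            (uint32_t)image[0x204] << 16 |
            (uint32_t)image[0x205] << 24;

    if (magic != SETUP_HDR_MAGIC)
        return 15; /* Ancient kernel: header ends at 0x200 */

    /* Header ends where the short jump at 0x200 lands: 0x202 + displacement */
    len = 17 + (size_t)image[0x201];
    if (SETUP_HDR_OFFSET + len > image_size)
        return 0;

    return len;
}
```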

Now you can compute values, including ones that are omitted by older kernels:
{
    /*
     * Split between the part of the kernel to be loaded into
     * low memory (for 16-bit boot, otherwise it can be safely
     * omitted) and the part to be loaded into high memory.
     */
    if (!boot_params->hdr.setup_sects)
        boot_params->hdr.setup_sects = 4;

    int high_kernel_start = (boot_params->hdr.setup_sects + 1) << 9;

    /*
     * Highest permitted address for the high part of the kernel image,
     * initrd, command line (*except for 16-bit boot*), and setup_data
     *
     * max_initrd_addr here is exclusive
     */
    uint64_t max_initrd_addr = (uint64_t)boot_params->hdr.initrd_addr_max + 1;
    if (boot_params->hdr.version < 0x0200)
        max_initrd_addr = 0; /* No initrd supported */
    else if (boot_params->hdr.version < 0x0203)
        max_initrd_addr = 0x38000000;
    else if (boot_params->hdr.version >= 0x020c &&
             (boot_params->hdr.xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G))
        max_initrd_addr = (uint64_t)1 << 52; /* Architecture-imposed limit */

    /*
     * Maximum command line size *including terminating null*
     */
    unsigned int cmdline_size;
    if (boot_params->hdr.version < 0x0200)
        cmdline_size = 0; /* No command line supported */
    else if (boot_params->hdr.version < 0x0206)
        cmdline_size = 256;
    else
        cmdline_size = boot_params->hdr.cmdline_size + 1;

    /* Command line size including terminating null */

    /*
     * Load addresses for the low and high kernels, respectively
     */
    uint32_t low_kernel_address;
    uint64_t cmdline_addr; /* Address to load the command line */

    if (is_bios_boot()) {
        if (!(boot_params->hdr.loadflags & LOADED_HIGH)) {
            low_kernel_address = 0x90000;
        } else {
            /*
             * Recommended to be the lowest available address between
             * 0x10000 and 0x90000
             */
            low_kernel_address = preferred_low_kernel_address();
        }

        uint32_t lowkernel_max;

        lowkernel_max = low_kernel_address + 0x10000;
        if (boot_params->hdr.version >= 0x0202)
            lowkernel_max += (cmdline_size + 15) & ~15;

        /*
         * end_of_low_memory() is usually given by *(uint16_t *)0x413 << 10
         */
        if (lowkernel_max > end_of_low_memory())
            lowkernel_max = end_of_low_memory();

        cmdline_addr = (lowkernel_max - cmdline_size) & ~15;
        if (boot_params->hdr.version >= 0x0202)
            kernel_setup_header->cmd_line_ptr = cmdline_addr;
        else if (boot_params->hdr.version >= 0x0200)
            kernel_setup_header->setup_move_size =
                lowkernel_max - low_kernel_address;

        if (boot_params->hdr.version >= 0x0201) {
            kernel_setup_header->heap_end_ptr
                = cmdline_addr - low_kernel_address - 0x0200;
            kernel_setup_header->loadflags |= CAN_USE_HEAP;
        }
    } else {
        low_kernel_address = 0; /* Not used for non-BIOS boot */
        cmdline_addr = 0; /* Not assigned yet */
    }

    /*
     * Default load address for the high kernel, and if it can be relocated
     */
    uint64_t high_kernel_address;
    uint32_t high_kernel_size; /* The amount of memory the high kernel needs */
    bool relocatable_kernel = false;
    uint32_t high_kernel_alignment = 0x400000; /* Kernel runtime alignment */

    if (!(boot_params->hdr.loadflags & LOADED_HIGH)) {
        high_kernel_address = 0x10000;
    } else {
        if (boot_params->hdr.version >= 0x020a)
            high_kernel_address = boot_params->hdr.pref_address;
        else
            high_kernel_address = 0x100000;

        if (boot_params->hdr.version >= 0x0205 &&
            boot_params->hdr.relocatable_kernel) {
            relocatable_kernel = true;
            high_kernel_alignment = boot_params->hdr.kernel_alignment;
        }
    }

    /*
     * Linear memory area needed by the kernel
     */
    uint32_t kernel_mem_size;
    if (boot_params->hdr.version >= 0x020a)
        kernel_mem_size = boot_params->hdr.init_size;
    else
        kernel_mem_size = kernel_image_size << 2; /* Pure guesswork... */

    /* Relocate the kernel load address if desired */
    if (relocatable_kernel) {
        high_kernel_address =
            maybe_relocate_kernel(high_kernel_address,
                                  max_initrd_addr - kernel_mem_size,
                                  high_kernel_alignment);
    }

    /* Adjust for possible internal kernel realignment */
    kernel_mem_size += (-high_kernel_address) & (high_kernel_alignment - 1);

    /*
     * Determine the minimum safe address for loading initrd, setup_data,
     * and, if cmdline_addr == 0 (i.e. !is_bios_boot()), the command line.
     */
    uint64_t min_initrd_addr = high_kernel_address + kernel_mem_size;
}
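
The maybe_relocate_kernel() callback is left to the loader. One possible shape for it, shown purely as a sketch, is to keep the preferred address when it is already aligned and otherwise round it up. The policy, the return value of 0 on failure, and the assumption that alignment is a nonzero power of two are all choices made for this example and are not specified by the description above. It uses the same (-addr) & (align - 1) padding arithmetic as the realignment adjustment to kernel_mem_size.

```c
#include <stdint.h>

/*
 * Illustrative policy only: return the lowest aligned address in
 * [start_address, max_address], or 0 if there is none.
 *
 * alignment is assumed to be a nonzero power of two.
 */
uint64_t maybe_relocate_kernel(uint64_t start_address,
                               uint64_t max_address, uint32_t alignment)
{
    uint64_t mask = (uint64_t)alignment - 1;
    uint64_t pad = (-start_address) & mask; /* Bytes up to the next boundary */

    if (start_address > max_address)
        return 0;
    if (pad > max_address - start_address)
        return 0;

    return start_address + pad;
}
```

A real loader would also have to check that the chosen range is actually free RAM, and would need a proper error path in place of the 0 return used here.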