Re: [PATCH 3/5] arm: decompressor: define a new zImage tag

From: Russell King - ARM Linux admin
Date: Mon Jun 01 2020 - 16:42:02 EST


On Mon, Jun 01, 2020 at 10:27:45PM +0200, Lukasz Stelmach wrote:
> It was <2020-06-01 pon 19:25>, when Russell King - ARM Linux admin wrote:
> > On Mon, Jun 01, 2020 at 06:19:52PM +0200, Lukasz Stelmach wrote:
> >> It was <2020-06-01 pon 15:55>, when Russell King - ARM Linux admin wrote:
> >> > On Mon, Jun 01, 2020 at 04:27:52PM +0200, Łukasz Stelmach wrote:
> >> >> Add DCSZ tag which holds dynamic memory (stack, bss, malloc pool)
> >> >> requirements of the decompressor code.
> >> >
> >> > Why do we need to know the stack and BSS size, when the userspace
> >> > kexec tool doesn't need to know this to perform the same function?
> >>
> >>
> >> kexec-tools load zImage as low in DRAM as possible and rely on two
> >> assumptions:
> >>
> >> + the zImage will copy itself to make enough room for the kernel,
> >> + sizeof(zImage+mem) < sizeof(kernel+mem), which is true because
> >> of compression.
> >>
> >>           DRAM start
> >>           + 0x8000
> >>
> >> zImage    |-----------|-----|-------|
> >>             text+data   bss   stack
> >>
> >>                  text+data               bss
> >> kernel    |---------------------|-------------------|
> >>
> >>
> >> initrd                                               |-initrd-|-dtb-|
> >
> > This is actually incorrect, because the decompressor will relocate
> > itself to avoid the area that it is going to decompress into.
>
> I described the state right after the kexec(8) invocation.

Actually, you haven't, because this is _not_ how kexec(8) lays it
out, as I attempted to detail further down in my reply.

> > So, while the decompressor runs, in the above situation it
> > ends up as:
> >
> >
> > ram    |------------------------------------------------------...
> >               text+data               bss
> > kernel |---------------------|-------------------|
> > zImage                       |-----------|-----|-------|
> >                                text+data   bss  stack+malloc

Note here - if the initrd were placed as _you_ describe, at the end
of where the zImage ends up including its BSS, it would be
corrupted by the stack and malloc space of the decompressor while
it runs. Ergo, your description of how kexec(8) lays stuff out
is incorrect.
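
To make the overlap argument concrete, here is a rough C sketch (not
kexec-tools or kernel code; struct range, ranges_overlap() and
decompressor_footprint() are hypothetical names made up for
illustration). It models the running decompressor as one contiguous
range starting at "base", wherever the zImage ends up after
self-relocation, covering text+data, BSS, stack and malloc space. An
initrd placed right after the BSS overlaps that range whenever the
stack+malloc size is non-zero; one placed at or beyond the end of the
range does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool ranges_overlap(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/*
 * Hypothetical model of the area used by the running decompressor:
 * its image (text+data) followed by BSS, stack and malloc space.
 * "base" is where the zImage runs from after any self-relocation.
 */
static struct range decompressor_footprint(uint64_t base, uint64_t image_len,
                                           uint64_t bss, uint64_t stack,
                                           uint64_t malloc_sz)
{
    struct range r = { base, base + image_len + bss + stack + malloc_sz };

    /* Basic sanity check against the end address wrapping around. */
    assert(r.end >= r.start);
    return r;
}
```

Placing the initrd at decompressor_footprint(...).end or above is the
only placement this model treats as safe; stopping at the end of the
BSS leaves the stack and malloc area on top of it.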

> > Here "text+data" is actually smaller than the image size that
> > was loaded: the part of the image that performs the relocation
> > is discarded (the first chunk of code up to "restart", 200
> > bytes). The BSS is typically smaller than 200 bytes, which is
> > why we have so far got away without knowing the actual BSS size.
> >
> >
> > ram    |--|-----------------------------------------|---------...
> >        |<>| TEXT_OFFSET
> > kernel    |---------------------|-------------------|
> >           |<----edata_size----->|<-----bss_size---->|
> >           |<---------------kernel_size------------->|
> > zImage                          |-----------|-----|-------|
> >                                 |<-------len------->| (initial)
> >                                 |<----------len------------>| (final)
> >
> > The "final" len value is what the decompressor prints as the "zImage
> > requires" debugging value.
> >
> > Hence, the size that the decompressed kernel requires is kernel_size.
> >
> > The size that the decompressor requires is edata_size + len(final).
> >
> > Now, if you intend to load the kernel to ram + TEXT_OFFSET + edata_size
> > then it isn't going to lose the first 200 bytes of code, so as you
> > correctly point out, we need to know the BSS size.
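
The two size statements above can be combined into one sketch of how
much RAM has to stay free, counted from the start of RAM. This is a
hypothetical helper (struct zimage_layout and the function names are
made up, not existing kernel or kexec-tools code), and it assumes the
zImage runs at ram + TEXT_OFFSET + edata_size as in the diagram, so the
decompressor ends edata_size + len(final) past the kernel start.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes from the layout diagram, in bytes (hypothetical struct). */
struct zimage_layout {
    uint32_t text_offset;  /* TEXT_OFFSET: kernel start relative to RAM start */
    uint32_t edata_size;   /* decompressed kernel text+data */
    uint32_t bss_size;     /* decompressed kernel BSS */
    uint32_t len_initial;  /* zImage len as loaded */
    uint32_t len_final;    /* len incl. stack+malloc ("zImage requires") */
};

/* kernel_size = edata_size + bss_size, measured from RAM + TEXT_OFFSET. */
static uint64_t kernel_needs(const struct zimage_layout *l)
{
    return (uint64_t)l->edata_size + l->bss_size;
}

/* The zImage runs at RAM + TEXT_OFFSET + edata_size and needs len(final). */
static uint64_t decompressor_needs(const struct zimage_layout *l)
{
    /* Per the diagram, the final len only adds to the initial one. */
    assert(l->len_final >= l->len_initial);
    return (uint64_t)l->edata_size + l->len_final;
}

/* Bytes from RAM start that must stay free for both to fit. */
static uint64_t total_footprint(const struct zimage_layout *l)
{
    uint64_t k = kernel_needs(l);
    uint64_t d = decompressor_needs(l);

    return (uint64_t)l->text_offset + (k > d ? k : d);
}
```

Whichever of kernel_size and edata_size + len(final) is larger decides
the footprint; the decompressor's requirement dominates exactly when
len(final) exceeds the kernel's bss_size.
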
>
> Formal note: can we keep the terms zImage and kernel separate, where
> zImage is what is loaded with kexec and kernel is the decompressed
> code loaded at TEXT_OFFSET? I believe it will help us avoid mistakes.
>
> >> >> +struct arm_zimage_tag_dc {
> >> >> + struct tag_header hdr;
> >> >> + union {
> >> >> +#define ZIMAGE_TAG_DECOMP_SIZE ARM_ZIMAGE_MAGIC4
> >> >> + struct zimage_decomp_size {
> >> >> + __le32 bss_size;
> >> >> + __le32 stack_size;
> >> >> + __le32 malloc_size;
> >> >> + } decomp_size;
> >
> > You certainly don't need to know all this. All you need to know is
> > how much space the decompressor requires after the end of the image,
> > encompassing the BSS size, stack size and malloc size, which is one
> > value.
>
> I agree. However, since we are not fighting here for every single byte,
> I'd rather add them as separate values and make the tag, even if only
> slightly, more future-proof.

It doesn't make it more future-proof. What happens if we add something
else? Do we need to list it separately, and then change the kernel to
accept the new value, and maybe kexec(8) as well? Or do we just say "the
decompressor needs X many bytes after the image" and be done with it?
The latter sounds way more future-proof to me.
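
For comparison, a single-value tag along these lines might look like
the sketch below. The struct and field names are hypothetical, plain
uint32_t stands in for __le32, and the header only mirrors the
{size, tag} shape of struct tag_header; it is not a proposed ABI. The
idea is that every consumer of space after the image (BSS, stack,
malloc, or anything added later) is folded into one number at build
time, and the loader only ever adds that number to the end of the
image.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header shaped like struct tag_header's {size, tag}. */
struct tag_header_sketch {
    uint32_t size;   /* tag size, same convention as struct tag_header */
    uint32_t tag;    /* tag identifier */
};

/* Hypothetical single-value decompressor size tag (illustration only). */
struct zimage_tag_decomp_sketch {
    struct tag_header_sketch hdr;
    uint32_t extra_size;  /* bytes needed after the end of the image */
};

/* Build side: fold every component into the one advertised value. */
static uint32_t decomp_extra_size(uint32_t bss, uint32_t stack,
                                  uint32_t malloc_sz)
{
    uint64_t sum = (uint64_t)bss + stack + malloc_sz;

    assert(sum <= UINT32_MAX);  /* must fit in the 32-bit field */
    return (uint32_t)sum;
}

/*
 * Loader side: lowest address at which something else (e.g. an initrd)
 * may be placed. image_start is where the zImage will run from.
 */
static uint64_t first_free_after(uint64_t image_start, uint32_t image_len,
                                 const struct zimage_tag_decomp_sketch *t)
{
    return image_start + image_len + t->extra_size;
}
```

With this shape, adding a new kind of scratch space changes only the
build-side sum; a loader such as kexec(8) could keep calling the same
first_free_after()-style computation unchanged.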

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up