Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)
From: Baoquan He
Date: Tue Jan 30 2018 - 21:18:47 EST
Hi Kees,
On 01/11/18 at 10:04am, Kees Cook wrote:
> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <bhe@xxxxxxxxxx> wrote:
> > Hi Luiz,
> >
> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> >> Having a generic kaslr parameter to control where the kernel is extracted
> >> is one solution for this problem.
> >>
> >> The general problem statement is that KASLR may break some kernel features
> >> depending on where the kernel is extracted. Two examples are hot-plugged
> >> memory (this series) and 1GB HugeTLB pages.
> >>
> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> >> that there's a bunch of people running guests with up to 5GB of memory and
> >> with that amount of memory you have one or two 1GB pages and it is easier for
> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> >> you may not get any 1GB pages at all when this happens. However, I can also
> >> reproduce this on bare-metal with lots of memory where I can lose a 1GB
> >> page from time to time.
> >>
> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> >> are that it breaks existing setups and I guess users will have a very hard
> >> time choosing good ranges.
> >>
> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> >> could have a list of ranges known to contain holes and/or immovable
> >> memory and only extract the kernel into those ranges.
> >
> > If we add CONFIG_KASLR_RANGES, then a distro like RHEL will have this range
> > always, whether people need hugetlb or not.
> >
> > So in this case, what range do we need to avoid? Only [1G, 2G]?
>
> Any ranges like that that need to be avoided should be known at build
> time, so they should simply be added to the mem_avoid list that is
> already present in the KASLR code...
Sorry, I may have misunderstood your suggestion earlier. Are you suggesting
that we hardcode a specific range into mem_avoid[]?
I may not have described the situation clearly, sorry for that. For this
hugepage issue, Luiz tested in a KVM guest with 4G of memory. hugetlb
needs to allocate 1G of memory aligned to 1G, so only the [1G, 2G] area can
provide a good 1G huge page. The other areas have no good 1G page to
use:
[0, 1G]: the BIOS reserves several pages;
[2G, 3G]: the top is reserved by the system, 0x00000000bffe0000-0x00000000bfffffff;
[3G, 4G]: no RAM deployed by firmware;
[4G, 5G]: the system allocates from top to bottom.
dmesg output snippet from the KVM guest:
[ +0.000000] e820: BIOS-provided physical RAM map:
[ +0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ +0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[ +0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[ +0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable
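To make the list above easier to check, here is a small sketch (not kernel code, just an illustration) that walks the usable e820 ranges from this dmesg and reports which 1G-aligned windows are fully usable. The function name aligned_gib_pages() is made up for this example. Note that it only looks at the e820 map, so it also reports [4G, 5G); that area is lost later for the other reason given above, because the system allocates from the top down, which e820 cannot show.

```python
GIB = 1 << 30

# Usable ranges copied from the BIOS-e820 lines above (end addresses inclusive).
USABLE = [
    (0x0000000000000000, 0x000000000009fbff),
    (0x0000000000100000, 0x00000000bffdffff),
    (0x0000000100000000, 0x000000013fffffff),
]


def aligned_gib_pages(usable):
    """Return start addresses of 1G-aligned windows that lie fully inside
    a single usable range. Adjacent ranges are not merged (hypothetical
    helper for illustration only)."""
    pages = []
    for start, end in usable:
        addr = (start + GIB - 1) // GIB * GIB  # round start up to 1G
        while addr + GIB - 1 <= end:
            pages.append(addr)
            addr += GIB
    return pages


if __name__ == "__main__":
    for addr in aligned_gib_pages(USABLE):
        n = addr // GIB
        print(f"[{n}G, {n + 1}G) is fully usable according to e820")
```

[0, 1G) is rejected because of the reserved hole below 1M, and [2G, 3G) because the usable range ends at 0xbffdffff, just short of 3G.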
However, this only fails on this particular system. If Luiz sets up the KVM
guest with 5G or more memory, there will be more than one good 1G page,
while the randomized kernel can only occupy one of them. So with more than
one good 1G page, the 1G huge page allocation failure won't occur. It's very
much a corner case, which is why I don't want to hardcode it into mem_avoid[].
The code would not look reasonable with a change that makes us avoid the
[1G, 2G] area, with a code comment explaining that we do this because a
system with 4G of memory can't otherwise allocate a 1G huge page. Besides,
systems which don't need the hugetlb feature, or which have more memory,
don't have this issue at all.
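The corner case can be sketched as follows. This is only an illustration: surviving_pages() is a made-up helper, and the kernel load address, the 64M kernel size and the good-page list for the larger guest are assumed values, not measurements.

```python
GIB = 1 << 30
MIB = 1 << 20


def surviving_pages(good_pages, kernel_start, kernel_size):
    """Return the good 1G pages (by start address) that the kernel image
    does not overlap. kernel_start/kernel_size are physical, in bytes."""
    kernel_end = kernel_start + kernel_size  # exclusive
    return [p for p in good_pages
            if kernel_end <= p or kernel_start >= p + GIB]


if __name__ == "__main__":
    one_good = [1 * GIB]            # the 4G guest described above
    two_good = [1 * GIB, 4 * GIB]   # assumed layout for a larger guest
    kernel_at = 1 * GIB + 16 * MIB  # assumed KASLR choice inside [1G, 2G]

    print(len(surviving_pages(one_good, kernel_at, 64 * MIB)))  # no page left
    print(len(surviving_pages(two_good, kernel_at, 64 * MIB)))  # one page left
```

With a single good page, one unlucky placement removes it; with two or more, the kernel can only break one of them.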
These are my thoughts on the current fix; I'm not sure whether they are
persuasive or make sense. I would like to hear any suggestions or different
ideas for solving these problems.
Thanks
Baoquan