Re: KASLR may break some kernel features (was Re: [PATCH v5 1/4] kaslr: add immovable_mem=nn[KMG]@ss[KMG] to specify extracting memory)
From: Baoquan He
Date: Fri Jan 12 2018 - 23:03:11 EST
On 01/12/18 at 01:52pm, Luiz Capitulino wrote:
> On Fri, 12 Jan 2018 10:47:53 +0800
> Chao Fan <fanc.fnst@xxxxxxxxxxxxxx> wrote:
>
> > On Fri, Jan 12, 2018 at 10:31:52AM +0800, Baoquan He wrote:
> > >On 01/11/18 at 10:04am, Kees Cook wrote:
> > >> On Thu, Jan 11, 2018 at 1:00 AM, Baoquan He <bhe@xxxxxxxxxx> wrote:
> > >> > Hi Luiz,
> > >> >
> > >> > On 01/04/18 at 11:21am, Luiz Capitulino wrote:
> > >> >> Having a generic kaslr parameter to control where the kernel is extracted
> > >> >> is one solution for this problem.
> > >> >>
> > >> >> The general problem statement is that KASLR may break some kernel features
> > >> >> depending on where the kernel is extracted. Two examples are hot-plugged
> > >> >> memory (this series) and 1GB HugeTLB pages.
> > >> >>
> > > >> >> The 1GB HugeTLB page issue is not specific to KVM guests. It just happens
> > > >> >> that there's a bunch of people running guests with up to 5GB of memory, and
> > > >> >> with that amount of memory you have one or two 1GB pages, so it is easy for
> > > >> >> KASLR to extract the kernel into a 1GB region and split a 1GB page. So,
> > > >> >> you may not get any 1GB pages at all when this happens. However, I can also
> > > >> >> reproduce this on bare-metal with lots of memory where I can lose a 1GB
> > > >> >> page from time to time.
> > >> >>
> > > >> >> Having a kaslr_range= parameter solves both issues, but two major drawbacks
> > > >> >> are that it breaks existing setups and I guess users will have a very hard
> > > >> >> time choosing good ranges.
> > >> >>
> > >> >> Another idea would be to have a CONFIG_KASLR_RANGES, where each arch
> > >> >> could have a list of ranges known to contain holes and/or immovable
> > >> >> memory and only extract the kernel into those ranges.
> > >> >
> > > >> > If we add CONFIG_KASLR_RANGES, then a distro like RHEL will always have
> > > >> > this range, whether people need hugetlb or not.
> > >> >
> > >> > So in this case, what range do we need to avoid? Only [1G, 2G]?
> > >>
> > >> Any ranges like that that need to be avoided should be known at build
> > >> time, so they should simply be added to the mem_avoid list that is
> > >> already present in the KASLR code...
> > >
> > > >Seems KASLR doesn't have a solution which allows the user to specify an
> > > >avoided range for the kernel text KASLR stage only. memmap="!#$" can add
> > > >ranges to mem_avoid, but it will also keep those ranges out of e820.
> > >
> >
> > > How about adding a new option, like "huge_page=nn@ss"? It would fill the
> > > regions into mem_avoid, but the parameter would only be parsed during the
> > > KASLR stage. The subsequent memmap handling would not be executed.
>
> > If we add a new option, I think we should try to make it general enough
> > to satisfy both hugepages and the memory hotplug problem. Otherwise
> > we'll end up adding a new option for each feature KASLR breaks...
Yes, this is my concern. We can take this opportunity to make the option
general.
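To make the idea concrete, here is a rough user-space sketch of how a generic
nn[KMG]@ss[KMG] value could be parsed and then checked against a candidate
kernel slot. The names parse_size(), parse_region() and regions_overlap() are
made up for illustration only; real boot code would presumably reuse memparse()
and the existing mem_avoid handling instead.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical region type, only for this sketch. */
struct region {
	uint64_t start;
	uint64_t size;
};

/*
 * Parse a number with an optional K/M/G suffix.
 * On return *end points past the consumed text, or at s if no digits were found.
 */
static uint64_t parse_size(const char *s, char **end)
{
	uint64_t val = strtoull(s, end, 0);

	if (*end == s)
		return 0;	/* no digits: do not consume a stray suffix */

	switch (**end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return val;
}

/* Parse "nn[KMG]@ss[KMG]" into a region; returns false on malformed input. */
static bool parse_region(const char *s, struct region *r)
{
	const char *addr;
	char *p;

	r->size = parse_size(s, &p);
	if (p == s || *p != '@')
		return false;

	addr = p + 1;
	r->start = parse_size(addr, &p);
	return p != addr && *p == '\0';
}

/* True if the half-open ranges [start, start + size) of a and b intersect. */
static bool regions_overlap(const struct region *a, const struct region *b)
{
	return a->start < b->start + b->size &&
	       b->start < a->start + a->size;
}
```

With something like this, "1G@2G" would give a region starting at 2G with a size
of 1G, and any candidate slot overlapping it would be skipped. The same parsed
list could serve both the hugepage case and the hotplug case.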
>
> However, in the case of the 1GB page problem, I'm starting to think
> that it may be possible to know which 1GB areas are already fragmented
> and extract the kernel to one of those areas. I don't know if this would
> help the memory hotplug issue though.
This is also what Chao is trying to solve. Since users may not know how
to find the hot-pluggable memory regions, Chao is trying to add a sysfs
interface to export them, as extracted from the ACPI SRAT.
I wonder if hugetlb could do something similar.
And the hugetlb issue only exists on systems with 4G of memory, right?
For large-memory systems, there is no such problem.
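To check my understanding of the 1GB page arithmetic, here is a small
user-space sketch that counts the whole, 1GB-aligned pages in a physical range
before and after the kernel is placed in it. The helper names and the example
layout are made up for illustration, and it assumes the kernel lies entirely
inside the range.

```c
#include <stdint.h>

#define GB (1ULL << 30)

/* Number of whole, 1GB-aligned pages inside [start, end). */
static uint64_t count_1g_pages(uint64_t start, uint64_t end)
{
	uint64_t first = (start + GB - 1) & ~(GB - 1);	/* round start up */
	uint64_t last = end & ~(GB - 1);			/* round end down */

	return last > first ? (last - first) / GB : 0;
}

/*
 * 1GB pages left in [start, end) once the kernel occupies
 * [kstart, kstart + ksize). Assumes the kernel is inside the range.
 */
static uint64_t count_1g_pages_with_kernel(uint64_t start, uint64_t end,
					    uint64_t kstart, uint64_t ksize)
{
	return count_1g_pages(start, kstart) +
	       count_1g_pages(kstart + ksize, end);
}
```

For a hypothetical range [4G, 6G) there are two 1GB pages. A 64M kernel in the
middle of the first one leaves one, and a kernel straddling the 5G boundary
leaves none, which matches the "no 1GB pages at all" case. In a range like
[4G, 68G) the same straddling placement costs two pages out of 64, so the loss
still happens but is much less visible.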
Thanks
Baoquan