Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation

From: Vivek Goyal
Date: Wed Oct 06 2010 - 18:47:19 EST


On Wed, Oct 06, 2010 at 03:16:17PM -0700, H. Peter Anvin wrote:
> On 10/06/2010 08:14 AM, Vivek Goyal wrote:
> >
> > I really don't understand why we should impose an upper limit of
> > DEFAULT_BZIMAGE_ADDR_MAX. It does not make much sense to internally
> > impose an upper limit on the reserved memory area if the reserver has
> > not specified one.
> >
> > Why can't we provide a function which does a search from bottom up for
> > the required size of memory. If the memory finally reserved does not meet
> > the constraints needed by kexec, then kexec load will fail. Kernel does
> > not have to try to figure out the upper limit in this case.
> >
> > The current state of affairs is not perfect, but coming up with an
> > artificial upper limit here further deteriorates the situation, IMHO.
> >
> > Regarding the question of kexec specifying the upper limit on the command
> > line, I think it is hard. Kexec needs to load multiple segments; some
> > need to go in a really low memory area and some can be in a higher memory
> > area. What is the upper limit in this case? If we take the upper limit
> > of the lowest memory segment, then we will just not have sufficient memory
> > to load all segments.
> >
> > That would mean splitting the reserved region into multiple parts, and
> > one would have to specify a separate upper limit for each region. That
> > would make the whole thing complex.
> >
> > So can we at least maintain the status quo where we search for crashkernel
> > memory bottom up without any upper limits, instead of top down?
> >
>
> The reason the "whole thing is complex" is because your constraints are
> complex, and you're still trying to hide them from the kernel. And what
> is absolutely incomprehensible to me is that you seem to think this is okay.
>
> I really, REALLY, ***REALLY*** don't want to burden the kernel with a
> bunch of constraints which are invisible to it, where things will
> randomly fail because the implementation changed. We have too much of
> that already, and it causes an enormous amount of problems all over the
> kernel.
>
> Of course, we're already painted into a corner with a bad design that
> isn't going to change overnight, and of course, this is hardly the first
> time this has happened -- we do find our way out of tight spots on a
> regular basis. Perhaps you're right and the best thing is to add an
> explicit bottoms-up allocator function for now, *BUT* I would also like
> to see a firm commitment to fix the underlying architectural problem for
> real, and not just "maintain the status quo" indefinitely, which is what
> your emails make me think you're expecting.

I really don't mind fixing things properly in the long term; it is just that
I am running out of ideas about how to fix it properly.

To me the best thing would be for this whole allocation to be dynamic,
driven from user space: kexec will run, determine what it is loading,
determine the memory constraints on these segments (min, upper
limit, alignment etc), and then ask the kernel to reserve contiguous
memory. This kind of dynamic reservation would remove a lot of the problems
associated with crashkernel= reservations.

But I am not aware of any way of doing dynamic allocation, and it certainly
does not seem easy to allocate 128M of memory contiguously.

Because we don't have a way to reserve memory dynamically later, we end up
reserving one big chunk using the kernel command line and later
figuring out what to load where. With this approach kexec has not even run
yet, so how can it tell you what the memory constraints are?

So to me one of the ways of properly fixing this is adding some kind of
capability to reserve the memory dynamically (maybe using sys_kexec())
and getting rid of this notion of reserving memory at boot time.

The other concern you raised is hiding constraints from the kernel. At this
point the only problem with the crashkernel=X@0 syntax is that it
does not tell you whether to look for memory bottom up or top down. How
about we specify it explicitly in the syntax so that the kernel does not
have to assume things?

In fact the initial crashkernel syntax was crashkernel=X@Y. This meant
allocate X amount of memory at location Y. This left no ambiguity and the
kernel did not have to assume things. It had the problem, though, that
we might not have physical RAM at location Y. So I think that's when
somebody came up with the idea of crashkernel=X@0, meaning that we ideally
want memory at location 0, but if you can't provide that, then provide
the next available region found scanning bottom up.
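
To make the intended X@0 semantics concrete, here is a rough sketch of the
"next available location scanning bottom up" search. It is only an
illustration in plain userspace C, not kernel code: struct ram_range,
find_bottom_up(), align_up() and NO_FIT are hypothetical names, and the range
list is assumed to be sorted by ascending start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one usable RAM range: [start, end). */
struct ram_range {
    uint64_t start;
    uint64_t end;
};

/* Sentinel returned when no range can hold the request. */
#define NO_FIT UINT64_MAX

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Walk the ranges from low to high addresses and return the lowest
 * aligned address >= hint where size bytes fit inside one range.
 * Assumes ranges[] is sorted by ascending start address.
 */
static uint64_t find_bottom_up(const struct ram_range *ranges, size_t n,
                               uint64_t hint, uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    for (size_t i = 0; i < n; i++) {
        /* Never go below the requested location Y (the hint). */
        uint64_t base = ranges[i].start > hint ? ranges[i].start : hint;

        base = align_up(base, align);
        if (base < ranges[i].end && ranges[i].end - base >= size)
            return base;
    }
    return NO_FIT;
}
```

With hint = 0 this behaves like the crashkernel=X@0 case described above: the
request is placed at the lowest address that has enough contiguous RAM, and no
upper limit is imposed by the search itself.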

So the only part missing from the syntax is explicitly specifying "next
available location scanning bottom up". If we add that to the syntax then
the kernel does not have to make assumptions (except for the alignment part).

So how about modifying the syntax to crashkernel=X@Y#BU?

The "#BU" part can be optional, and when it is absent the kernel is free to
allocate memory either top down or bottom up.

Or any other string which can communicate the bottom up part in a more
intuitive manner.
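
To show how little the parser would need to change, here is a rough userspace
sketch of parsing the proposed string. Everything in it is an assumption made
for illustration: "#BU" is only the suffix proposed in this mail, and
struct crash_req, parse_size() and parse_crashkernel() are hypothetical names,
not the kernel's existing crashkernel= parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "X@Y" or "X@Y#BU". */
struct crash_req {
    uint64_t size;   /* X: amount of memory to reserve */
    uint64_t base;   /* Y: requested location */
    bool bottom_up;  /* true only when the "#BU" suffix is present */
};

/* Parse a number with an optional K/M/G suffix; *end is set past it. */
static uint64_t parse_size(const char *s, const char **end)
{
    char *p;
    uint64_t v = strtoull(s, &p, 0);

    if (p == s) {            /* no digits at all */
        *end = s;
        return 0;
    }
    switch (*p) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; p++; break;
    default: break;
    }
    *end = p;
    return v;
}

/* Returns 0 on success, -1 on malformed input. */
static int parse_crashkernel(const char *arg, struct crash_req *req)
{
    const char *p, *q;

    assert(arg != NULL && req != NULL);

    req->size = parse_size(arg, &p);
    if (p == arg || req->size == 0 || *p != '@')
        return -1;

    q = p + 1;
    req->base = parse_size(q, &p);
    if (p == q)
        return -1;

    /* Without the suffix, the search direction is left to the kernel. */
    req->bottom_up = false;
    if (strcmp(p, "#BU") == 0) {
        req->bottom_up = true;
        p += 3;
    }
    return *p == '\0' ? 0 : -1;
}
```

For example, "128M@0#BU" would parse to size = 128M, base = 0 and
bottom_up = true, while "64M@16M" leaves bottom_up false.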

Thanks
Vivek