Re: [PATCH part5 0/7] Arrange hotpluggable memory as ZONE_MOVABLE.

From: Tejun Heo
Date: Mon Aug 12 2013 - 16:20:41 EST


Hello,

On Tue, Aug 13, 2013 at 02:23:13AM +0800, Tang Chen wrote:
> >* However, we already *know* that the memory the kernel image is
> > occupying won't be removable. It's highly likely that the amount
> > of memory allocated before NUMA / hotplug information is fully
> > populated is pretty small. Also, it's highly likely that the small
> > amount of memory right after the kernel image is contained in the
> > same NUMA node, so if we allocate memory close to the kernel image,
> > it's likely that we don't contaminate a hotpluggable node. We're
> > talking about a few megs at most right after the kernel image. I
> > can't see how that would make any noticeable difference.
>
> I don't quite agree on this point. What you said is highly likely,
> but not certain. Users may find they have lost hotpluggable memory.

I'm having a difficult time buying that. NUMA node granularity is
usually pretty large - it's in the range of gigabytes. By comparison,
the area occupied by the kernel image is *tiny* and it's just highly
unlikely that allocating a bit more memory afterwards would lead to
any meaningful difference in hotunplug support. The amount of memory
we're talking about is likely to be less than a meg, right?

> The node the kernel resides in won't be removable. This is agreed.
> But I still want SRAT earlier for the following reasons:
>
> 1. For a product shipped to users, the firmware specifies how
> many nodes are hotpluggable. If users find they have lost
> movable nodes once the system is up, I think it could be messy.

How is that different from the memory occupied by the kernel image?
Simply allocating early memory near the kernel image is extremely
unlikely to change the situation. Again, we're talking about tiny
allocations here. It should be no different from having a *slightly*
larger kernel image. How is that material in any way?

> 2. Moving SRAT parsing earlier is not that difficult to do. The
> only procedures reordered are ACPI table initialization and
> acpi_initrd_override. The ACPI patches are being reviewed.
> And it is a better solution. If possible, I think we should do it.

I don't think it's a better solution. It's fragile and fiddly, with
little, if any, additional benefit. Why should we do that when we can
almost trivially solve the problem, almost entirely in memblock
proper, in a way which is completely firmware-agnostic?

But what's the extra benefit of doing that? Why would reserving less
than a megabyte after the kernel be so problematic as to require this
invasive solution?
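
To make the "allocate right after the kernel image" idea concrete,
here is a rough user-space sketch. It is not the memblock API: struct
region, alloc_bottom_up() and the "floor" parameter are hypothetical
names used for illustration only, and the free list is assumed to be
sorted by base address. It just shows that a bottom-up search starting
at the end of the kernel image needs no firmware (SRAT) information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one free physical memory range. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Bottom-up early allocation (illustrative only, not kernel code).
 *
 * Returns the lowest address at or above *floor that is aligned to
 * `align` and has `size` free bytes inside one of the `n` regions,
 * which must be sorted by base.  On success *floor is advanced past
 * the allocation so the next call lands right after it.  Returns 0
 * on failure.  `align` must be a power of two.
 */
static uint64_t alloc_bottom_up(const struct region *free_mem, size_t n,
				uint64_t *floor, uint64_t size,
				uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t end = free_mem[i].base + free_mem[i].size;
		uint64_t start = free_mem[i].base > *floor ?
				 free_mem[i].base : *floor;

		/* Round up to the requested alignment. */
		start = (start + align - 1) & ~(align - 1);

		if (start < end && end - start >= size) {
			*floor = start + size;
			return start;
		}
	}
	return 0;
}
```

With *floor initialized to the end of the kernel image, successive
calls hand out memory immediately above the image, which is the range
assumed above to sit in the same, already non-removable, node.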

Thanks.

--
tejun