Re: [Part3 PATCH v2 0/4] Support hot-remove local pagetable pages.

From: Yasuaki Ishimatsu
Date: Tue Jun 18 2013 - 22:59:44 EST

2013/06/19 8:59, Toshi Kani wrote:
On Tue, 2013-06-18 at 19:05 +0200, Vasilis Liaskovitis wrote:

On Thu, Jun 13, 2013 at 09:03:52PM +0800, Tang Chen wrote:
The following patch-set from Yinghai allocates pagetables to local nodes.

Since pagetable pages are used by the kernel, they cannot be offlined.
As a result, they cannot be hot-removed.

This patch-set fixes this problem with the following solution:

1. Introduce a new bootmem type LOCAL_NODE_DATAL, and register local
pagetable pages as LOCAL_NODE_DATAL by setting page-> to
LOCAL_NODE_DATAL, just like we register SECTION_INFO pages.

2. Skip LOCAL_NODE_DATAL pages in offline/online procedures. When the
whole memory block they reside in is offlined, the kernel can
still access the pagetables.
(This changes the semantics of offline/online a little bit.)
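
For illustration only, the two steps above can be modeled as a small
userspace C sketch. This is not the code from the patch-set: the names
toy_page, toy_page_tag, toy_register_local_pagetable() and
toy_offline_block() are all hypothetical, and the tag is stored in a plain
struct field rather than in any real struct page member.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page tags for this toy model; not the kernel's definitions. */
enum toy_page_tag {
    TOY_NORMAL = 0,
    TOY_SECTION_INFO,
    TOY_LOCAL_NODE_DATA,    /* stands in for the new bootmem type */
};

/* Hypothetical stand-in for a page descriptor. */
struct toy_page {
    enum toy_page_tag tag;
    bool online;
};

/* Step 1: register a page as holding a local pagetable. */
static void toy_register_local_pagetable(struct toy_page *page)
{
    page->tag = TOY_LOCAL_NODE_DATA;
}

/*
 * Step 2: offline every page in a block except the tagged ones.
 * Returns the number of pages that were skipped and stay online.
 */
static size_t toy_offline_block(struct toy_page *pages, size_t n)
{
    size_t skipped = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].tag == TOY_LOCAL_NODE_DATA) {
            skipped++;          /* pagetable page: leave it accessible */
            continue;
        }
        pages[i].online = false;
    }
    return skipped;
}
```

In this model the offline call completes even when some pages were skipped,
which is the change in semantics mentioned above: the block counts as
offlined while its tagged pages are still in use.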

This could be a design problem of part3: if we allow local pagetable memory
to not be offlined but allow the offlining to return successfully, then
hot-remove is going to succeed. But the direct mapped pagetable pages are still
mapped in the kernel. The hot-removed memblocks will suddenly disappear (think
physical DIMMs getting disabled in real hardware, or in a VM case the
corresponding guest memory getting freed from the emulator e.g. qemu/kvm). The
system can crash as a result.

I think these local pagetables do need to be unmapped from the kernel, offlined
and removed somehow - otherwise hot-remove should fail. Could they be migrated
instead, e.g. to node 0 memory? But IIUC direct mapped pages cannot be
migrated, correct?

What is the original reason for local node pagetable allocation with regard
to memory hotplug? I assume we want to have hotplugged nodes use only their local
memory, so that there are no inter-node memory dependencies for hot-add/remove.
Are there other reasons that I am missing?

I second Vasilis. The part1/2/3 series could be much simpler and less
risky if we focus on the SRAT changes first, and make the local node
pagetable changes a separate item. Is there a particular reason why
they have to be done at the same time?

If my understanding is correct:
The main purpose of Yinghai's work is to put pagetables in local node RAM.
For this, he needs to know the SRAT information before setting up the pagetables.
So part1 does both at the same time.

Yasuaki Ishimatsu

