On 2015/10/9 17:24, Kamezawa Hiroyuki wrote:
On 2015/10/09 15:46, Xishi Qiu wrote:
On 2015/10/9 22:56, Taku Izumi wrote:
Xeon E7 v3 based systems support Address Range Mirroring,
and a UEFI BIOS compliant with UEFI spec 2.5 can report which
ranges are reliable (mirrored) via the EFI memory map.
The Linux kernel now uses this information and allocates
boot-time memory from the reliable region.
My requirements are:
- allocate kernel memory from the reliable region
- allocate user memory from the non-reliable region
ZONE_MOVABLE is useful for meeting these requirements.
By placing the non-reliable ranges in ZONE_MOVABLE,
reliable memory is used only for kernel allocations.
Hi Taku,
You mean putting non-mirrored memory in the movable zone and
mirrored memory in the normal zone, right? Then kernel allocations
will use mirrored memory in the normal zone, and user allocations
will use non-mirrored memory in the movable zone.
My questions are:
1) Do we need to change the fallback function?
For *our* requirement, it's not required. But if someone wants to prevent
user memory allocations from ZONE_NORMAL, we need some changes in the zonelist
walk.
Hi Kame,
So we assume the kernel will only use the normal zone (mirrored), and users will use
the movable zone (non-mirrored) first; if that memory is not enough, they will use the normal zone too.
2) The mirrored region should be located at the start of the normal
zone, right?
More precisely, the "not-reliable" ranges of memory are handled by ZONE_MOVABLE.
This patch does only that.
I mean the mirrored region cannot be in the middle or at the end of the zone;
the BIOS should report the memory like this,
e.g.
BIOS
node0: 0-4G mirrored, 4-8G mirrored, 8-16G non-mirrored
node1: 16-24G mirrored, 24-32G non-mirrored
OS
node0: DMA, DMA32 (both mirrored), NORMAL(4-8G), MOVABLE(8-16G)
node1: NORMAL(16-24G), MOVABLE(24-32G)
I remember Kame has already suggested this idea. In my opinion,
it is still better to add a new migratetype or a new zone,
so that both user space and the kernel can use mirrored memory.
Hi, Xishi.
Izumi-san and I discussed the implementation at length and found that using a "zone"
is the better approach.
The biggest reason is that a zone is the unit of vmscan, of all statistics, and of
handling a range of memory for a purpose. By making use of zones, we can reuse all the
vmscan code and the code that reports zone information. Introducing another structure would be messy.
Yes, adding a new zone is better, but it would change a lot of code, so reusing ZONE_MOVABLE
is simpler and easier, right?
His patch is very simple.
The following plan sounds good to me. Shall we rename the zone when it is
used for mirrored memory? "movable" is a little confusing.
Yes. For your requirements, Izumi-san and I are discussing the following plan:
- Add a flag showing whether a zone is reliable, then mark ZONE_MOVABLE as not reliable.
- Add __GFP_RELIABLE. This will allow alloc_pages() to skip non-reliable zones.
- Add madvise() MADV_RELIABLE and modify the page fault code to set that gfp flag.
Like this?
user: madvise()/mmap()/others -> add vma_reliable flag -> add gfp_reliable flag -> alloc_pages
kernel: use __GFP_RELIABLE flag in buddy allocation/slab/vmalloc...
Also we can introduce some interfaces in procfs or sysfs, right?