Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality
From: Mike Kravetz
Date: Tue Jul 11 2017 - 14:23:49 EST
On 07/11/2017 05:36 AM, Michal Hocko wrote:
> On Thu 06-07-17 09:17:26, Mike Kravetz wrote:
>> The mremap system call has the ability to 'mirror' parts of an existing
>> mapping. To do so, it creates a new mapping that maps the same pages as
>> the original mapping, just at a different virtual address. This
>> functionality has existed since at least the 2.6 kernel.
>>
>> This patch simply adds a new flag to mremap which will make this
>> functionality part of the API. It maintains backward compatibility with
>> the existing way of requesting mirroring (old_size == 0).
>>
>> If this new MREMAP_MIRROR flag is specified, then new_size must equal
>> old_size. In addition, the MREMAP_MAYMOVE flag must be specified.
>
> I have to admit that this came as a suprise to me. There is no mention
> about this special case in the man page and the mremap code is so
> convoluted that I simply didn't see it there. I guess the only
> reasonable usecase is when you do not have a fd for the shared memory.
I was surprised as well when a JVM developer pointed this out.
>From the old e-mail thread, here is original use case:
shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732
shmat(11337732, 0, 0) = 0x40299000
shmctl(11337732, IPC_RMID, 0) = 0
mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0) = 0
mremap(0x402a9000, 0, 65536, MREMAP_MAYMOVE|MREMAP_FIXED, 0x100000) = 0x100000
The JVM team wants to do something similar. They are using
mmap(MAP_ANONYMOUS|MAP_SHARED) to create the initial mapping instead
of shmget/shmat. As Vlastimil mentioned previously, one would not
expect a shared mapping for parts of the JVM heap. I am working
to get clarification from the JVM team.
> Anyway the patch should fail with -EINVAL on private mappings as Kirill
> already pointed out
Yes. I think this should be a separate patch. As mentioned earlier,
mremap today creates a new/additional private mapping if called in this
way with old_size == 0. To me, this is a bug.
> and this should go along with an update to the
> man page which describes also the historical behavior.
Yes, man page updates are a must.
One reason for the RFC was to determine if people thought we should:
1) Just document the existing old_size == 0 functionality
2) Create a more explicit interface such as a new mremap flag for this
functionality
I am waiting to see what direction people prefer before making any
man page updates.
> Make sure you
> document that this is not really a mirroring (e.g. faulting page in one
> address will automatically map it to the other mapping(s)) but merely a
> copy of the range. Maybe MREMAP_COPY would be more appropriate name.
Good point. mirror is the first word that came to mind, but it does
not exactly apply.
--
Mike Kravetz
>
>> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>> ---
>> include/uapi/linux/mman.h | 5 +++--
>> mm/mremap.c | 23 ++++++++++++++++-------
>> tools/include/uapi/linux/mman.h | 5 +++--
>> 3 files changed, 22 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
>> index ade4acd..6b3e0df 100644
>> --- a/include/uapi/linux/mman.h
>> +++ b/include/uapi/linux/mman.h
>> @@ -3,8 +3,9 @@
>>
>> #include <asm/mman.h>
>>
>> -#define MREMAP_MAYMOVE 1
>> -#define MREMAP_FIXED 2
>> +#define MREMAP_MAYMOVE 0x01
>> +#define MREMAP_FIXED 0x02
>> +#define MREMAP_MIRROR 0x04
>>
>> #define OVERCOMMIT_GUESS 0
>> #define OVERCOMMIT_ALWAYS 1
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index cd8a1b1..f18ab36 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -516,10 +516,11 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>> struct vm_userfaultfd_ctx uf = NULL_VM_UFFD_CTX;
>> LIST_HEAD(uf_unmap);
>>
>> - if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE))
>> + if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE | MREMAP_MIRROR))
>> return ret;
>>
>> - if (flags & MREMAP_FIXED && !(flags & MREMAP_MAYMOVE))
>> + if ((flags & MREMAP_FIXED || flags & MREMAP_MIRROR) &&
>> + !(flags & MREMAP_MAYMOVE))
>> return ret;
>>
>> if (offset_in_page(addr))
>> @@ -528,14 +529,22 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
>> old_len = PAGE_ALIGN(old_len);
>> new_len = PAGE_ALIGN(new_len);
>>
>> - /*
>> - * We allow a zero old-len as a special case
>> - * for DOS-emu "duplicate shm area" thing. But
>> - * a zero new-len is nonsensical.
>> - */
>> + /* A zero new-len is nonsensical. */
>> if (!new_len)
>> return ret;
>>
>> + /*
>> + * For backward compatibility, we allow a zero old-len to imply
>> + * mirroring. This was originally a special case for DOS-emu.
>> + */
>> + if (!old_len)
>> + flags |= MREMAP_MIRROR;
>> + else if (flags & MREMAP_MIRROR) {
>> + if (old_len != new_len)
>> + return ret;
>> + old_len = 0;
>> + }
>> +
>> if (down_write_killable(¤t->mm->mmap_sem))
>> return -EINTR;
>>
>> diff --git a/tools/include/uapi/linux/mman.h b/tools/include/uapi/linux/mman.h
>> index 81d8edf..069f7a5 100644
>> --- a/tools/include/uapi/linux/mman.h
>> +++ b/tools/include/uapi/linux/mman.h
>> @@ -3,8 +3,9 @@
>>
>> #include <uapi/asm/mman.h>
>>
>> -#define MREMAP_MAYMOVE 1
>> -#define MREMAP_FIXED 2
>> +#define MREMAP_MAYMOVE 0x01
>> +#define MREMAP_FIXED 0x02
>> +#define MREMAP_MIRROR 0x04
>>
>> #define OVERCOMMIT_GUESS 0
>> #define OVERCOMMIT_ALWAYS 1
>> --
>> 2.7.5
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
>