Re: [RESEND PATCH] x86/mm: only allow memmap=XX!YY over existing RAM

From: Dan Williams
Date: Wed Jun 29 2016 - 10:42:29 EST


On Tue, Jun 28, 2016 at 10:12 PM, Yigal Korman <yigal@xxxxxxxxxxxxx> wrote:
> On Wed, Jun 29, 2016 at 4:09 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>>
>> On Tue, Jun 28, 2016 at 10:58 AM, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>> > On 06/28/16 09:33, Dan Williams wrote:
>> >> On Tue, Jun 28, 2016 at 1:31 AM, Yigal Korman <yigal@xxxxxxxxxxxxx> wrote:
>> >>> Before this patch, passing a range that is beyond the physical memory
>> >>> range will succeed, the user will see a /dev/pmem0 and will be able to
>> >>> access it. Reads will always return 0 and writes will be silently
>> >>> ignored.
>> >>>
>> >>> I've gotten more than one bug report about mkfs.{xfs,ext4} or nvml
>> >>> failing that were eventually tracked down to be wrong values passed to
>> >>> memmap.
>> >>>
>> >>> This patch prevents the above issue by instead of adding a new memory
>> >>> range, only update a RAM memory range with the PRAM type. This way,
>> >>> passing the wrong memmap will either not give you a pmem at all or give
>> >>> you a smaller one that actually has RAM behind it.
>> >>>
>> >>> And if someone still needs to fake a pmem that doesn't have RAM behind
>> >>> it, they can simply do memmap=XX@YY,XX!YY.
>> >>>
>> >>> Signed-off-by: Yigal Korman <yigal@xxxxxxxxxxxxx>
>> >>> Acked-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> >>> Acked-by: Johannes Thumshirn <jthumshirn@xxxxxxx>
>> >>> Tested-by: Boaz Harrosh <boaz@xxxxxxxxxxxxx>
>> >>> ---
>> >>
>> >> I have some other libnvdimm fixes heading upstream shortly if x86
>> >> folks just want to ack this one...
>> >>
>> >
>> > I'm concerned about this. This would mean that memory not marked as RAM
>> > because it is persistent and you don't want the OS to accidentally
>> > clobber persistent RAM can't be added.
>>
>> Ah true. Specifically you are worried about the case where a
>> platform-firmware has mis-marked pmem as reserved memory (or some
>> other type) and would like to correct it to be pram.
>
>
> As I mentioned in the patch, this is still possible by doing memmap=X@Y,X!Y

Sorry, yes I overlooked this in my response to Peter.

> Also, with fixes in grub and the kernel regarding mis-marking NVDIMMs
> this is much less common today.
> My purpose was simply to prevent a repeating user error for the common use case.

It makes sense, but at the same time it still involves a user
education burden that memmap=X!Y is constrained by default.

>>
>>
>> > So it seems like The Wrong
>> > Thing. If all you want is simulated pram then it shouldn't care about
>> > addresses in the first place and instead we should just specify it by
>> > quantity.
>>
>> Yes, agree we need an explicit "simulate pram" option independent of
>> memmap=, or just continue to educate users that if they try to
>> simulate pmem and specify an invalid range they get to keep all the
>> broken pieces.
>
> I'd love to have a simpler way to specify simulated pram, but quantity
> is not good enough.
> For my use case, for example, I need the quantity to be spread evenly
> over all NUMA nodes, so just getting a range "somewhere" is not good.
> And I can imagine other users that want to pin pram to same socket
> where their high speed NIC sits.
> So I not sure we can find a better general api than memmap and I not
> sure it's worth it.

I think it would be worthwhile to have something like testpmem= which
takes the same parameters as memmap=, but it is constrained to RAM by
default. Then it becomes clear that the user really does want a
simulation configuration on RAM rather than explicitly specifying a
new persistent memory range.