Re: [PATCH v1] x86/mm/pat: fix VM_PAT handling in COW mappings

From: Ingo Molnar
Date: Mon Apr 01 2024 - 05:45:25 EST



* David Hildenbrand <david@xxxxxxxxxx> wrote:

> > > > try the trivial restriction approach first, and only go with your original
> > > > patch if that fails?
> > >
> > > Which version would you prefer? I had two alternatives (excluding comment
> > > changes; white-space expected to be broken).
> > >
> > >
> > > 1) Disallow when we would have set VM_PAT on is_cow_mapping()
> > >
> > > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > > index 0d72183b5dd0..6979912b1a5d 100644
> > > --- a/arch/x86/mm/pat/memtype.c
> > > +++ b/arch/x86/mm/pat/memtype.c
> > > @@ -994,6 +994,9 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
> > > && size == (vma->vm_end - vma->vm_start))) {
> > > int ret;
> > > + if (is_cow_mapping(vma->vm_flags))
> > > + return -EINVAL;
> > > +
> > > ret = reserve_pfn_range(paddr, size, prot, 0);
> > > if (ret == 0 && vma)
> > > vm_flags_set(vma, VM_PAT);
> > >
> > >
> > > 2) Fall back to !VM_PAT
> > >
> > > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > > index 0d72183b5dd0..8e97156c9be8 100644
> > > --- a/arch/x86/mm/pat/memtype.c
> > > +++ b/arch/x86/mm/pat/memtype.c
> > > @@ -990,8 +990,8 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,
> > > enum page_cache_mode pcm;
> > > /* reserve the whole chunk starting from paddr */
> > > - if (!vma || (addr == vma->vm_start
> > > - && size == (vma->vm_end - vma->vm_start))) {
> > > + if (!vma || (!is_cow_mapping(vma->vm_flags) && addr == vma->vm_start &&
> > > + size == (vma->vm_end - vma->vm_start))) {
> > > int ret;
> > > ret = reserve_pfn_range(paddr, size, prot, 0);
> > >
> > >
> > >
> > > Personally, I'd go for 2).
> >
> > So what's the advantage of #2? This is clearly something the user didn't
> > really intend or think about much. Isn't explicitly failing that mapping a
> > better option than silently downgrading it to !VM_PAT?
> >
> > (If I'm reading it right ...)
>
> I think a simple mmap(MAP_PRIVATE) of /dev/mem will unconditionally fail
> with 1), while it keeps on working for 2).
>
> Note that I think we currently set VM_PAT on each and every system if
> remap_pfn_range() covers the whole VMA, even if PAT is not actually
> enabled.
>
> It's all a bit of a mess TBH, but I got my hands dirty enough on that.
>
> So 1) can be rather destructive ... 2) at least somehow keeps it working.
>
> For that reason I went with the current patch, because it's hard to tell
> which use case you will end up breaking ... :/
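
For anyone skimming the thread, the difference between the two alternatives
quoted above can be sketched as a rough userspace model. This is not the
kernel code: struct vma_model, struct remap_result, covers_whole_vma(),
remap_alt1() and remap_alt2() are made-up names for illustration, the !vma
case is left out, reserve_pfn_range() is assumed to succeed, and the
partial-VMA path is simplified to "return 0 without VM_PAT". Only the
is_cow_mapping() test and the two flag values are meant to match
include/linux/mm.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values as in include/linux/mm.h. */
#define VM_SHARED   0x00000008UL
#define VM_MAYWRITE 0x00000020UL

/* Same test as the kernel helper: private and potentially writable. */
static bool is_cow_mapping(unsigned long vm_flags)
{
	return (vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}

/* Hypothetical stand-in for the VMA fields track_pfn_remap() looks at. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/* Hypothetical result: return code plus whether VM_PAT would be set. */
struct remap_result {
	int ret;
	bool vm_pat;
};

/* True if [addr, addr + size) spans the VMA exactly. */
static bool covers_whole_vma(const struct vma_model *vma,
			     unsigned long addr, unsigned long size)
{
	assert(vma->vm_end >= vma->vm_start);
	return addr == vma->vm_start && size == vma->vm_end - vma->vm_start;
}

/* Alternative 1: fail when VM_PAT would be set on a COW mapping. */
static struct remap_result remap_alt1(const struct vma_model *vma,
				      unsigned long addr, unsigned long size)
{
	struct remap_result res = { 0, false };

	if (covers_whole_vma(vma, addr, size)) {
		if (is_cow_mapping(vma->vm_flags))
			res.ret = -EINVAL;
		else
			res.vm_pat = true; /* assumes the reservation succeeded */
	}
	return res;
}

/* Alternative 2: COW mappings skip the VM_PAT path but still succeed. */
static struct remap_result remap_alt2(const struct vma_model *vma,
				      unsigned long addr, unsigned long size)
{
	struct remap_result res = { 0, false };

	if (!is_cow_mapping(vma->vm_flags) && covers_whole_vma(vma, addr, size))
		res.vm_pat = true; /* assumes the reservation succeeded */
	return res;
}
```

In this model a private, whole-VMA remap (the MAP_PRIVATE /dev/mem case
mentioned above) gets -EINVAL from remap_alt1(), while remap_alt2() returns 0
and simply leaves VM_PAT unset. Shared mappings behave the same under both.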

Yeah, so I think you make valid observations, i.e. your first patch is
probably the best one.

But since it changes mm/memory.c, I'd like to pass that over to Andrew
and the MM folks.

The x86 bits:

Acked-by: Ingo Molnar <mingo@xxxxxxxxxx>

Thanks,

Ingo