Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
From: Will Deacon
Date: Wed Oct 28 2020 - 20:58:23 EST
On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> On Mon, Oct 26, 2020 at 06:05:30PM +0000, Edgecombe, Rick P wrote:
> > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 01:13:52AM +0000, Edgecombe, Rick P wrote:
> > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > Indeed, for architectures that define
> > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP
> > > > > it is
> > > > > possible that __kernel_map_pages() would fail, but since this
> > > > > function is
> > > > > void, the failure will go unnoticed.
> > > >
> > > > Could you elaborate on how this could happen? Do you mean during
> > > > runtime today or if something new was introduced?
> > >
> > > A failure in__kernel_map_pages() may happen today. For instance, on
> > > x86
> > > if the kernel is built with DEBUG_PAGEALLOC.
> > >
> > > __kernel_map_pages(page, 1, 0);
> > >
> > > will need to split, say, 2M page and during the split an allocation
> > > of
> > > page table could fail.
> >
> > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a page
> > on the direct map and even disables locking in cpa because it assumes
> > this. If this is happening somehow anyway then we should probably fix
> > that. Even if it's a debug feature, it will not be as useful if it is
> > causing its own crashes.
> >
> > I'm still wondering if there is something I'm missing here. It seems
> > like you are saying there is a bug in some arch's, so let's add a WARN
> > in cross-arch code to log it as it crashes. A warn and making things
> > clearer seem like good ideas, but if there is a bug we should fix it.
> > The code around the callers still functionally assume re-mapping can't
> > fail.
>
> Oh, I've meant x86 kernel *without* DEBUG_PAGEALLOC, and indeed the call
> that unmaps pages back in safe_copy_page will just reset a 4K page to
> NP because whatever made it NP at the first place already did the split.
>
> Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a race
> between map/unmap dance in __vunmap() and safe_copy_page() that may
> cause access to unmapped memory:
>
> __vunmap()
> vm_remove_mappings()
> set_direct_map_invalid()
> safe_copy_page()
> __kernel_map_pages()
> return
> do_copy_page() -> fault
>
> This is a theoretical bug, but it is still not nice :)
Just to clarify: this patch series fixes this problem, right?
Will