Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
From: Mike Rapoport
Date: Thu Oct 29 2020 - 04:12:50 EST
On Wed, Oct 28, 2020 at 09:03:31PM +0000, Edgecombe, Rick P wrote:
> > On Wed, Oct 28, 2020 at 11:20:12AM +0000, Will Deacon wrote:
> > > On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> > > >
> > > > This is a theoretical bug, but it is still not nice :)
> > > >
> > >
> > > Just to clarify: this patch series fixes this problem, right?
> >
> > Yes.
> >
>
> Well, now I'm confused again.
>
> As David pointed, __vunmap() should not be executing simultaneously
> with the hibernate operation because hibernate can't snapshot while
> data it needs to save is still updating. If a thread was paused when a
> page was in an "invalid" state, it should be remapped by hibernate
> before the copy.
>
> To level set, before reading this mail, my takeaways from the
> discussions on potential hibernate/debug page alloc problems were:
>
> Potential RISC-V issue:
> Doesn't have hibernate support
>
> Potential ARM issue:
> The logic around when it's cpa determines pages might be unmapped looks
> correct for current callers.
>
> Potential x86 page break issue:
> Seems to be ok for now, but a new set_memory_np() caller could violate
> assumptions in hibernate.
>
> Non-obvious thorny logic:
> General agreement it would be good to separate dependencies.
>
> Behavior of V1 of this patchset:
> No functional change other than addition of a warn in hibernate.
There is a change that adds explicit use of set_direct_map() to
hibernate. Currently, in case of arm64 with DEBUG_PAGEALLOC=n if a
thread was paused when a page was in an "invalid" state hibernate will
access an unmapped data because __kernel_map_pages() will bail out.
After the change set_direct_map_default_noflush() would be used and the
page will get mapped before copy.
> So "does this fix the problem", "yes" leaves me a bit confused... Not
> saying there couldn't be any problems, especially due to the thorniness
> and cross arch stride, but what is it exactly and how does this series
> fix it?
This series goal was primarily to separate dependincies and make it
clearer what DEBUG_PAGEALLOC and what SET_DIRECT_MAP are. As it turned
out, there is also some lack of consistency between architectures that
implement either of this so I tried to improve this as well.
Honestly, I don't know if a thread can be paused at the time __vunmap()
left invalid pages, but it could, there is an issue on arm64 with
DEBUG_PAGEALLOC=n and this set fixes it.
__vunmap()
vm_remove_mappings()
set_direct_map_invalid()
/* thread is frozen */
safe_copy_page()
__kernel_map_pages()
if (!debug_pagealloc())
return
do_copy_page() -> fault
--
Sincerely yours,
Mike.