Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages
From: Jerome Glisse
Date: Tue Sep 04 2018 - 10:00:44 EST
On Mon, Sep 03, 2018 at 07:56:54AM +0200, Michal Hocko wrote:
> On Thu 30-08-18 14:39:44, Jerome Glisse wrote:
> > On Thu, Aug 30, 2018 at 11:05:16AM -0700, Mike Kravetz wrote:
> > > On 08/30/2018 09:57 AM, Jerome Glisse wrote:
> > > > On Thu, Aug 30, 2018 at 06:19:52PM +0200, Michal Hocko wrote:
> > > >> On Thu 30-08-18 10:08:25, Jerome Glisse wrote:
> > > >>> On Thu, Aug 30, 2018 at 12:56:16PM +0200, Michal Hocko wrote:
> > > >>>> On Wed 29-08-18 17:11:07, Jerome Glisse wrote:
> > > >>>>> On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
> > > >>>>>> On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
> > > >>>>>>> On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
> > > >>>>>> [...]
> > > >>>>>>>> What would be the best mmu notifier interface to use where there are no
> > > >>>>>>>> start/end calls?
> > > >>>>>>>> Or, is the best solution to add the start/end calls as is done in later
> > > >>>>>>>> versions of the code? If that is the suggestion, has there been any change
> > > >>>>>>>> in invalidate start/end semantics that we should take into account?
> > > >>>>>>>
> > > >>>>>>> start/end would be the one to add, 4.4 seems broken in respect to THP
> > > >>>>>>> and mmu notification. Another solution is to fix user of mmu notifier,
> > > >>>>>>> they were only a handful back then. For instance properly adjust the
> > > >>>>>>> address to match first address covered by pmd or pud and passing down
> > > >>>>>>> correct page size to mmu_notifier_invalidate_page() would allow to fix
> > > >>>>>>> this easily.
> > > >>>>>>>
> > > >>>>>>> This is ok because user of try_to_unmap_one() replace the pte/pmd/pud
> > > >>>>>>> with an invalid one (either poison, migration or swap) inside the
> > > >>>>>>> function. So anyone racing would synchronize on those special entry
> > > >>>>>>> hence why it is fine to delay mmu_notifier_invalidate_page() to after
> > > >>>>>>> dropping the page table lock.
> > > >>>>>>>
> > > >>>>>>> Adding start/end might the solution with less code churn as you would
> > > >>>>>>> only need to change try_to_unmap_one().
> > > >>>>>>
> > > >>>>>> What about dependencies? 369ea8242c0fb sounds like it needs work for all
> > > >>>>>> notifiers need to be updated as well.
> > > >>>>>
> > > >>>>> This commit remove mmu_notifier_invalidate_page() hence why everything
> > > >>>>> need to be updated. But in 4.4 you can get away with just adding start/
> > > >>>>> end and keep around mmu_notifier_invalidate_page() to minimize disruption.
> > > >>>>
> > > >>>> OK, this is really interesting. I was really worried to change the
> > > >>>> semantic of the mmu notifiers in stable kernels because this is really
> > > >>>> a hard to review change and high risk for anybody running those old
> > > >>>> kernels. If we can keep the mmu_notifier_invalidate_page and wrap them
> > > >>>> into the range scope API then this sounds like the best way forward.
> > > >>>>
> > > >>>> So just to make sure we are at the same page. Does this sounds goo for
> > > >>>> stable 4.4. backport? Mike's hugetlb pmd shared fixup can be applied on
> > > >>>> top. What do you think?
> > > >>>
> > > >>> You need to invalidate outside page table lock so before the call to
> > > >>> page_check_address(). For instance like below patch, which also only
> > > >>> do the range invalidation for huge page which would avoid too much of
> > > >>> a behavior change for user of mmu notifier.
> > > >>
> > > >> Right. I would rather not make this PageHuge special though. So the
> > > >> fixed version should be.
> > > >
> > > > Why not testing for huge ? Only huge is broken and thus only that
> > > > need the extra range invalidation. Doing the double invalidation
> > > > for single page is bit overkill.
> > >
> > > I am a bit confused, and hope this does not add to any confusion by others.
> > >
> > > IIUC, the patch below does not attempt to 'fix' anything. It is simply
> > > there to add the start/end notifiers to the v4.4 version of this routine
> > > so that a subsequent patch can use them (with modified ranges) to handle
> > > unmapping a shared pmd huge page. That is the mainline fix which started
> > > this thread.
> > >
> > > Since we are only/mostly interested in fixing the shared pmd issue in
> > > 4.4, how about just adding the start/end notifiers to the very specific
> > > case where pmd sharing is possible?
> > >
> > > I can see the value in trying to back port dependent patches such as this
> > > so that stable releases look more like mainline. However, I am not sure of
> > > the value in this case as this patch was part of a larger set changing
> > > notifier semantics.
> >
> > For all intents and purposes this is not a backport of the original
> > patch so maybe we should just drop the commit reference and just
> > explains that it is there to fix mmu notifier in respect to huge page
> > migration.
> >
> > The original patches fix more than this case because newer featurers
> > like THP migration, THP swapping, ... added more cases where things
> > would have been wrong. But in 4.4 frame there is only huge tlb fs
> > migration.
>
> And THP migration is still a problem with 4.4 AFAICS. All other cases
> simply split the huge page but THP migration keeps it in one piece and
> as such it is theoretically broken as you have explained. So I would
> stick with what I posted with some more clarifications in the changelog
> if you think it is appropriate (suggestions welcome).
Reading code there is no THP migration in 4.4 only huge tlb migration.
Look at handle_mm_fault which do not know how to handle swap pmd, only
the huge tlb fs fault handler knows how to handle those. Hence why i
was checking for huge tlb exactly as page_check_address() to only range
invalidate for huge tlb fs migration.
But i am fine with doing the range invalidation with all.
Cheers,
Jérôme