Re: [PATCH v3 3/4] mm: don't expose non-hugetlb page to fast gup prematurely

From: Yu Zhao
Date: Wed Sep 25 2019 - 18:27:02 EST


On Wed, Sep 25, 2019 at 10:25:30AM +0200, Peter Zijlstra wrote:
> On Tue, Sep 24, 2019 at 05:24:58PM -0600, Yu Zhao wrote:
> > We don't want to expose a non-hugetlb page to the fast gup running
> > on a remote CPU before all local non-atomic ops on the page flags
> > are visible first.
> >
> > For an anon page that isn't in swap cache, we need to make sure all
> > prior non-atomic ops, especially __SetPageSwapBacked() in
> > page_add_new_anon_rmap(), are ordered before set_pte_at() to prevent
> > the following race:
> >
> > CPU 1 CPU1
> > set_pte_at() get_user_pages_fast()
> > page_add_new_anon_rmap() gup_pte_range()
> > __SetPageSwapBacked() SetPageReferenced()
> >
> > This demonstrates a non-fatal scenario. Though haven't been directly
> > observed, the fatal ones can exist, e.g., PG_lock set by fast gup
> > caller and then overwritten by __SetPageSwapBacked().
> >
> > For an anon page that is already in swap cache or a file page, we
> > don't need smp_wmb() before set_pte_at() because adding to swap or
> > file cach serves as a valid write barrier. Using non-atomic ops
> > thereafter is a bug, obviously.
> >
> > smp_wmb() is added following 11 of total 12 page_add_new_anon_rmap()
> > call sites, with the only exception being
> > do_huge_pmd_wp_page_fallback() because of an existing smp_wmb().
> >
>
> I'm thinking this patch make stuff rather fragile.. Should we instead
> stick the barrier in set_p*d_at() instead? Or rather, make that store a
> store-release?

I prefer it this way too, but I suspected the majority would be
concerned with the performance implications, especially those
looping set_pte_at()s in mm/huge_memory.c.

If we have a consensus that smp_wmb() would be better off wrapped
together with set_p*d_at() in a macro, I'd be glad to make the change.

And it seems to me applying smp_store_release() would have to happen
in each arch, which might be just as fragile.