Re: [PATCH] mm: don't expose page to fast gup before it's ready
From: Yu Zhao
Date: Tue May 14 2019 - 19:09:42 EST
On Tue, May 14, 2019 at 02:25:27PM -0700, Andrew Morton wrote:
> On Tue, 9 Jan 2018 02:10:50 -0800 Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> > > Also what prevents reordering here? There do not seem to be any barriers
> > > to prevent __SetPageSwapBacked leak after set_pte_at with your patch.
> >
> > I assumed mem_cgroup_commit_charge() acted as full barrier. Since you
> > explicitly asked the question, I realized my assumption doesn't hold
> > when memcg is disabled. So we do need something to prevent reordering
> > in my patch. And it brings up the question whether we want to add more
> > barrier to other places that call page_add_new_anon_rmap() and
> > set_pte_at().
>
> Is a new version of this patch planned?
Sorry for the late reply. The last time I tried, I didn't come up
with a better fix because:
1) as Michal pointed out, we need to make sure the fast gup sees
all changes made before set_pte_at();
2) pairing smp_wmb() in set_pte/pmd_at() with smp_rmb() in gup
seems the best way to prevent any potential ordering related
problems in the future;
3) but this slows down the paths that don't require the smp_mwb()
unnecessarily.
I didn't give it further thought because the problem doesn't seem
fatal at the time. Now the fast gup has changed and the problem is
serious:
CPU 1 CPU 1
set_pte_at get_user_pages_fast
page_add_new_anon_rmap gup_pte_range
__SetPageSwapBacked (fetch)
try_get_compound_head
page_ref_add_unless
__SetPageSwapBacked (store)
Or the similar problem could happen to __do_huge_pmd_anonymous_page(),
for the reason of missing smp_wmb() between the non-atomic bit op
and set_pmd_at().
We could simply replace __SetPageSwapBacked() with its atomic
version. But 2) seems more preferable to me because it addresses
my original problem:
> > I didn't observe the race directly. But I did get few crashes when
> > trying to access mem_cgroup of pages returned by get_user_pages_fast().
> > Those page were charged and they showed valid mem_cgroup in kdumps.
> > So this led me to think the problem came from premature set_pte_at().
Thoughts? Thanks.