Re: [PATCH v1 00/15] mm: COW fixes part 2: reliable GUP pins of anonymous pages

From: Oded Gabbay
Date: Thu Mar 10 2022 - 06:14:22 EST


On Wed, Mar 9, 2022 at 10:00 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 08.03.22 22:22, Linus Torvalds wrote:
> > On Tue, Mar 8, 2022 at 6:14 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>
> >> This series fixes memory corruptions when a GUP pin (FOLL_PIN) was taken
> >> on an anonymous page and COW logic fails to detect exclusivity of the page
> >> to then replacing the anonymous page by a copy in the page table [...]
> >
> > From a cursory scan of the patches, this looks sane.
>
> Thanks for skimming over the patches that quickly!
>
> >
> > I'm not sure what the next step should be, but I really would like the
> > people who do a lot of pinning stuff to give it a good shake-down.
> > Including both looking at the patches, but very much actually running
> > it on whatever test-cases etc you people have.
> >
> > Please?

I can take this patch-set and test it in our data-center with all the
DL workloads we are running
on Gaudi.

David,
Any chance you can prepare me a branch with your patch-set based on 5.17-rc7 ?
I prefer to take a stable kernel and not 5.18-rc1 as this is going to
run on hundreds of machines.

Thanks,
Oded

>
> My proposal would be to pull it into -next early after we have
> v5.18-rc1. I expect some minor clashes with folio changes that should go
> in in the next merge window, so I'll have to rebase+resend either way,
> and I'm planning on thoroughly testing at least on s390x as well.
>
> We'd then have plenty of time to further review+test while in -next
> until the v5.19 merge window opens up.
>
> By that time I should also have my selftests cleaned up and ready, and
> part 3 ready to improve the situation for FOLL_GET|FOLL_WRITE until we
> have the full FOLL_GET->FOLL_PIN conversion from John (I'll most
> probably sent out an early RFC of part 3 soonish). So we *might* be able
> to have everything fixed in v5.19.
>
> Last but not least, tools/cgroup/memcg_slabinfo.py as mentioned in patch
> #10 still needs care due to the PG_slab reuse, but I consider that a
> secondary concern (yet, it should be fixed and help from the Authors
> would be appreciated ;) ).
>
> --
> Thanks,
>
> David / dhildenb
>