Re: [GIT PULL] arm64 updates for 6.13-rc1

From: Catalin Marinas
Date: Wed Dec 04 2024 - 10:29:34 EST


On Mon, Dec 02, 2024 at 08:22:57AM -0800, Yang Shi wrote:
> On 11/28/24 1:56 AM, David Hildenbrand wrote:
> > On 28.11.24 02:21, Yang Shi wrote:
> > > > > diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
> > > > > index 87b3f1a25535..ef303a2262c5 100644
> > > > > --- a/arch/arm64/mm/copypage.c
> > > > > +++ b/arch/arm64/mm/copypage.c
> > > > > @@ -30,9 +30,9 @@ void copy_highpage(struct page *to, struct page *from)
> > > > >         if (!system_supports_mte())
> > > > >             return;
> > > > > -    if (folio_test_hugetlb(src) &&
> > > > > -        folio_test_hugetlb_mte_tagged(src)) {
> > > > > -        if (!folio_try_hugetlb_mte_tagging(dst))
> > > > > +    if (folio_test_hugetlb(src)) {
> > > > > +        if (!folio_test_hugetlb_mte_tagged(src) ||
> > > > > +            !folio_try_hugetlb_mte_tagging(dst))
> > > > >                 return;
> > > > >             /*
> > > > I wonder why we had a 'return' here originally rather than a
> > > > > WARN_ON_ONCE() as we do further down for the page case. Do you see any
> > > > issue with the hunk below? Destination should be a new folio and not
> > > > tagged yet:
> > >
> > > Yes, I did see a problem. We copy the tags for all subpages and then set
> > > the folio as MTE tagged when copying the data for the first subpage, so
> > > the warning is triggered when we copy the second subpage.
> >
> > It's rather weird, though. We're instructed to copy a single page, yet
> > copy tags for all pages.
> >
> > This really only makes sense when called from folio_copy(), where we are
> > guaranteed to copy all pages.
> >
> > I'm starting to wonder if we should be able to hook into / overload
> > folio_copy() instead, to just handle the complete hugetlb copy ourselves
> > in one shot, and assume that copy_highpage() will never be called for
> > hugetlb pages (WARN and don't copy tags).
>
> Actually folio_copy() is only called by migration. Copying a huge page in CoW
> is more complicated and uses copy_user_highpage()->copy_highpage() instead of
> folio_copy(). It may start the page copy from any subpage. For example, if
> the CoW is triggered by an access to an address in the middle of the 2M page,
> the kernel may copy the second half first and then the first half, to
> guarantee the accessed data is in cache.

I'm still trying to understand the possible call paths here. If we get a
write fault on a large folio, does the core code allocate a folio of the
same size for CoW or does it start with smaller ones? wp_page_copy()
allocates order 0 AFAICT, though a pmd fault takes a different path in
handle_mm_fault(). But we also have huge pages using contiguous ptes.

Unless the source and destination folios are exactly the same size, this
will break many assumptions in the code above. The other way around is
also wrong: if dst is larger than src, we are not initialising the whole
dst folio.
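
To make the concern concrete, here is a userspace toy model, not kernel
code. The names toy_folio and toy_copy_tags_per_folio() are made up for
illustration, and the "tags" are plain bytes rather than real MTE tags. It
only mimics the per-folio scheme discussed above: the first call for any
subpage copies the tags for the whole folio and sets a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PAGES 8

/* Hypothetical stand-in for a folio; not the kernel's struct folio. */
struct toy_folio {
	size_t nr_pages;
	bool mte_tagged;			/* one flag for the whole folio */
	unsigned char tags[TOY_MAX_PAGES];	/* one fake tag byte per page */
};

/*
 * Called once per copied subpage, like copy_highpage() in the thread.
 * Returns the number of dst pages whose tags were written by this call,
 * or 0 if nothing was copied (src untagged, or dst already marked tagged).
 */
static size_t toy_copy_tags_per_folio(struct toy_folio *dst,
				      const struct toy_folio *src)
{
	size_t n;

	assert(src->nr_pages <= TOY_MAX_PAGES);
	assert(dst->nr_pages <= TOY_MAX_PAGES);

	if (!src->mte_tagged || dst->mte_tagged)
		return 0;

	/* Only as many pages as both folios have can be copied. */
	n = src->nr_pages < dst->nr_pages ? src->nr_pages : dst->nr_pages;
	memcpy(dst->tags, src->tags, n);
	dst->mte_tagged = true;	/* flag claims the whole folio is done */
	return n;
}
```

In this model, the second call for the same pair of folios returns 0
because dst is already flagged, which is the point where a WARN_ON_ONCE()
would fire. With a 4-page src and an 8-page dst, the call returns 4 but
still marks dst as tagged, leaving the second half uninitialised.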

Maybe going back to a per-page PG_mte_tagged flag rather than a per-folio
one would keep things simple, with less risk of wrong assumptions.
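
A rough userspace sketch of what I mean, again not kernel code: toy_page,
toy_copy_page_tags() and toy_copy_from() are hypothetical names, and the
assert() merely stands in for a WARN_ON_ONCE(). Each page carries its own
flag, so the copy order and the size of the enclosing folio stop mattering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page with a per-page "tagged" flag. */
struct toy_page {
	bool mte_tagged;
	unsigned char tag;	/* fake tag byte, not a real MTE tag */
};

/* Copy the tag of exactly one page; returns true if a tag was copied. */
static bool toy_copy_page_tags(struct toy_page *to,
			       const struct toy_page *from)
{
	if (!from->mte_tagged)
		return false;

	/* The destination page is expected to be new and untagged. */
	assert(!to->mte_tagged);

	to->tag = from->tag;
	to->mte_tagged = true;
	return true;
}

/*
 * Copy nr pages starting at subpage 'first' and wrapping around, roughly
 * like a CoW path that begins at the faulting subpage. Returns the number
 * of pages whose tags were copied.
 */
static size_t toy_copy_from(struct toy_page *dst, const struct toy_page *src,
			    size_t nr, size_t first)
{
	size_t i, copied = 0;

	for (i = 0; i < nr; i++) {
		size_t idx = (first + i) % nr;

		if (toy_copy_page_tags(&dst[idx], &src[idx]))
			copied++;
	}
	return copied;
}
```

Starting the copy at subpage 2 of a 4-page range still tags every
destination page exactly once, and the per-page check never trips on the
second and later subpages.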

--
Catalin