Re: [PATCH v3 3/7] mm/swap: Add swp_offset_pfn() to fetch PFN from swap entry
From: Peter Xu
Date: Wed Aug 10 2022 - 09:17:52 EST
On Wed, Aug 10, 2022 at 02:04:32PM +0800, Huang, Ying wrote:
> Peter Xu <peterx@xxxxxxxxxx> writes:
>
> > We've got a bunch of special swap entries that stores PFN inside the swap
> > offset fields. To fetch the PFN, normally the user just calls swp_offset()
> > assuming that'll be the PFN.
> >
> > Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the
> > max possible length of a PFN on the host, meanwhile doing proper check with
> > MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the PFNs
> > properly always using the BUILD_BUG_ON() in is_pfn_swap_entry().
> >
> > One reason to do so is we never tried to sanitize whether swap offset can
> > really fit for storing PFN. At the meantime, this patch also prepares us
> > with the future possibility to store more information inside the swp offset
> > field, so assuming "swp_offset(entry)" to be the PFN will not stand any
> > more very soon.
> >
> > Replace many of the swp_offset() callers to use swp_offset_pfn() where
> > proper. Note that many of the existing users are not candidates for the
> > replacement, e.g.:
> >
> > (1) When the swap entry is not a pfn swap entry at all, or,
> > (2) when we wanna keep the whole swp_offset but only change the swp type.
> >
> > For the latter, it can happen when fork() triggered on a write-migration
> > swap entry pte, we may want to only change the migration type from
> > write->read but keep the rest, so it's not "fetching PFN" but "changing
> > swap type only". They're left aside so that when there're more information
> > within the swp offset they'll be carried over naturally in those cases.
> >
> > Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what
> > the new swp_offset_pfn() is about.
> >
> > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
>
> The patch itself looks good. But I searched swp_entry() in kernel
> source code, and found that we need to do more.
>
> For example, in pte_to_pagemap_entry()
>
> 	frame = swp_type(entry) |
> 		(swp_offset(entry) << MAX_SWAPFILES_SHIFT);
>
> If it's a migration entry, we need
>
> 	frame = swp_type(entry) |
> 		(swp_offset_pfn(entry) << MAX_SWAPFILES_SHIFT);
>
> So I think you need to search all swp_offset() calling in the kernel
> source and check whether they need to be changed.

Yeah, I actually looked at all of them and explicitly left this one alone,
since I wanted to dump the whole swp entry. Even though it's called
"show_pfn", it has always dumped the whole entry; e.g., for genuine swap
entries I don't think a PFN is stored in the swp offset, so it's not about
the PFN but about the swp offset itself, IMHO.

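To illustrate the offset-vs-PFN distinction, here is a minimal userspace
sketch. It is a model, not the kernel code: the model_* names are
hypothetical, and the bit widths (5 type bits on top of a 63-bit value,
MAX_PHYSMEM_BITS = 52, PAGE_SHIFT = 12) are assumed values that differ per
arch and config.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones are per-arch/config. */
#define MODEL_MAX_SWAPFILES_SHIFT 5
#define MODEL_SWP_TYPE_SHIFT      (63 - MODEL_MAX_SWAPFILES_SHIFT)
#define MODEL_SWP_OFFSET_MASK     ((UINT64_C(1) << MODEL_SWP_TYPE_SHIFT) - 1)
#define MODEL_MAX_PHYSMEM_BITS    52
#define MODEL_PAGE_SHIFT          12
#define MODEL_SWP_PFN_BITS        (MODEL_MAX_PHYSMEM_BITS - MODEL_PAGE_SHIFT)
#define MODEL_SWP_PFN_MASK        ((UINT64_C(1) << MODEL_SWP_PFN_BITS) - 1)

/* Same idea as the BUILD_BUG_ON(): the offset field must be able to hold a PFN. */
static_assert(MODEL_SWP_PFN_BITS <= MODEL_SWP_TYPE_SHIFT,
	      "swap offset field too small to store a PFN");

typedef struct { uint64_t val; } model_swp_entry_t;

/* Pack a type and an offset into one entry value. */
static inline model_swp_entry_t model_swp_entry(unsigned int type, uint64_t offset)
{
	model_swp_entry_t e = {
		((uint64_t)type << MODEL_SWP_TYPE_SHIFT) |
		(offset & MODEL_SWP_OFFSET_MASK)
	};
	return e;
}

static inline unsigned int model_swp_type(model_swp_entry_t e)
{
	return (unsigned int)(e.val >> MODEL_SWP_TYPE_SHIFT);
}

/* Whole offset field: PFN plus whatever else is stored above it. */
static inline uint64_t model_swp_offset(model_swp_entry_t e)
{
	return e.val & MODEL_SWP_OFFSET_MASK;
}

/* PFN only: mask off anything above the widest possible PFN. */
static inline uint64_t model_swp_offset_pfn(model_swp_entry_t e)
{
	return model_swp_offset(e) & MODEL_SWP_PFN_MASK;
}
```

With extra bits stored just above the PFN, model_swp_offset() returns the
PFN plus those bits, while model_swp_offset_pfn() returns the PFN alone.
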
But on second thought I agree it should be handled specially here, because
user apps could be relying on the offset being the pfn for migration
entries. The other thing is that I'm not sure whether the encoding of
pagemap entries can always fit both the pfn and the A/D bits (mainly
because of PM_PFRAME_MASK) even if the arch swap pte fits; it needs more
math. So unless necessary, it'll be good to keep the A/D bits internal to
the kernel too.

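A rough sketch of the "more math" part, as a userspace model rather than
the actual fs/proc code. The model_* names are hypothetical. It assumes the
documented pagemap layout (bits 0-4 swap type, bits 5-54 swap offset, i.e.
a 55-bit frame field) and that the frame is masked to that width before
being reported.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the documented pagemap layout; illustration only. */
#define MODEL_MAX_SWAPFILES_SHIFT 5
#define MODEL_PM_PFRAME_BITS      55
#define MODEL_PM_PFRAME_MASK      ((UINT64_C(1) << MODEL_PM_PFRAME_BITS) - 1)
#define MODEL_PM_OFFSET_BITS      (MODEL_PM_PFRAME_BITS - MODEL_MAX_SWAPFILES_SHIFT)

/* Encode type + offset the way the quoted snippet does, then mask to the frame field. */
static inline uint64_t model_pagemap_swap_frame(unsigned int type, uint64_t offset)
{
	uint64_t frame = (uint64_t)type | (offset << MODEL_MAX_SWAPFILES_SHIFT);

	return frame & MODEL_PM_PFRAME_MASK;
}

/* Returns 1 if the offset survives the encoding without losing high bits. */
static inline int model_offset_fits_pagemap(uint64_t offset)
{
	return (offset >> MODEL_PM_OFFSET_BITS) == 0;
}
```

In this model only 50 offset bits reach userspace, so an offset that
carries extra bits above the PFN can be truncated even when the in-kernel
swap entry holds it fine, whereas a bare PFN of up to 50 bits round-trips.
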
Thanks for the careful review, I'll fix that.
--
Peter Xu