Re: [PATCH v8 01/23] mm: Introduce PTE_MARKER swap entry

From: Alistair Popple
Date: Tue Apr 12 2022 - 21:37:39 EST


Peter Xu <peterx@xxxxxxxxxx> writes:

> On Tue, Apr 12, 2022 at 11:07:56AM +1000, Alistair Popple wrote:
>> Hi Peter,
>
> Hi, Alistair,
>
>>
>> I noticed this while reviewing the next patch in the series. I think you need to
>> add CONFIG_PTE_MARKER to the below as well:
>>
>> #if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
>> defined(CONFIG_DEVICE_PRIVATE)
>> static inline int non_swap_entry(swp_entry_t entry)
>> {
>> return swp_type(entry) >= MAX_SWAPFILES;
>> }
>> #else
>> static inline int non_swap_entry(swp_entry_t entry)
>> {
>> return 0;
>> }
>> #endif
>>
>> Otherwise marker entries will be treated as swap entries, which is wrong for
>> example in swapin_walk_pmd_entry() as marker entries are no longer considered
>> pte_none().
>
> Thanks for the comment, that makes sense.
>
> Instead of adding PTE_MARKER into this equation, I'm going backward and
> wondering purely on why we need to bother with non_swap_entry() at all if
> MAX_SWAPFILES is already defined with proper knowledges of all these bits.

I was going to suggest it was to help the compiler optimise the non-swap entry
code away. But I just tested and it makes no difference in .text section size
either way so I think your suggestion is good unless that isn't true for other
architecture/compiler combinations (I only tried gcc-10.2.1 and x86_64).

That's a possibility because the optimisation isn't obvious to me at least.

non_swap_entry() is equivalent to:

(entry.val >> SWP_TYPE_SHIFT) >= MAX_SWAPFILES;
(entry.val >> (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT)) >= (1<<5);
(entry.val >> (BITS_PER_LONG - 1 - 5)) >= (1<<5);
(entry.val >> 58) >= (1<<5);

Where entry.val is a long. So from that alone it's not obvious this could be
optimised away, because nothing there implies entry.val != (1<<63) which would
make the conditional true. But there's a lot of inlining going on in the
creation of swap entries which I didn't trace, so something must end up implying
entry.val < (1<<63).

>
> #define MAX_SWAPFILES \
> ((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
> SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)
>
> So, I agree with your analysis, but instead of adding PTE_MARKER, what do
> you think about we dropping that complexity as a whole (possibly with a
> standalone patch)?
>
> ---8<---
> diff --git a/include/linux/swapops.h b/include/linux/swapops.h
> index d356ab4047f7..5af852b68805 100644
> --- a/include/linux/swapops.h
> +++ b/include/linux/swapops.h
> @@ -387,18 +387,10 @@ static inline void num_poisoned_pages_inc(void)
> }
> #endif
>
> -#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \
> - defined(CONFIG_DEVICE_PRIVATE)
> static inline int non_swap_entry(swp_entry_t entry)
> {
> return swp_type(entry) >= MAX_SWAPFILES;
> }
> -#else
> -static inline int non_swap_entry(swp_entry_t entry)
> -{
> - return 0;
> -}
> -#endif
>
> #endif /* CONFIG_MMU */
> #endif /* _LINUX_SWAPOPS_H */
> ---8<---
>
> Thanks,