Re: [PATCH v6 2/6] mm: swap: free_swap_and_cache_nr() as batched free_swap_and_cache()

From: David Hildenbrand
Date: Mon Apr 08 2024 - 06:24:23 EST


On 08.04.24 12:07, Ryan Roberts wrote:
On 08/04/2024 10:43, David Hildenbrand wrote:

+
+/**
+ * swap_pte_batch - detect a PTE batch for a set of contiguous swap entries
+ * @start_ptep: Page table pointer for the first entry.
+ * @max_nr: The maximum number of table entries to consider.
+ * @entry: Swap entry recovered from the first table entry.
+ *
+ * Detect a batch of contiguous swap entries: consecutive (non-present) PTEs
+ * containing swap entries all with consecutive offsets and targeting the same
+ * swap type.
+ *

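The batching rule the comment describes can be modeled in plain userspace C. This is a sketch only: the helper names and the type/offset bit layout here are hypothetical, not the kernel's actual encoding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: a swap entry packs a type into the high
 * bits and an offset into the low bits of an unsigned long.
 */
#define SWP_TYPE_SHIFT 58

typedef unsigned long swp_entry_t;

static swp_entry_t mk_swp(unsigned int type, unsigned long offset)
{
	return ((swp_entry_t)type << SWP_TYPE_SHIFT) | offset;
}

static unsigned int swp_type(swp_entry_t e)
{
	return (unsigned int)(e >> SWP_TYPE_SHIFT);
}

static unsigned long swp_offset(swp_entry_t e)
{
	return e & ((1UL << SWP_TYPE_SHIFT) - 1);
}

/*
 * The batch rule from the comment: starting at entries[0], count how many
 * consecutive entries share the same type and have strictly consecutive
 * offsets.
 */
static size_t swp_batch_nr(const swp_entry_t *entries, size_t max_nr)
{
	size_t nr = 1;

	while (nr < max_nr &&
	       swp_type(entries[nr]) == swp_type(entries[0]) &&
	       swp_offset(entries[nr]) == swp_offset(entries[0]) + nr)
		nr++;
	return nr;
}
```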
Likely you should document that any swp pte bits are ignored?

Sorry, I don't understand this comment. I thought any non-none, non-present PTE
was always considered to contain only a "swap entry", and a swap entry consists
of a "type" and an "offset" only (and it's a special "non-swap" swap entry if
type > SOME_CONSTANT). Are you saying there are additional fields in the PTE
that are not part of the swap entry?


pte_swp_soft_dirty()
pte_swp_exclusive()
pte_swp_uffd_wp()

are PTE bits used for swap PTEs.
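The distinction can be modeled in plain C (bit positions and names below are illustrative only, not any architecture's real layout): these software bits live alongside the swap entry in the PTE, so two PTEs encoding the same swap entry can still differ, and a naive PTE->entry->PTE round trip silently drops them.

```c
#include <assert.h>

/*
 * Hypothetical layout: the low bits of a swap PTE carry the swap entry,
 * while two high bits are per-PTE software flags that are NOT part of the
 * swap entry itself (analogous to soft-dirty and uffd-wp; real bit
 * positions are arch-specific).
 */
#define PTE_SWP_SOFT_DIRTY	(1UL << 62)
#define PTE_SWP_UFFD_WP		(1UL << 63)
#define PTE_SWP_FLAG_MASK	(PTE_SWP_SOFT_DIRTY | PTE_SWP_UFFD_WP)

typedef unsigned long pte_t;
typedef unsigned long swp_entry_t;

/* Converting to a swap entry strips the per-PTE software flags... */
static swp_entry_t pte_to_swp_entry(pte_t pte)
{
	return pte & ~PTE_SWP_FLAG_MASK;
}

/* ...so a naive entry->PTE conversion cannot restore them. */
static pte_t swp_entry_to_pte(swp_entry_t entry)
{
	return (pte_t)entry;
}

static int pte_swp_soft_dirty(pte_t pte)
{
	return !!(pte & PTE_SWP_SOFT_DIRTY);
}
```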

Ahh wow, mind blown. Looks like a massive hack... why not store them in the
arch-independent swap entry rather than have them squat independently in the PTE?

I think that was discussed at some point. But not only would changing it require quite some churn (all that swp entry code is a mess); these bits are conceptually really per-PTE and not something you would want to pass into the actual swap handling code, which couldn't care less about any of them.

I looked at this when I added SWP exclusive, but accidentally losing the SWP exclusive marker when converting back and forth made me go the PTE route instead.

Then, the available PTE bits are a bit scattered on some architectures, and converting entry<->PTE gets even uglier if we don't want to "lose" available bits.

Probably the whole "unsigned long swp_entry" stuff should be replaced by a proper struct where we could more easily add flags and have the arch code handle the conversion to the PTE just once. The arch-specific swp_entry stuff is another nasty thing IMHO.
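The "proper struct" idea could be sketched as below. This is purely hypothetical, not kernel code: type, offset and software flags travel together, and only one arch-specific pair of helpers knows the hardware PTE bit layout, so no flag can be lost in intermediate conversions.

```c
#include <assert.h>

/* Sketch: a struct swap entry carrying its flags explicitly. */
struct swp_entry {
	unsigned int  type;
	unsigned long offset;
	unsigned int  flags;	/* e.g. soft-dirty, uffd-wp, exclusive */
};

#define SWP_F_SOFT_DIRTY 0x1u
#define SWP_F_UFFD_WP    0x2u

/* Hypothetical arch encoding: flags in bits 62-63, type in bits 56-61. */
static unsigned long arch_swp_to_pte(struct swp_entry e)
{
	return e.offset | ((unsigned long)e.type << 56) |
	       ((unsigned long)e.flags << 62);
}

static struct swp_entry arch_pte_to_swp(unsigned long pte)
{
	struct swp_entry e = {
		.type   = (unsigned int)((pte >> 56) & 0x3f),
		.offset = pte & ((1UL << 56) - 1),
		.flags  = (unsigned int)(pte >> 62),
	};
	return e;
}
```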


OK, my implementation is buggy. I'll re-spin to fix this.



There is also dirty/young for migration entries, but that's not a concern
here, because we stop at non_swap_entry().

Looks like these are part of the offset field in the arch-independent swap entry
- much cleaner ;-).

Note that it only applies to migration entries, and only when we have spare bits because the PFN needs fewer bits than the offset field provides.
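The migration-entry trick can be illustrated like this (bit positions are hypothetical): when the PFN fits in fewer bits than the offset field, the spare high offset bits can carry young/dirty inside the arch-independent entry.

```c
#include <assert.h>

/*
 * Illustration only: assume the PFN needs at most 52 bits, leaving spare
 * high bits in the offset field for per-entry young/dirty flags.
 */
#define SWP_MIG_PFN_BITS 52
#define SWP_MIG_YOUNG    (1UL << SWP_MIG_PFN_BITS)
#define SWP_MIG_DIRTY    (1UL << (SWP_MIG_PFN_BITS + 1))

static unsigned long mk_mig_offset(unsigned long pfn, int young, int dirty)
{
	unsigned long off = pfn;

	if (young)
		off |= SWP_MIG_YOUNG;
	if (dirty)
		off |= SWP_MIG_DIRTY;
	return off;
}

static unsigned long mig_offset_pfn(unsigned long off)
{
	return off & ((1UL << SWP_MIG_PFN_BITS) - 1);
}

static int mig_offset_young(unsigned long off) { return !!(off & SWP_MIG_YOUNG); }
static int mig_offset_dirty(unsigned long off) { return !!(off & SWP_MIG_DIRTY); }
```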

--
Cheers,

David / dhildenb