Re: [PATCH v2 1/1] s390/mm: Fix handling of _PAGE_UNUSED pte bit

From: Heiko Carstens

Date: Mon Jun 15 2026 - 05:45:46 EST


On Mon, Jun 15, 2026 at 11:17:41AM +0200, Claudio Imbrenda wrote:
> The _PAGE_UNUSED softbit should not really be lying around. Its sole
> purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
> that the page can be discarded instead of being moved / swapped.
>
> KVM has no way to know why a page is being unmapped, so it sets the bit
> on userspace ptes corresponding to unused guest pages every time they
> get unmapped. KVM has no reasonable way to clear the bit once the page
> is in use again.
>
> Without appropriate cleanup, the _PAGE_UNUSED bit will linger around
> and cause guest corruption when a used page is instead thrown out.
>
> While set_ptes() checks and clears the bit, ptep_xchg_direct(),
> ptep_xchg_lazy(), and ptep_modify_prot_commit() did not. This led to
> used pages being thrown out as if they were unused, causing guest
> corruption.
>
> This patch fixes the issue by introducing the missing checks in the
> above functions.
>
> Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
> pte is present; the _PAGE_UNUSED bit is only defined for present ptes
> and thus should not be set for non-present ptes.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
> ---
> arch/s390/mm/gmap_helpers.c | 4 ++--
> arch/s390/mm/pgtable.c | 6 ++++++
> 2 files changed, 8 insertions(+), 2 deletions(-)

...

> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 4acd8b140c4b..2acc79383e7d 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -122,6 +122,8 @@ pte_t ptep_xchg_direct(struct mm_struct *mm, unsigned long addr,
>
>  	preempt_disable();
>  	old = ptep_flush_direct(mm, addr, ptep, 1);
> +	if (pte_present(new))
> +		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));
>  	set_pte(ptep, new);
>  	preempt_enable();
>  	return old;
> @@ -160,6 +162,8 @@ pte_t ptep_xchg_lazy(struct mm_struct *mm, unsigned long addr,
>
>  	preempt_disable();
>  	old = ptep_flush_lazy(mm, addr, ptep, 1);
> +	if (pte_present(new))
> +		new = clear_pte_bit(new, __pgprot(_PAGE_UNUSED));
>  	set_pte(ptep, new);
>  	preempt_enable();
>  	return old;
> @@ -175,6 +179,8 @@ pte_t ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr,
>  void ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned long addr,
>  			     pte_t *ptep, pte_t old_pte, pte_t pte)
>  {
> +	if (pte_present(pte))
> +		pte = clear_pte_bit(pte, __pgprot(_PAGE_UNUSED));
>  	set_pte(ptep, pte);

Can't we move the logic from set_ptes() to set_pte() instead? The above
approach reminds me of the open-coded removal of the no-exec bit that we
had in many places, which became a maintenance mess until it was rewritten.

The compiler _might_ even be clever enough to move the removal of the bit
outside the loop within set_ptes().
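
To illustrate what I mean, here is a minimal userspace sketch, not the
actual kernel change. Everything prefixed with model_ or MODEL_ is
hypothetical, and the bit values are made up for the example; they are not
the real s390 definitions. The only point is that the present check and the
clearing of the unused bit live in one place, and every other setter goes
through it.

```c
#include <assert.h>

/* Hypothetical bit values for this model only; NOT the real s390 layout. */
#define MODEL_PAGE_PRESENT	0x001UL
#define MODEL_PAGE_UNUSED	0x080UL
#define MODEL_PAGE_SIZE		0x1000UL

typedef struct { unsigned long val; } model_pte_t;

static inline int model_pte_present(model_pte_t pte)
{
	return (pte.val & MODEL_PAGE_PRESENT) != 0;
}

static inline model_pte_t model_clear_pte_bit(model_pte_t pte, unsigned long bit)
{
	pte.val &= ~bit;
	return pte;
}

/* The single place where the unused bit is stripped from present ptes. */
static inline void model_set_pte(model_pte_t *ptep, model_pte_t pte)
{
	if (model_pte_present(pte))
		pte = model_clear_pte_bit(pte, MODEL_PAGE_UNUSED);
	*ptep = pte;
}

/* Multi-pte setter: no open-coded bit handling, it just calls model_set_pte(). */
static inline void model_set_ptes(model_pte_t *ptep, model_pte_t pte, unsigned int nr)
{
	assert(nr > 0);
	for (;;) {
		model_set_pte(ptep, pte);
		if (--nr == 0)
			break;
		ptep++;
		pte.val += MODEL_PAGE_SIZE;
	}
}

/* Exchange helper: inherits the cleanup from model_set_pte() as well. */
static inline model_pte_t model_ptep_xchg(model_pte_t *ptep, model_pte_t new)
{
	model_pte_t old = *ptep;

	model_set_pte(ptep, new);
	return old;
}
```

With this shape, a newly added setter cannot forget the cleanup as long as
it ends up in model_set_pte(). Non-present ptes are stored unchanged, which
matches the statement in the patch description that the bit is only defined
for present ptes.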