[PATCH] mm: use READ/WRITE_ONCE() for vma->vm_flags on migrate, mprotect

From: Lorenzo Stoakes
Date: Fri Feb 07 2025 - 12:25:25 EST


According to the syzbot report referenced below, it is possible to
encounter a race between mprotect() writing to the vma->vm_flags field
and migration reading it to check whether the VMA is locked.

There is no real problem with timing here per se, only that torn
reads/writes may occur. Therefore, as a proximate fix, have the read
side use READ_ONCE() and the write side use WRITE_ONCE() to avoid
this.
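
For illustration only (a minimal user-space sketch, not part of the
patch): READ_ONCE()/WRITE_ONCE() force the compiler to emit exactly
one access to the flags word, where a plain access may legally be torn
or re-issued. For word-sized scalars the kernel macros reduce to a
volatile access, much like the simplified stand-ins used here; the
writer/reader names are hypothetical.

  #include <pthread.h>
  #include <stdio.h>

  /* Simplified stand-ins for the kernel macros (word-sized scalar
   * case, where the kernel likewise degrades to a volatile access).
   */
  #define READ_ONCE(x)     (*(const volatile typeof(x) *)&(x))
  #define WRITE_ONCE(x, v) (*(volatile typeof(x) *)&(x) = (v))

  #define VM_LOCKED 0x00002000UL

  static unsigned long vm_flags; /* stands in for vma->vm_flags */

  /* mprotect-like side: a single, untorn store of the new flags. */
  static void *writer(void *arg)
  {
          (void)arg;
          for (int i = 0; i < 1000000; i++)
                  WRITE_ONCE(vm_flags, (i & 1) ? VM_LOCKED : 0);
          return NULL;
  }

  /* migration-like side: reached with no mmap/VMA lock held, so the
   * flags may change underneath us; READ_ONCE() guarantees the test
   * operates on one coherent load.
   */
  static void *reader(void *arg)
  {
          unsigned long hits = 0;

          (void)arg;
          for (int i = 0; i < 1000000; i++)
                  if (READ_ONCE(vm_flags) & VM_LOCKED)
                          hits++;
          printf("saw VM_LOCKED %lu times\n", hits);
          return NULL;
  }

  int main(void)
  {
          pthread_t w, r;

          pthread_create(&w, NULL, writer, NULL);
          pthread_create(&r, NULL, reader, NULL);
          pthread_join(w, NULL);
          pthread_join(r, NULL);
          return 0;
  }

Build with e.g. gcc -O2 -pthread; the point is not the output but that
each annotated access compiles to exactly one load or store.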

This race is possible because VMAs can be looked up via the rmap
(which is how migration reaches the VMA in this case), and an rmap
walk takes neither the mmap lock nor the VMA lock, so it does not
preclude a concurrent operation modifying the VMA.

When mprotect performs the final update of the VMA flags, the rmap
lock is taken while the VMA is inserted on split/merge, which might
appear to serialise against such rmap walks.

However, the means by which we perform splits/merges in the kernel is
that we perform the split/merge operation on the VMA first,
acquiring/releasing locks as needed, and only then, after having done
so, modify fields; the final vm_flags write therefore happens outside
the rmap lock.
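
To sketch that ordering (an illustrative model only; the names below
are hypothetical stand-ins, not kernel code): the writer holds the
rmap lock only for the split/merge step, so a reader serialised by
that same lock can still race with the subsequent flags write, which
is what the _ONCE() annotations address.

  #include <pthread.h>

  #define READ_ONCE(x)     (*(const volatile typeof(x) *)&(x))
  #define WRITE_ONCE(x, v) (*(volatile typeof(x) *)&(x) = (v))

  static pthread_mutex_t rmap_lock = PTHREAD_MUTEX_INITIALIZER;
  static unsigned long vm_flags; /* stands in for vma->vm_flags */

  /* mprotect-like writer: the rmap lock covers only split/merge... */
  static void modify_vma(unsigned long newflags)
  {
          pthread_mutex_lock(&rmap_lock);
          /* ...VMA removed from and reinserted into the rmap here... */
          pthread_mutex_unlock(&rmap_lock);

          /* ...whereas the flags are updated only afterwards, outside
           * the lock, hence WRITE_ONCE() vs. concurrent rmap readers.
           */
          WRITE_ONCE(vm_flags, newflags);
  }

  /* migration-like reader: the rmap lock orders this walk against the
   * split/merge above, but not against the later flags write.
   */
  static unsigned long walk_rmap(void)
  {
          unsigned long flags;

          pthread_mutex_lock(&rmap_lock);
          flags = READ_ONCE(vm_flags);
          pthread_mutex_unlock(&rmap_lock);
          return flags;
  }

  int main(void)
  {
          modify_vma(0x00002000UL); /* e.g. set a VM_LOCKED-style bit */
          return walk_rmap() ? 0 : 1;
  }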

We should carefully examine whether we can combine the two operations
so as to avoid such races, and whether it might otherwise be possible
to annotate these rmap-side field accesses.
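
One conceivable shape for such an annotation (an assumption on my
part, not something this patch does) is the kernel's data_race()
helper, which marks an access that KCSAN should treat as an
intentional race:

  /* hypothetical alternative: document the race rather than using
   * READ_ONCE(); note data_race() does not prevent load tearing.
   */
  if (data_race(vma->vm_flags) & VM_LOCKED)
          mlock_drain_local();

Unlike READ_ONCE(), this only silences the race report, so it would
not address the tearing concern on its own.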

Reported-by: syzbot+c2e5712cbb14c95d4847@xxxxxxxxxxxxxxxxxxxxxxxxx
Closes: https://lore.kernel.org/all/67a34e60.050a0220.50516.0040.GAE@xxxxxxxxxx/
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
---
mm/migrate.c | 2 +-
mm/mprotect.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index fb19a18892c8..365c6daa8d1b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -328,7 +328,7 @@ static bool remove_migration_pte(struct folio *folio,
folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
}
- if (vma->vm_flags & VM_LOCKED)
+ if (READ_ONCE(vma->vm_flags) & VM_LOCKED)
mlock_drain_local();

trace_remove_migration_pte(pvmw.address, pte_val(pte),
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 516b1d847e2c..84cf7689c5eb 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -607,7 +607,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
unsigned long start, unsigned long end, unsigned long newflags)
{
struct mm_struct *mm = vma->vm_mm;
- unsigned long oldflags = vma->vm_flags;
+ unsigned long oldflags = READ_ONCE(vma->vm_flags);
long nrpages = (end - start) >> PAGE_SHIFT;
unsigned int mm_cp_flags = 0;
unsigned long charged = 0;
@@ -627,7 +627,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
* uncommon case, so doesn't need to be very optimized.
*/
if (arch_has_pfn_modify_check() &&
- (vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) &&
+ (oldflags & (VM_PFNMAP|VM_MIXEDMAP)) &&
(newflags & VM_ACCESS_FLAGS) == 0) {
pgprot_t new_pgprot = vm_get_page_prot(newflags);

@@ -676,7 +676,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
* held in write mode.
*/
vma_start_write(vma);
- vm_flags_reset(vma, newflags);
+ vm_flags_reset_once(vma, newflags);
if (vma_wants_manual_pte_write_upgrade(vma))
mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
vma_set_page_prot(vma);
--
2.48.1