Re: [PATCH 3/7] mm/migrate: skip data copy for already-copied folios

From: Garg, Shivank

Date: Mon Jun 08 2026 - 15:37:23 EST

On 6/8/2026 9:13 PM, David Hildenbrand (Arm) wrote:
> On 6/8/26 17:41, Zi Yan wrote:
>> On 8 Jun 2026, at 11:18, David Hildenbrand (Arm) wrote:
>>
>>>> Sashiko Comment:
>>>> Does expanding FOLIO_OLD_STATES to include BIT(2) silently corrupt anon_vma
>>>> pointers on 32-bit architectures?
>>>> The anon_vma pointer is packed with FOLIO_OLD_STATES inside dst->migrate_info.
>>>> To separate them, __migrate_folio_extract() uses info & ~FOLIO_OLD_STATES.
>>>> Since anon_vma objects are allocated from a kmem_cache with align=0,
>>>> ARCH_SLAB_MINALIGN defaults to 4 bytes on many 32-bit architectures. This
>>>> means a valid anon_vma pointer can legitimately end in 0x4 or 0xC (meaning
>>>> bit 2 is set).
>>>> When __migrate_folio_extract() masks the pointer with ~7, it will silently
>>>> clear bit 2 from the anon_vma pointer. Any subsequent call to put_anon_vma()
>>>> with this corrupted pointer could cause a use-after-free or a kernel panic.
>>>> --
>>>>
>>>> #define ARCH_SLAB_MINALIGN __alignof__(unsigned long long)
>>>>
>>>> I initially assumed this was always 8, confusing it with the size of
>>>> unsigned long long.
>>>> But the GCC docs note that the alignment can be smaller than the size:
>>>>
>>>> https://gcc.gnu.org/onlinedocs/gcc/Alignment.html
>>>> "For example, if the target machine requires a double value to be aligned on
>>>> an 8-byte boundary, then __alignof__ (double) is 8. This is true on many RISC
>>>> machines. On more traditional machine designs, __alignof__ (double) is 4 or
>>>> even 2."
>>>>
>>>> If my understanding is right, Sashiko's concern is valid, and I can't
>>>> safely use BIT(2).
>>>
>>> 32bit makes this tricky indeed. And that's also the reason why
>>> FOLIO_MAPPING_FLAGS is currently limited to 2 bits.
>>>
>>>> I see a few options from here. Either I can gate batch copy on CONFIG_64BIT,
>>>
>>> That's a bit nasty as we'll have to special case 32bit vs 64bit.
>>
>> IIRC, multithreaded copy is already gated by CONFIG_HIGHMEM, otherwise
>> it needs to perform kmap_local() at each copying CPU, which complicates
>> the process. Then, this code will only be used for 32bit without highmem,
>> and I assume there will be no page copy DMA on 32bit platforms. Maybe it
>> is not too bad to limit this to 64bit.

I agree.
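To make the 32-bit hazard concrete, here is a minimal user-space sketch. It
assumes a 4-byte-aligned object at the arbitrary address 0x1004 (bit 2 set,
bits 0-1 clear); the macros and helpers are hypothetical stand-ins, not the
code in mm/migrate.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits discussed in this thread. */
#define WAS_MAPPED	(1UL << 0)
#define WAS_MLOCKED	(1UL << 1)
#define CONTENT_COPIED	(1UL << 2)

/* Pack state flags into the low bits of a pointer value. */
static uintptr_t pack_info(uintptr_t ptr, uintptr_t flags)
{
	return ptr | flags;
}

/* Recover the pointer by clearing every bit in @mask. */
static uintptr_t unpack_ptr(uintptr_t info, uintptr_t mask)
{
	return info & ~mask;
}
```

Packing 0x1004 with WAS_MAPPED gives 0x1005. Clearing only bits 0-1 recovers
0x1004, while the 3-bit mask returns 0x1000, i.e. a different address than
the one that was recorded.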

>
> I'm more concerned about CONFIG_64BIT handling in the code, but if that can
> be avoided easily, fine with me.

MIGRATION_COPY_OFFLOAD cannot be enabled on !64BIT, since the Kconfig entry
depends on 64BIT:

config MIGRATION_COPY_OFFLOAD
	bool "Page migration copy offload"
	depends on MIGRATION && 64BIT

I think the only place in the code that needs an #ifdef is:

enum {
	FOLIO_WAS_MAPPED = BIT(0),
	FOLIO_WAS_MLOCKED = BIT(1),
	FOLIO_OLD_STATES = FOLIO_WAS_MAPPED | FOLIO_WAS_MLOCKED,
#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
	FOLIO_CONTENT_COPIED = BIT(2),
#else
	FOLIO_CONTENT_COPIED = 0,
#endif
};

With FOLIO_CONTENT_COPIED = 0, every bit operation on it becomes a no-op.

For example,

	const bool already_copied = dst->migrate_info & FOLIO_CONTENT_COPIED;

always evaluates to false when CONFIG_MIGRATION_COPY_OFFLOAD is disabled,
which is always the case on !64BIT.
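A user-space sketch of the !64BIT case, built without
CONFIG_MIGRATION_COPY_OFFLOAD. The enum is copied from above; BIT(), struct
mock_folio and the migrate_info_*() helpers are hypothetical stand-ins,
loosely modelled on __migrate_folio_record()/__migrate_folio_extract().
Because FOLIO_CONTENT_COPIED folds to 0, the extraction mask stays at bits
0-1 and a 4-byte-aligned pointer survives the round trip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(nr) (1UL << (nr))

/* Copied from above; CONFIG_MIGRATION_COPY_OFFLOAD is left undefined here,
 * as it would be on any !64BIT build. */
enum {
	FOLIO_WAS_MAPPED = BIT(0),
	FOLIO_WAS_MLOCKED = BIT(1),
	FOLIO_OLD_STATES = FOLIO_WAS_MAPPED | FOLIO_WAS_MLOCKED,
#ifdef CONFIG_MIGRATION_COPY_OFFLOAD
	FOLIO_CONTENT_COPIED = BIT(2),
#else
	FOLIO_CONTENT_COPIED = 0,
#endif
};

/* Hypothetical stand-in for struct folio: only the packed field. */
struct mock_folio {
	uintptr_t migrate_info;
};

/* Every flag bit that extraction must strip from the pointer. */
#define MIGRATE_INFO_FLAGS \
	((uintptr_t)(FOLIO_OLD_STATES | FOLIO_CONTENT_COPIED))

static void migrate_info_record(struct mock_folio *dst, void *anon_vma,
				unsigned long states)
{
	/* The pointer must not already use a bit reserved for flags. */
	assert(((uintptr_t)anon_vma & MIGRATE_INFO_FLAGS) == 0);
	/* Only known flag bits may be recorded. */
	assert((states & ~MIGRATE_INFO_FLAGS) == 0);
	dst->migrate_info = (uintptr_t)anon_vma | states;
}

/* Mirrors the already_copied expression above. */
static bool migrate_info_copied(const struct mock_folio *dst)
{
	return dst->migrate_info & FOLIO_CONTENT_COPIED;
}

static void *migrate_info_extract(const struct mock_folio *dst,
				  unsigned long *states)
{
	*states = dst->migrate_info & MIGRATE_INFO_FLAGS;
	return (void *)(dst->migrate_info & ~MIGRATE_INFO_FLAGS);
}
```

Defining CONFIG_MIGRATION_COPY_OFFLOAD widens the mask to bits 0-2, and the
first assert() in migrate_info_record() would then reject a 4-byte-aligned
pointer such as 0x1004, which is why tying the option to 64BIT matters.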

Git tree (Work-in-progress):
https://github.com/AMDESE/linux-mm/commits/shivank/batch-migrate-offload-v6-wip: 4cd9324a

On 6/8/2026 8:48 PM, David Hildenbrand (Arm) wrote:

>> align arg. Or I can change the migrate_folio() callback to pass already_copied
>> info to change to dst->migrate_info enum.
>
> Can you elaborate how that would look like?

migrate_folios_move(.., already_copied)
  -> migrate_folio_move(.., already_copied)
    -> move_to_new_folio(dst, src, mode, already_copied)
      -> a_ops->migrate_folio(mapping, dst, src, mode, already_copied)
         = migrate_folio / filemap_migrate_folio / buffer_migrate_folio
           (and many other ->migrate_folio implementations in fs/ would
           need to change as well)
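For comparison, a rough user-space sketch of the callback-based alternative.
All types and functions here (struct mock_aops, mock_migrate_folio(),
mock_move_to_new_folio(), copy_folio_data()) are hypothetical
simplifications, not the real address_space_operations; they only show how
already_copied would be threaded down so an implementation can skip the
data copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel types. */
struct folio { int id; };
struct address_space;
enum migrate_mode { MIGRATE_ASYNC, MIGRATE_SYNC };

/* Sketch of the widened callback: every implementation gains already_copied. */
struct mock_aops {
	int (*migrate_folio)(struct address_space *mapping, struct folio *dst,
			     struct folio *src, enum migrate_mode mode,
			     bool already_copied);
};

static int copies_done; /* counts data copies, for illustration only */

static void copy_folio_data(struct folio *dst, const struct folio *src)
{
	dst->id = src->id;
	copies_done++;
}

/* Stand-in for a generic implementation such as migrate_folio(). */
static int mock_migrate_folio(struct address_space *mapping,
			      struct folio *dst, struct folio *src,
			      enum migrate_mode mode, bool already_copied)
{
	(void)mapping;
	(void)mode;
	/* Skip the copy if a batched/offloaded copy already did it. */
	if (!already_copied)
		copy_folio_data(dst, src);
	/* ...metadata transfer would follow here... */
	return 0;
}

static int mock_move_to_new_folio(const struct mock_aops *a_ops,
				  struct folio *dst, struct folio *src,
				  enum migrate_mode mode, bool already_copied)
{
	assert(a_ops->migrate_folio != NULL);
	return a_ops->migrate_folio(NULL, dst, src, mode, already_copied);
}
```

Even in this toy, every ->migrate_folio implementation has to accept the
extra argument, which is the churn across fs/ noted in the call chain.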

Thanks,
Shivank