Re: [PATCH 0/7] Accelerate page migration with batch copying and hardware offload
From: David Hildenbrand (Arm)
Date: Mon May 11 2026 - 12:20:05 EST
On 4/28/26 17:50, Shivank Garg wrote:
> This is the fifth RFC of the patchset to enhance page migration by
Ah, this is an RFC ... the subject line just doesn't say so.
I suggest b4 for patch series management :P
That also explains why patch #7 is still in there.
> batching folio-copy operations and enabling acceleration via DMA offload.
>
> Single-threaded, folio-by-folio copying bottlenecks page migration in
> modern systems with deep memory hierarchies, especially for large folios
> where copy overhead dominates, leaving significant hardware potential
> untapped.
>
> By batching the copy phase, we create an opportunity for hardware
> acceleration. This series builds the framework and provides a DMA
> offload driver (dcbm) as a reference implementation, targeting bulk
> migration workloads where offloading the copy improves throughput
> and latency while freeing CPU cycles.
>
> See the RFC V3 cover letter [2] for motivation.
>
> Changelog since V4:
> -------------------
>
> 1. Renamed PAGE_* migration state flags to FOLIO_*. (David)
> 2. Used the new folio->migrate_info field instead of folio->private
> for migration state. (David)
> 3. Folded the folios_mc_copy patch into the batch-copy implementation patch. (David)
> 4. Renamed migrate_offload_start()/stop() to register()/unregister().
> (Huang, Ying)
> 5. Dropped the should_batch() callback from struct migrator. Reason-based
> policy now lives in migrate_pages_batch(). Migrators can still skip
> a batch they don't want (size-based policy). (Huang, Ying)
> 6. CONFIG_MIGRATION_COPY_OFFLOAD is now hidden and selected by the
> migrator driver. CONFIG_DCBM_DMA is tristate. (Huang Ying, Gregory Price)
> 7. Wrapped the SRCU + static_call dispatch in a small helper. (Huang, Ying)
> 8. Required m->owner in migrate_offload_register(), since the SRCU sync at
> unregister relies on it. Counters are atomic_long_t to avoid a lock-ordering
> issue.
> 9. Moved DCBM sysfs from /sys/kernel/dcbm to /sys/module/dcbm (Huang, Ying)
> 10. Rebased on v7.1-rc1.
>
[...]
>
> OPEN QUESTIONS:
> ---------------
>
> 1. Should the batch path run without a registered migrator? Patches 1-4
> are self-contained and use folios_mc_copy() (CPU). Options include
> making the batch path always-on for eligible folios, giving the admin
> an option to flip the static branch, or keeping the gate.
> I'm leaning toward always-on.
Hiding that detail from migrate.c sounds interesting.
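A minimal userspace sketch of what hiding it could look like: migrate.c calls a
single helper, which tries a registered migrator and silently falls back to a
CPU copy when none is registered or the migrator declines the batch. All names
here (struct fake_folio, struct migrator, active_migrator, migrate_copy_batch())
are hypothetical stand-ins rather than the series' API, and the SRCU/static_call
machinery is modeled as a plain pointer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a folio: a buffer and its length. */
struct fake_folio {
	unsigned char *data;
	size_t size;
};

/*
 * Hypothetical offload interface. copy_batch() returns 0 if it copied the
 * whole batch, or nonzero to decline it (e.g. a size-based policy).
 */
struct migrator {
	const char *name;
	int (*copy_batch)(struct fake_folio *dst, const struct fake_folio *src,
			  int nr);
};

/* Registered migrator or NULL; models the SRCU-protected static_call. */
static const struct migrator *active_migrator;

/* CPU fallback: folio-by-folio copy, playing the role of folios_mc_copy(). */
static void cpu_copy_batch(struct fake_folio *dst,
			   const struct fake_folio *src, int nr)
{
	for (int i = 0; i < nr; i++)
		memcpy(dst[i].data, src[i].data, src[i].size);
}

/*
 * The only call the migration core makes; it never needs to know whether
 * an offload driver exists or took the batch. Returns 1 if the migrator
 * copied the batch, 0 if the CPU fallback did.
 */
static int migrate_copy_batch(struct fake_folio *dst,
			      const struct fake_folio *src, int nr)
{
	if (active_migrator && active_migrator->copy_batch(dst, src, nr) == 0)
		return 1;
	cpu_copy_batch(dst, src, nr);
	return 0;
}
```

With that shape, "always-on" simply means the batch path always runs, and the
gate question reduces to whether active_migrator is set.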
>
> 2. Carrying already_copied via folio->migrate_info vs changing the
> migrate_folio() callback signature (Huang, Ying). I went with the
> field for now to avoid touching every fs callback before the design
> settles. Happy to revisit.
>
> 3. Per-caller offload selection: today, eligibility is based on migrate_reason
> only. Some reasons are latency-tolerant; others may not be. Is the reason
> the right granularity, or do we want a per-caller hint?
Isn't it sufficient to just base it on the number of folios or something like that?
If someone migrates a handful of folios, latency is likely more important (and
batching less beneficial).
I'd assume when migrating many folios, batching could just always be done. Or
what's the concern?
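Roughly what I mean, as a userspace sketch (BATCH_MIN_FOLIOS and
use_batch_path() are made-up names, and the value 16 is a placeholder, not a
measured cut-off):

```c
#include <stdbool.h>

/*
 * Hypothetical cut-off: below this many folios, migrate folio-by-folio
 * to keep latency low; at or above it, take the batch path.
 */
#define BATCH_MIN_FOLIOS 16

/* Decide purely on the length of the caller's folio list. */
static bool use_batch_path(unsigned int nr_folios)
{
	return nr_folios >= BATCH_MIN_FOLIOS;
}
```

No migrate_reason is involved; the caller only passes the size of the list it
already has.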
>
> 4. Cgroup integration: How should per-cgroup accounting work for different
> migrators (e.g., should DMA-busy time be accounted)?
Oh. Do we even have to mess with that?
>
> 5. Tuning migrate_pages callers for offloading. For instance,
> COMPACT_CLUSTER_MAX = 32 caps the payoff of DMA offload for compaction
> (V4 experiment).
Is that HW dependent?
>
> 6. Where do batch-size thresholds live, and how are they tuned? Per
> Huang Ying's split, that policy lives in the migrator. DCBM has no
> threshold today. It is still open whether it should later become a
> per-migrator sysfs knob or be hard-coded; that will probably be clearer
> once a second migrator (SDXI, mtcopy) shows the trade-off.
Again, that sounds HW dependent, no?
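If so, the threshold could simply be part of what each driver supplies when it
registers, so the core needs no global knob. A hypothetical userspace sketch
(struct migrator_desc, min_batch_bytes and migrator_wants_batch() are invented
for illustration; the thresholds in the note below are placeholders, not
measurements):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-migrator descriptor: each driver states the smallest
 * batch (in bytes) worth offloading on its own hardware.
 */
struct migrator_desc {
	const char *name;
	size_t min_batch_bytes;
};

/* Total number of bytes in a batch of folios. */
static size_t batch_bytes(const size_t *folio_sizes, int nr)
{
	size_t total = 0;

	for (int i = 0; i < nr; i++)
		total += folio_sizes[i];
	return total;
}

/* Offload only when the batch meets this migrator's own threshold. */
static bool migrator_wants_batch(const struct migrator_desc *m,
				 const size_t *folio_sizes, int nr)
{
	return batch_bytes(folio_sizes, nr) >= m->min_batch_bytes;
}
```

For example, 32 base pages of 4 KiB (the COMPACT_CLUSTER_MAX value quoted
above) add up to 131072 bytes: a migrator asking for 1 MiB would decline that
batch, while one asking for 64 KiB would take it.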
--
Cheers,
David