Re: [PATCH v20 00/12] Implement copy offload support

From: Nitesh Shetty
Date: Mon Jun 03 2024 - 07:17:52 EST


On 01/06/24 07:47AM, Christoph Hellwig wrote:
> On Mon, May 20, 2024 at 03:50:13PM +0530, Nitesh Shetty wrote:
> > So copy offload works only for request based storage drivers.
>
> I don't think that is actually true. It just requires a fair amount of
> code in a bio based driver to match the bios up.
>
> I'm missing any kind of information on what this patch set as-is
> actually helps with. What operations are sped up, for what operations
> does it reduce resource usage?

The major benefit of this copy-offload/emulation framework shows up in
a fabrics setup, for copy workloads across the network. The host sends
a single offload command over the network, and the actual copy is
carried out by emulation on the target (hence patch 4). This gives
higher performance and lower network consumption than reads and writes
travelling across the network. With this design of
copy-offload/emulation we see the following improvements over a
userspace read + write loop on an NVMeOF TCP setup:

Setup 1: Network speed: 1000 Mb/s
Host PC: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
Target PC: AMD Ryzen 9 5900X 12-Core Processor
block size 8k:
  IO BW improves from 106 MiB/s to 360 MiB/s.
  Network utilisation drops from 97% to 6%.
block size 1M:
  IO BW improves from 104 MiB/s to 2677 MiB/s.
  Network utilisation drops from 92% to 0.66%.

Setup 2: Network speed: 100 Gb/s
Server: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 72 cores
(host and target have the same configuration)
block size 8k:
  IO BW improves from 794 MiB/s to 933 MiB/s (17.5%).
  Network utilisation drops from 6.75% to 0.16%.
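
For context, the "userspace read + write" baseline above is the usual
copy loop, where every byte crosses the NVMe-oF link twice (target to
host on the read, host back to target on the write). A minimal sketch
of that baseline; the device path and sizes are purely illustrative:

/*
 * Illustrative baseline only: copy a range on an NVMe-oF block device
 * with plain userspace read + write.  Every byte crosses the network
 * twice, which is exactly the traffic the offload avoids.
 */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	const off_t src = 0, dst = 1ULL << 30;		/* copy first GiB to offset 1 GiB */
	const size_t total = 1ULL << 30, bs = 1 << 20;	/* 1 MiB chunks */
	char *buf = malloc(bs);
	int fd = open("/dev/nvme1n1", O_RDWR);		/* example NVMe-oF namespace */

	if (fd < 0 || !buf)
		return 1;

	for (size_t done = 0; done < total; done += bs) {
		if (pread(fd, buf, bs, src + done) != (ssize_t)bs ||
		    pwrite(fd, buf, bs, dst + done) != (ssize_t)bs)
			return 1;
	}
	free(buf);
	close(fd);
	return 0;
}

With the offload only the copy command goes over the wire; the data
movement stays on the target, which is where the bandwidth and network
utilisation gains above come from.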

> Part of that might be that the included use case of offloading
> copy_file_range doesn't seem particularly useful - on any advanced
> file system that would be done using reflinks anyway.

Instead of coining a new user interface just for copy, we chose to
plumb it through existing infrastructure.
Once this series is merged, we can add an io_uring interface as well.
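
To make the "existing infra" concrete, the user-visible path in this
series is a plain copy_file_range() call on the block device. Whether a
block-device fd is accepted there is what the series adds (mainline
copy_file_range() covers regular files), so treat this as an
illustrative sketch only:

/*
 * Sketch of the user-visible path: a single copy_file_range() on the
 * block device, which the block layer can turn into a copy offload or
 * emulate.  Block-device support here depends on this series; the
 * device path and sizes are made up.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/nvme1n1", O_RDWR);	/* example NVMe-oF namespace */
	off_t src = 0, dst = 1UL << 30;
	size_t len = 1UL << 30;

	if (fd < 0)
		return 1;

	/* One request instead of a read+write loop; data never leaves the target. */
	while (len) {
		ssize_t ret = copy_file_range(fd, &src, fd, &dst, len, 0);

		if (ret <= 0)
			return 1;
		len -= ret;
	}
	close(fd);
	return 0;
}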

> Have you considered hooking into dm-kcopyd which would be an
> instant win instead? Or into garbage collection in zoned or other
> log structured file systems? Those would probably really like
> multiple source bios, though.

Early versions of this series did include a dm-kcopyd use case.
We dropped it to keep the overall series lightweight and easier to
review and test.
Once the current series is merged, we will start adding more in-kernel
users in the next phase.
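
For reference, dm-kcopyd is the kind of in-kernel user we have in mind:
it already funnels device-mapper copies through a single API, so the
offload could slot in underneath it later. A rough sketch of how a copy
is issued through that API today, based on include/linux/dm-kcopyd.h
(client setup and error handling omitted):

#include <linux/completion.h>
#include <linux/dm-io.h>
#include <linux/dm-kcopyd.h>

/* Completion callback: dm-kcopyd reports read and write errors separately. */
static void copy_done(int read_err, unsigned long write_err, void *context)
{
	complete(context);
}

/* Copy 'count' sectors from src_bdev/src_sector to dst_bdev/dst_sector. */
static int sample_kcopyd_copy(struct dm_kcopyd_client *kc,
			      struct block_device *src_bdev, sector_t src_sector,
			      struct block_device *dst_bdev, sector_t dst_sector,
			      sector_t count)
{
	struct dm_io_region from = {
		.bdev = src_bdev,
		.sector = src_sector,
		.count = count,
	};
	struct dm_io_region to = {
		.bdev = dst_bdev,
		.sector = dst_sector,
		.count = count,
	};
	DECLARE_COMPLETION_ONSTACK(done);

	/* Today this becomes read + write bios; an offload hook would sit below here. */
	dm_kcopyd_copy(kc, &from, 1, &to, 0, copy_done, &done);
	wait_for_completion(&done);
	return 0;
}

dm-kcopyd turns this into read + write bios internally today; an
offload-aware version could issue a copy request instead when the
source and destination devices support it.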

Thank you,
Nitesh Shetty