Re: [RFC PATCH v2 30/48] siw: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage to transmit

From: David Howells
Date: Wed Mar 29 2023 - 11:34:37 EST


Bernard Metzler <BMT@xxxxxxxxxxxxxx> wrote:

> > When transmitting data, call down into TCP using a single sendmsg with
> > MSG_SPLICE_PAGES to indicate that content should be spliced rather than
> > performing several sendmsg and sendpage calls to transmit header, data
> > pages and trailer.
> >
> > To make this work, the data is assembled in a bio_vec array and attached to
> > a BVEC-type iterator. The header and trailer (if present) are copied into
> > page fragments that can be freed with put_page().
>
> I like it a lot if it still keeps zero copy sendpage() semantics for
> the cases the driver can make use of data transfers w/o copy.
> Is 'msg.msg_flags |= MSG_SPLICE_PAGES' doing that magic?

Yes. MSG_SPLICE_PAGES indicates that you want the socket to retain your
buffer and pass it directly to the device. Note that it's just a hint,
however, pages that are unspliceable (eg. they belong to the slab) will get
copied into a page fragment instead. Further, if the device cannot support a
vector, then the hint can be ignored and all the data can be copied as normal.

> 'splicing' suggest just merging pages to me.

'splicing' as in what the splice system call does.

Unfortunately, MSG_ZEROCOPY is already a (different) thing.

> It would simplify the transmit code path substantially, also getting
> rid of kmap_local_page()/kunmap_local() sequences for multi-fragment
> sendmsg()'s.

If the ITER_ITERLIST iterator is accepted, then siw would be able to do mix
KVEC and BVEC iterators, e.g. what I did for sunrpc here:

https://lore.kernel.org/linux-fsdevel/20230329141354.516864-42-dhowells@xxxxxxxxxx/T/#u

This means that in siw_tx_hdt() where I made it copy data into page fragments
using page_frag_memdup() and attach that to a bvec:

hdr_len = c_tx->ctrl_len - c_tx->ctrl_sent;
h = page_frag_memdup(NULL, hdr, hdr_len, GFP_NOFS, ULONG_MAX);
if (!h)
goto done;
bvec_set_virt(&bvec[0], h, hdr_len);
seg = 1;

it can just set up a kvec instead.

Unfortunately, it's not so easy to get rid of all of the kmap'ing as we need
to do some of it to do the hashing.

David