Re: [PATCH V3] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Bernard Metzler
Date: Thu Jun 24 2021 - 11:46:17 EST



-----ira.weiny@xxxxxxxxx wrote: -----

>To: "Jason Gunthorpe" <jgg@xxxxxxxx>
>From: ira.weiny@xxxxxxxxx
>Date: 06/24/2021 12:16AM
>Cc: "Ira Weiny" <ira.weiny@xxxxxxxxx>, "Mike Marciniszyn"
><mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx>, "Dennis Dalessandro"
><dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx>, "Doug Ledford"
><dledford@xxxxxxxxxx>, "Faisal Latif" <faisal.latif@xxxxxxxxx>,
>"Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>, "Bernard Metzler"
><bmt@xxxxxxxxxxxxxx>, "Kamal Heib" <kheib@xxxxxxxxxx>,
>linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>Subject: [EXTERNAL] [PATCH V3] RDMA/siw: Convert siw_tx_hdt() to
>kmap_local_page()
>
>From: Ira Weiny <ira.weiny@xxxxxxxxx>
>
>kmap() is being deprecated and will break uses of device dax after
>PKS
>protection is introduced.[1]
>
>The use of kmap() in siw_tx_hdt() is all thread local therefore
>kmap_local_page() is a sufficient replacement and will work with
>pgmap
>protected pages when those are implemented.
>
>siw_tx_hdt() tracks pages used in a page_array. It uses that array
>to
>unmap pages which were mapped on function exit. Not all entries in
>the
>array are mapped and this is tracked in kmap_mask.
>
>kunmap_local() takes a mapped address rather than a page. Alter
>siw_unmap_pages() to take the iov array to reuse the iov_base address
>of
>each mapping. Use PAGE_MASK to get the proper address for
>kunmap_local().
>
>kmap_local_page() mappings are tracked in a stack and must be
>unmapped
>in the opposite order they were mapped in. Because segments are
>mapped
>into the page array in increasing index order, modify
>siw_unmap_pages()
>to unmap pages in decreasing order.
>
>Use kmap_local_page() instead of kmap() to map pages in the
>page_array.
>
>[1]
>INVALID URI REMOVED
>lkml_20201009195033.3208459-2D59-2Dira.weiny-40intel.com_&d=DwIDAg&c=
>jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8ZO1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&
>m=eI4Db7iSlEKRl4l5pYKwY5rL5WXWWxahhxNciwy2lRA&s=vo11VhOvYbAkABdhV6htX
>TmXgFZeWbBZAFnPDvg7Bzs&e=
>
>Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>
>---
>Jason, I went ahead and left this a separate patch. Let me know if
>you really
>want this and the other siw squashed.
>
>Changes for V3:
> From Bernard
> Use 'p' in kmap_local_page()
> Use seg as length to siw_unmap_pages()
>
>Changes for V2:
> From Bernard
> Reuse iov[].iov_base rather than declaring another array
> of pointers and preserve the use of kmap_mask to know
> which iov's were kmapped.
>---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 30
>+++++++++++++++++----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
>b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index db68a10d12cd..89a5b75f7254 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s,
>struct page **page,
>
> #define MAX_TRAILER (MPA_CRC_SIZE + 4)
>
>-static void siw_unmap_pages(struct page **pp, unsigned long
>kmap_mask)
>+static void siw_unmap_pages(struct kvec *iov, unsigned long
>kmap_mask, int len)
> {
>- while (kmap_mask) {
>- if (kmap_mask & BIT(0))
>- kunmap(*pp);
>- pp++;
>- kmap_mask >>= 1;
>+ int i;
>+
>+ /*
>+ * Work backwards through the array to honor the kmap_local_page()
>+ * ordering requirements.
>+ */
>+ for (i = (len-1); i >= 0; i--) {
>+ if (kmap_mask & BIT(i)) {
>+ unsigned long addr = (unsigned long)iov[i].iov_base;
>+
>+ kunmap_local((void *)(addr & PAGE_MASK));
>+ }
> }
> }
>
>@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> p = siw_get_upage(mem->umem,
> sge->laddr + sge_off);
> if (unlikely(!p)) {
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EFAULT;
> goto done_crc;
>@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx
>*c_tx, struct socket *s)
> page_array[seg] = p;
>
> if (!c_tx->use_sendpage) {
>- iov[seg].iov_base = kmap(p) + fp_off;
>- iov[seg].iov_len = plen;
>+ void *kaddr = kmap_local_page(p);
>
> /* Remember for later kunmap() */
> kmap_mask |= BIT(seg);
>+ iov[seg].iov_base = kaddr + fp_off;
>+ iov[seg].iov_len = plen;
>
> if (do_crc)
> crypto_shash_update(
>@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
>
> if (++seg > (int)MAX_ARRAY) {
> siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg-1);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EMSGSIZE;
> goto done_crc;
>@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> } else {
> rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> hdr_len + data_len + trl_len);
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg+1);

seg+1 is one to many, since the last segment references the iWarp
trailer (CRC). There are 2 reason for this multi-segment processing
in the transmit path. (1) efficiency and (2) MTU based packet framing.
The iov contains the complete iWarp frame with header, (potentially
multiple) data fragments, and the CRC. It gets pushed to TCP in one
go, praying for iWarp framing stays intact (which most time works).
So the code can collect data form multiple SGE's of a WRITE or
SEND and tries putting those into one frame, if MTU allows, and
adds header and trailer.
The last segment (seg + 1) references the CRC, which is never kmap'ed.

I'll try the code next days, but it looks good otherwise!

Thanks very much!
> }
> if (rv < (int)hdr_len) {
> /* Not even complete hdr pushed or negative rv */
>--
>2.28.0.rc0.12.gb6a658bd00c9
>
>