Re: [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Bernard Metzler
Date: Tue Jun 22 2021 - 12:42:56 EST


-----ira.weiny@xxxxxxxxx wrote: -----

>To: "Jason Gunthorpe" <jgg@xxxxxxxx>
>From: ira.weiny@xxxxxxxxx
>Date: 06/22/2021 08:14AM
>Cc: "Ira Weiny" <ira.weiny@xxxxxxxxx>, "Mike Marciniszyn"
><mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx>, "Dennis Dalessandro"
><dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx>, "Doug Ledford"
><dledford@xxxxxxxxxx>, "Faisal Latif" <faisal.latif@xxxxxxxxx>,
>"Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>, "Bernard Metzler"
><bmt@xxxxxxxxxxxxxx>, "Kamal Heib" <kheib@xxxxxxxxxx>,
>linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
>Subject: [EXTERNAL] [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to
>kmap_local_page()
>
>From: Ira Weiny <ira.weiny@xxxxxxxxx>
>
>kmap() is being deprecated and will break uses of device dax after
>PKS protection is introduced.[1]
>
>The use of kmap() in siw_tx_hdt() is all thread local therefore
>kmap_local_page() is a sufficient replacement and will work with
>pgmap protected pages when those are implemented.
>
>kmap_local_page() mappings are tracked in a stack and must be
>unmapped in the opposite order they were mapped in.
>
>siw_tx_hdt() tracks pages used in a page_array. It uses that array
>to unmap pages which were mapped on function exit. Not all entries
>in the array are mapped and this is tracked in kmap_mask.
>
>kunmap_local() takes a mapped address rather than a page. Declare a
>mapped address array, page_array_addr, of the same size as the page
>array to be used for unmapping.
>

Hi Ira, thanks for taking care of that!

I think we can avoid introducing another 'page_array_addr[]' array
here, which must be zeroed first and completely searched for
valid mappings during unmap, and which further bloats the
stack size of siw_tx_hdt(). I think we can get away with the
already available iov[].iov_base address array, masking the
addresses with PAGE_MASK during unmapping to strip any first-byte
offset. All kmap_local_page() mappings end up in that list anyway.
For unmapping we can still rely on the kmap_mask bit field, which
is more efficient to initialize and to search for valid mappings.
Ordering during unmapping can be guaranteed if we walk the bitmask
in reverse order. Let me know if you prefer me to propose a
change -- that siw_tx_hdt() thing became rather complex, I have
to admit!

Best,
Bernard.

>Use kmap_local_page() instead of kmap() to map pages in the
>page_array.
>
>Because segments are mapped into the page array in increasing index
>order, modify siw_unmap_pages() to unmap pages in decreasing order.
>
>The kmap_mask is no longer needed as the lack of an address in the
>address array can indicate no unmap is required.
>
>[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/
>
>Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 35 +++++++++++++++------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index db68a10d12cd..e70aba23f6e7 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -396,13 +396,17 @@ static int siw_0copy_tx(struct socket *s, struct page **page,
>
> #define MAX_TRAILER (MPA_CRC_SIZE + 4)
>
>-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
>+static void siw_unmap_pages(void **addrs, int len)
> {
>- while (kmap_mask) {
>- if (kmap_mask & BIT(0))
>- kunmap(*pp);
>- pp++;
>- kmap_mask >>= 1;
>+ int i;
>+
>+ /*
>+ * Work backwards through the array to honor the kmap_local_page()
>+ * ordering requirements.
>+ */
>+ for (i = (len-1); i >= 0; i--) {
>+ if (addrs[i])
>+ kunmap_local(addrs[i]);
> }
> }
>
>@@ -427,13 +431,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> struct kvec iov[MAX_ARRAY];
> struct page *page_array[MAX_ARRAY];
>+ void *page_array_addr[MAX_ARRAY];
> struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
>
> int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
> sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx,
> pbl_idx = c_tx->pbl_idx;
>- unsigned long kmap_mask = 0L;
>+
>+ memset(page_array_addr, 0, sizeof(page_array_addr));
>
> if (c_tx->state == SIW_SEND_HDR) {
> if (c_tx->use_sendpage) {
>@@ -498,7 +504,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> p = siw_get_upage(mem->umem,
> sge->laddr + sge_off);
> if (unlikely(!p)) {
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EFAULT;
> goto done_crc;
>@@ -506,11 +512,10 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> page_array[seg] = p;
>
> if (!c_tx->use_sendpage) {
>- iov[seg].iov_base = kmap(p) + fp_off;
>- iov[seg].iov_len = plen;
>+ page_array_addr[seg] = kmap_local_page(page_array[seg]);
>
>- /* Remember for later kunmap() */
>- kmap_mask |= BIT(seg);
>+ iov[seg].iov_base = page_array_addr[seg] + fp_off;
>+ iov[seg].iov_len = plen;
>
> if (do_crc)
> crypto_shash_update(
>@@ -518,7 +523,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> iov[seg].iov_base,
> plen);
> } else if (do_crc) {
>- kaddr = kmap_local_page(p);
>+ kaddr = kmap_local_page(page_array[seg]);
> crypto_shash_update(c_tx->mpa_crc_hd,
> kaddr + fp_off,
> plen);
>@@ -542,7 +547,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
>
> if (++seg > (int)MAX_ARRAY) {
> siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EMSGSIZE;
> goto done_crc;
>@@ -593,7 +598,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> } else {
> rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> hdr_len + data_len + trl_len);
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> }
> if (rv < (int)hdr_len) {
> /* Not even complete hdr pushed or negative rv */
>--
>2.28.0.rc0.12.gb6a658bd00c9
>
>