Re: [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Ira Weiny
Date: Tue Jun 22 2021 - 16:39:59 EST


On Tue, Jun 22, 2021 at 04:42:49PM +0000, Bernard Metzler wrote:
> -----ira.weiny@xxxxxxxxx wrote: -----
>
> >To: "Jason Gunthorpe" <jgg@xxxxxxxx>
> >From: ira.weiny@xxxxxxxxx
> >Date: 06/22/2021 08:14AM
> >Cc: "Ira Weiny" <ira.weiny@xxxxxxxxx>, "Mike Marciniszyn"
> ><mike.marciniszyn@xxxxxxxxxxxxxxxxxxxx>, "Dennis Dalessandro"
> ><dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx>, "Doug Ledford"
> ><dledford@xxxxxxxxxx>, "Faisal Latif" <faisal.latif@xxxxxxxxx>,
> >"Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>, "Bernard Metzler"
> ><bmt@xxxxxxxxxxxxxx>, "Kamal Heib" <kheib@xxxxxxxxxx>,
> >linux-rdma@xxxxxxxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx
> >Subject: [EXTERNAL] [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to
> >kmap_local_page()
> >
> >From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> >kmap() is being deprecated and will break uses of device dax after
> >PKS protection is introduced.[1]
> >
> >The use of kmap() in siw_tx_hdt() is all thread local, therefore
> >kmap_local_page() is a sufficient replacement and will work with
> >pgmap protected pages when those are implemented.
> >
> >kmap_local_page() mappings are tracked in a stack and must be
> >unmapped in the opposite order they were mapped in.
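> >
> >For illustration only (this sketch is not part of the patch), the
> >required last-in/first-out pattern is:
> >
> >	void *a = kmap_local_page(page_a);
> >	void *b = kmap_local_page(page_b);
> >
> >	/* ... use both mappings ... */
> >
> >	kunmap_local(b);	/* most recent mapping goes first */
> >	kunmap_local(a);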
> >
> >siw_tx_hdt() tracks the pages used in a page_array. On function
> >exit, it uses that array to unmap the pages which were mapped. Not
> >all entries in the array are mapped, and this is tracked in
> >kmap_mask.
> >
> >kunmap_local() takes a mapped address rather than a page. Declare a
> >mapped address array, page_array_addr, of the same size as the page
> >array, to be used for unmapping.
> >
>
> Hi Ira, thanks for taking care of that!
>
> I think we can avoid introducing another 'page_array_addr[]' array
> here, which must be zeroed first and completely searched for
> valid mappings during unmap, and which further bloats the
> stack size of siw_tx_hdt(). I think we can get away with the
> already available iov[].iov_base address array, masking those
> addresses with PAGE_MASK during unmapping to strip any first
> byte offset. All kmap_local_page() mappings end up in that list.
> For unmapping we can still rely on the kmap_mask bit field, which
> is more efficient to initialize and search for valid mappings.
> Ordering during unmapping can be guaranteed if we walk the bitmask
> in reverse order. Let me know if you prefer me to propose a
> change -- that siw_tx_hdt() thing has become rather complex, I
> have to admit!
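>
> Roughly something like this (an untested sketch, assuming we keep
> the existing kmap_mask bit field and the iov[] array):
>
> 	static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask)
> 	{
> 		int i;
>
> 		/* walk the mask in reverse to keep kmap_local_page() LIFO order */
> 		for (i = MAX_ARRAY - 1; i >= 0; i--) {
> 			unsigned long addr = (unsigned long)iov[i].iov_base;
>
> 			if (kmap_mask & BIT(i))
> 				kunmap_local((void *)(addr & PAGE_MASK));
> 		}
> 	}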

Seems not too bad, V2 sent.

I was concerned with the additional stack size, but only 28 pointers (if I did
my math right, 224 bytes on 64-bit) did not seem too bad. It is redundant
though, so let's see if I've gotten V2 right.

Thanks!
Ira

>
> Best,
> Bernard.
>
> >Use kmap_local_page() instead of kmap() to map pages in the
> >page_array.
> >
> >Because segments are mapped into the page array in increasing index
> >order, modify siw_unmap_pages() to unmap pages in decreasing order.
> >
> >The kmap_mask is no longer needed, as the lack of an address in the
> >address array indicates that no unmap is required.
> >
> >[1] https://lore.kernel.org/lkml/20201009195033.3208459-59-ira.weiny@intel.com/
> >
> >Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> >---
> > drivers/infiniband/sw/siw/siw_qp_tx.c | 35 +++++++++++++++------------
> > 1 file changed, 20 insertions(+), 15 deletions(-)
> >
> >diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
> >index db68a10d12cd..e70aba23f6e7 100644
> >--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> >+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> >@@ -396,13 +396,17 @@ static int siw_0copy_tx(struct socket *s, struct page **page,
> >
> > #define MAX_TRAILER (MPA_CRC_SIZE + 4)
> >
> >-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
> >+static void siw_unmap_pages(void **addrs, int len)
> > {
> >-	while (kmap_mask) {
> >-		if (kmap_mask & BIT(0))
> >-			kunmap(*pp);
> >-		pp++;
> >-		kmap_mask >>= 1;
> >+	int i;
> >+
> >+	/*
> >+	 * Work backwards through the array to honor the kmap_local_page()
> >+	 * ordering requirements.
> >+	 */
> >+	for (i = (len-1); i >= 0; i--) {
> >+		if (addrs[i])
> >+			kunmap_local(addrs[i]);
> > 	}
> > }
> >
> >@@ -427,13 +431,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 	struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> > 	struct kvec iov[MAX_ARRAY];
> > 	struct page *page_array[MAX_ARRAY];
> >+	void *page_array_addr[MAX_ARRAY];
> > 	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
> >
> > 	int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> > 	unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
> > 		     sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx,
> > 		     pbl_idx = c_tx->pbl_idx;
> >-	unsigned long kmap_mask = 0L;
> >+
> >+	memset(page_array_addr, 0, sizeof(page_array_addr));
> >
> > 	if (c_tx->state == SIW_SEND_HDR) {
> > 		if (c_tx->use_sendpage) {
> >@@ -498,7 +504,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 				p = siw_get_upage(mem->umem,
> > 						  sge->laddr + sge_off);
> > 				if (unlikely(!p)) {
> >-					siw_unmap_pages(page_array, kmap_mask);
> >+					siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > 					wqe->processed -= c_tx->bytes_unsent;
> > 					rv = -EFAULT;
> > 					goto done_crc;
> >@@ -506,11 +512,10 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 				page_array[seg] = p;
> >
> > 				if (!c_tx->use_sendpage) {
> >-					iov[seg].iov_base = kmap(p) + fp_off;
> >-					iov[seg].iov_len = plen;
> >+					page_array_addr[seg] = kmap_local_page(page_array[seg]);
> >
> >-					/* Remember for later kunmap() */
> >-					kmap_mask |= BIT(seg);
> >+					iov[seg].iov_base = page_array_addr[seg] + fp_off;
> >+					iov[seg].iov_len = plen;
> >
> > 					if (do_crc)
> > 						crypto_shash_update(
> >@@ -518,7 +523,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 							iov[seg].iov_base,
> > 							plen);
> > 				} else if (do_crc) {
> >-					kaddr = kmap_local_page(p);
> >+					kaddr = kmap_local_page(page_array[seg]);
> > 					crypto_shash_update(c_tx->mpa_crc_hd,
> > 							    kaddr + fp_off,
> > 							    plen);
> >@@ -542,7 +547,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> >
> > 			if (++seg > (int)MAX_ARRAY) {
> > 				siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
> >-				siw_unmap_pages(page_array, kmap_mask);
> >+				siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > 				wqe->processed -= c_tx->bytes_unsent;
> > 				rv = -EMSGSIZE;
> > 				goto done_crc;
> >@@ -593,7 +598,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
> > 	} else {
> > 		rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> > 				    hdr_len + data_len + trl_len);
> >-		siw_unmap_pages(page_array, kmap_mask);
> >+		siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > 	}
> > 	if (rv < (int)hdr_len) {
> > 		/* Not even complete hdr pushed or negative rv */
> >--
> >2.28.0.rc0.12.gb6a658bd00c9