RE: [PATCH net-next] net: mana: Add support for variable page sizes of ARM64

From: Michael Kelley
Date: Tue Jun 11 2024 - 12:35:06 EST


From: LKML haiyangz <lkmlhyz@xxxxxxxxxxxxx> On Behalf Of Haiyang Zhang Sent: Monday, June 10, 2024 2:23 PM
>
> As defined by the MANA Hardware spec, the queue size for DMA is 4KB
> minimal, and power of 2.

You say the hardware requires 4K "minimal". But the definitions in this
patch hardcode to 4K, as if that's the only choice. Is the hardcoding to
4K a design decision made to simplify the MANA driver?

> To support variable page sizes (4KB, 16KB, 64KB) of ARM64, define

A minor nit, but "variable" page size doesn't seem like quite the right
description -- both here and in the Subject line. On ARM64, the page size
is a choice among a few fixed options. Perhaps call it support for "page sizes
other than 4K"?

> the minimal queue size as a macro separate from the PAGE_SIZE, which
> we always assumed it to be 4KB before supporting ARM64.
> Also, update the relevant code related to size alignment, DMA region
> calculations, etc.
>
> Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> ---
> drivers/net/ethernet/microsoft/Kconfig | 2 +-
> .../net/ethernet/microsoft/mana/gdma_main.c | 8 +++----
> .../net/ethernet/microsoft/mana/hw_channel.c | 22 +++++++++----------
> drivers/net/ethernet/microsoft/mana/mana_en.c | 8 +++----
> .../net/ethernet/microsoft/mana/shm_channel.c | 9 ++++----
> include/net/mana/gdma.h | 7 +++++-
> include/net/mana/mana.h | 3 ++-
> 7 files changed, 33 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/ethernet/microsoft/Kconfig b/drivers/net/ethernet/microsoft/Kconfig
> index 286f0d5697a1..901fbffbf718 100644
> --- a/drivers/net/ethernet/microsoft/Kconfig
> +++ b/drivers/net/ethernet/microsoft/Kconfig
> @@ -18,7 +18,7 @@ if NET_VENDOR_MICROSOFT
> config MICROSOFT_MANA
> tristate "Microsoft Azure Network Adapter (MANA) support"
> depends on PCI_MSI
> - depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
> + depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
> depends on PCI_HYPERV
> select AUXILIARY_BUS
> select PAGE_POOL
> diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> index 1332db9a08eb..c9df942d0d02 100644
> --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context *gc, unsigned int length,
> dma_addr_t dma_handle;
> void *buf;
>
> - if (length < PAGE_SIZE || !is_power_of_2(length))
> + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> return -EINVAL;
>
> gmi->dev = gc->dev;
> @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region, NET_MANA);
> static int mana_gd_create_dma_region(struct gdma_dev *gd,
> struct gdma_mem_info *gmi)
> {
> - unsigned int num_page = gmi->length / PAGE_SIZE;
> + unsigned int num_page = gmi->length / MANA_MIN_QSIZE;

This calculation seems a bit weird when using MANA_MIN_QSIZE. The
number of pages, and the construction of the page_addr_list array
a few lines later, seem unrelated to the concept of a minimum queue
size. Is the right concept really a "mapping chunk", and num_page
would conceptually be "num_chunks", or something like that? Then
a queue must be at least one chunk in size, but that's derived from the
chunk size, and is not the core concept.

Another approach might be to just call it "MANA_PAGE_SIZE", like
has been done with HV_HYP_PAGE_SIZE. HV_HYP_PAGE_SIZE exists to
handle exactly the same issue of the guest PAGE_SIZE potentially
being different from the fixed 4K size that must be used in host-guest
communication on Hyper-V. Same thing here with MANA.

> struct gdma_create_dma_region_req *req = NULL;
> struct gdma_create_dma_region_resp resp = {};
> struct gdma_context *gc = gd->gdma_context;
> @@ -727,7 +727,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
> int err;
> int i;
>
> - if (length < PAGE_SIZE || !is_power_of_2(length))
> + if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> return -EINVAL;
>
> if (offset_in_page(gmi->virt_addr) != 0)
> @@ -751,7 +751,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
> req->page_addr_list_len = num_page;
>
> for (i = 0; i < num_page; i++)
> - req->page_addr_list[i] = gmi->dma_handle + i * PAGE_SIZE;
> + req->page_addr_list[i] = gmi->dma_handle + i * MANA_MIN_QSIZE;
>
> err = mana_gd_send_request(gc, req_msg_size, req, sizeof(resp), &resp);
> if (err)
> diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> index bbc4f9e16c98..038dc31e09cd 100644
> --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> @@ -362,12 +362,12 @@ static int mana_hwc_create_cq(struct hw_channel_context *hwc, u16 q_depth,
> int err;
>
> eq_size = roundup_pow_of_two(GDMA_EQE_SIZE * q_depth);
> - if (eq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> - eq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> + if (eq_size < MANA_MIN_QSIZE)
> + eq_size = MANA_MIN_QSIZE;
>
> cq_size = roundup_pow_of_two(GDMA_CQE_SIZE * q_depth);
> - if (cq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> - cq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> + if (cq_size < MANA_MIN_QSIZE)
> + cq_size = MANA_MIN_QSIZE;
>
> hwc_cq = kzalloc(sizeof(*hwc_cq), GFP_KERNEL);
> if (!hwc_cq)
> @@ -429,7 +429,7 @@ static int mana_hwc_alloc_dma_buf(struct hw_channel_context *hwc, u16 q_depth,
>
> dma_buf->num_reqs = q_depth;
>
> - buf_size = PAGE_ALIGN(q_depth * max_msg_size);
> + buf_size = MANA_MIN_QALIGN(q_depth * max_msg_size);
>
> gmi = &dma_buf->mem_info;
> err = mana_gd_alloc_memory(gc, buf_size, gmi);
> @@ -497,8 +497,8 @@ static int mana_hwc_create_wq(struct hw_channel_context *hwc,
> else
> queue_size = roundup_pow_of_two(GDMA_MAX_SQE_SIZE * q_depth);
>
> - if (queue_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> - queue_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> + if (queue_size < MANA_MIN_QSIZE)
> + queue_size = MANA_MIN_QSIZE;
>
> hwc_wq = kzalloc(sizeof(*hwc_wq), GFP_KERNEL);
> if (!hwc_wq)
> @@ -628,10 +628,10 @@ static int mana_hwc_establish_channel(struct gdma_context *gc, u16 *q_depth,
> init_completion(&hwc->hwc_init_eqe_comp);
>
> err = mana_smc_setup_hwc(&gc->shm_channel, false,
> - eq->mem_info.dma_handle,
> - cq->mem_info.dma_handle,
> - rq->mem_info.dma_handle,
> - sq->mem_info.dma_handle,
> + virt_to_phys(eq->mem_info.virt_addr),
> + virt_to_phys(cq->mem_info.virt_addr),
> + virt_to_phys(rq->mem_info.virt_addr),
> + virt_to_phys(sq->mem_info.virt_addr),

This change seems unrelated to handling guest PAGE_SIZE values
other than 4K. Does it belong in a separate patch? Or maybe it just
needs an explanation in the commit message of this patch?

> eq->eq.msix_index);
> if (err)
> return err;
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> index d087cf954f75..6a891dbce686 100644
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> @@ -1889,10 +1889,10 @@ static int mana_create_txq(struct mana_port_context *apc,
> * to prevent overflow.
> */
> txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> - BUILD_BUG_ON(!PAGE_ALIGNED(txq_size));
> + BUILD_BUG_ON(!MANA_MIN_QALIGNED(txq_size));
>
> cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
> - cq_size = PAGE_ALIGN(cq_size);
> + cq_size = MANA_MIN_QALIGN(cq_size);
>
> gc = gd->gdma_context;
>
> @@ -2189,8 +2189,8 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> if (err)
> goto out;
>
> - rq_size = PAGE_ALIGN(rq_size);
> - cq_size = PAGE_ALIGN(cq_size);
> + rq_size = MANA_MIN_QALIGN(rq_size);
> + cq_size = MANA_MIN_QALIGN(cq_size);
>
> /* Create RQ */
> memset(&spec, 0, sizeof(spec));
> diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> index 5553af9c8085..9a54a163d8d1 100644
> --- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> +++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> @@ -6,6 +6,7 @@
> #include <linux/io.h>
> #include <linux/mm.h>
>
> +#include <net/mana/gdma.h>
> #include <net/mana/shm_channel.h>
>
> #define PAGE_FRAME_L48_WIDTH_BYTES 6
> @@ -183,7 +184,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
>
> /* EQ addr: low 48 bits of frame address */
> shmem = (u64 *)ptr;
> - frame_addr = PHYS_PFN(eq_addr);
> + frame_addr = MANA_PFN(eq_addr);
> *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);

In mana_smc_setup_hwc() a few lines above this change, code using
PAGE_ALIGNED() is unchanged. Is it correct that the eq/cq/rq/sq addresses
must be aligned to 64K if PAGE_SIZE is 64K?

Related, I wonder about how MANA_PFN() is defined. If PAGE_SIZE is 64K,
MANA_PFN() will first right-shift by 16, then left-shift by 4. The net effect is
a right shift by 12, corresponding to the 4K chunks that MANA expects. But that
approach guarantees
that the rightmost 4 bits of the MANA PFN will always be zero. That's consistent
with requiring the addresses to be PAGE_ALIGNED() to 64K, but I'm unclear whether
that is really the requirement. You might compare with the definition of
HVPFN_DOWN(), which has a similar goal for Linux guests communicating with
Hyper-V.

> @@ -191,7 +192,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
>
> /* CQ addr: low 48 bits of frame address */
> shmem = (u64 *)ptr;
> - frame_addr = PHYS_PFN(cq_addr);
> + frame_addr = MANA_PFN(cq_addr);
> *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> @@ -199,7 +200,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
>
> /* RQ addr: low 48 bits of frame address */
> shmem = (u64 *)ptr;
> - frame_addr = PHYS_PFN(rq_addr);
> + frame_addr = MANA_PFN(rq_addr);
> *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> @@ -207,7 +208,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
>
> /* SQ addr: low 48 bits of frame address */
> shmem = (u64 *)ptr;
> - frame_addr = PHYS_PFN(sq_addr);
> + frame_addr = MANA_PFN(sq_addr);
> *shmem = frame_addr & PAGE_FRAME_L48_MASK;
> all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> (frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> --- a/include/net/mana/gdma.h
> +++ b/include/net/mana/gdma.h
> @@ -224,7 +224,12 @@ struct gdma_dev {
> struct auxiliary_device *adev;
> };
>
> -#define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
> +/* These are defined by HW */
> +#define MANA_MIN_QSHIFT 12
> +#define MANA_MIN_QSIZE (1 << MANA_MIN_QSHIFT)
> +#define MANA_MIN_QALIGN(x) ALIGN((x), MANA_MIN_QSIZE)
> +#define MANA_MIN_QALIGNED(addr) IS_ALIGNED((unsigned long)(addr), MANA_MIN_QSIZE)
> +#define MANA_PFN(a) (PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))

See comments above about how this is defined.

Michael

>
> #define GDMA_CQE_SIZE 64
> #define GDMA_EQE_SIZE 16
> diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> --- a/include/net/mana/mana.h
> +++ b/include/net/mana/mana.h
> @@ -42,7 +42,8 @@ enum TRI_STATE {
>
> #define MAX_SEND_BUFFERS_PER_QUEUE 256
>
> -#define EQ_SIZE (8 * PAGE_SIZE)
> +#define EQ_SIZE (8 * MANA_MIN_QSIZE)
> +
> #define LOG2_EQ_THROTTLE 3
>
> #define MAX_PORTS_IN_MANA_DEV 256
> --
> 2.34.1
>