Re: [PATCH] irqchip/gicv3-its: Implement two-level(indirect) device table support

From: Marc Zyngier
Date: Fri May 06 2016 - 10:23:12 EST


Hi Shanker,

Thanks for putting this together. Comments below:

On Fri, 6 May 2016 08:43:36 -0500
Shanker Donthineni <shankerd@xxxxxxxxxxxxxx> wrote:

> Since device IDs are extremely sparse, the single, a.k.a flat table is
> not sufficient for the following two reasons.
>
> 1) According to ARM-GIC spec, ITS hw can access maximum of 256(pages)*
> 64K(pageszie) bytes. In the best case, it supports upto DEVid=21
> sparse with minimum device table entry size 8bytes.
>
> 2) The maximum memory size that is possible without memblock depends on
> MAX_ORDER. 4MB on 4K page size kernel with default MAX_ORDER, so it
> supports DEVid range 19bits.
>
> The two-level device table feature brings us two advantages, the first
> is a very high possibility of supporting upto 32bit sparse, and the
> second one is the best utilization of memory allocation.
>
> The feature is enabled automatically during driver probe if the flat
> table is not adequate and the hardware is capable of two-level table
> walk.
>
> Signed-off-by: Shanker Donthineni <shankerd@xxxxxxxxxxxxxx>
> ---
>
> This patch is based on Marc Zyngier's branch https://git.kernel.org/cgit/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/irqchip-4.7
>
> drivers/irqchip/irq-gic-v3-its.c | 109 ++++++++++++++++++++++++++++++++-----
> include/linux/irqchip/arm-gic-v3.h | 2 +
> 2 files changed, 97 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 6bd881b..caeb105 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -55,13 +55,14 @@ struct its_collection {
> };
>
> /*
> - * The ITS_BASER structure - contains memory information and cached
> - * value of BASER register configuration.
> + * The ITS_BASER structure - contains memory information, cached value
> + * of BASER register configuration and ITS page size in bytes.
> */
> struct its_baser {
> void *base;
> u64 val;
> u32 order;
> + u32 psz;
> };
>
> /*
> @@ -853,6 +854,8 @@ static int its_alloc_tables(const char *node_name, struct its_node *its)
> u64 type = GITS_BASER_TYPE(val);
> u64 entry_size = GITS_BASER_ENTRY_SIZE(val);
> int order = get_order(psz);
> + bool tried_indirect = false;
> + u64 indirect = 0;
> int alloc_pages;
> u64 tmp;
> void *base;
> @@ -877,11 +880,8 @@ static int its_alloc_tables(const char *node_name, struct its_node *its)
> */
> order = max(get_order((1UL << ids) * entry_size),
> order);
> - if (order >= MAX_ORDER) {
> + if (order >= MAX_ORDER)
> order = MAX_ORDER - 1;
> - pr_warn("%s: Device Table too large, reduce its page order to %u\n",
> - node_name, order);
> - }
> }
>
> retry_alloc_baser:
> @@ -889,8 +889,6 @@ retry_alloc_baser:
> if (alloc_pages > GITS_BASER_PAGES_MAX) {
> alloc_pages = GITS_BASER_PAGES_MAX;
> order = get_order(GITS_BASER_PAGES_MAX * psz);
> - pr_warn("%s: Device Table too large, reduce its page order to %u (%u pages)\n",
> - node_name, order, alloc_pages);
> }
>
> base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> @@ -924,6 +922,7 @@ retry_baser:
>
> val |= alloc_pages - 1;
> its->tables[i].val = val;
> + its->tables[i].psz = psz;
>
> writeq_relaxed(val, its->base + GITS_BASER + i * 8);
> tmp = readq_relaxed(its->base + GITS_BASER + i * 8);
> @@ -963,6 +962,49 @@ retry_baser:
> }
> }
>
> + /*
> + * Attempt enabling Indirection table walk since we were unable
> + * to allocate enough memory for flat device table to cover
> + * whole DevID sparse that hw reporting, and ITS hardware has
> + * capablity of supporting two-level table.
> + */
> + if ((type == GITS_BASER_TYPE_DEVICE) &&
> + (!tried_indirect) &&
> + (PAGE_ORDER_TO_SIZE(order) < (entry_size << ids))) {

Why not try to use indirect tables if they are available, irrespective
of the size of the required allocation? From a memory usage point of
view, it is likely to always be a gain, unless your id space is so
small that it fits in a single page. And with PCIe, you know for sure
that the ID space is extremely sparse, hence indirect tables are an
immediate win.

Do you see any value in using the indirect table as a last resort?

> +
> + val |= GITS_BASER_INDIRECT;
> + writeq_relaxed(val, its->base + GITS_BASER + i * 8);
> + tmp = readq_relaxed(its->base + GITS_BASER + i * 8);
> + tried_indirect = true;
> +
> + /* Does ITS HW supports Indirect table walk? */
> + if (!(tmp & GITS_BASER_INDIRECT)) {
> + pr_warn("%s: Device Table too large, reduce order to %u\n",
> + node_name, order);
> + goto retry_baser;
> + }
> +
> + /*
> + * According to GIC spec, the size of the Level2 table
> + * is equal to ITS page size which is 'psz'. For
> + * computing Level1 table1 size, subtract DevID bits
> + * that covers Level2 table from 'ids' which is reported
> + * by ITS hardware times Level1 table entry size 8bytes.
> + *
> + * L2-DevIDs = ilog2(psz / entry_size)
> + * L1-DevIDs = ids (reported by HW) - L2-DevIDs
> + * Table1 size = L1-DevIDs * 8Bytes
> + */
> + ids -= ilog2(psz / entry_size);
> + order = get_order(GITS_LEVEL1_ENTRY_SIZE << ids);
> +
> + /* Limit Level1 table size to MAX_ORDER-1 */
> + order = min(order, MAX_ORDER - 1);
> + indirect = GITS_BASER_INDIRECT;
> + its->tables[type].base = NULL;
> + goto retry_alloc_baser;
> + }
> +

Please take this opportunity to split this function into something
manageable. It is really getting out of control. Something to directly
deal with the device table seems in order.

> if (val != tmp) {
> pr_err("ITS: %s: GITS_BASER%d doesn't stick: %lx %lx\n",
> node_name, i,
> @@ -971,10 +1013,12 @@ retry_baser:
> goto out_free;
> }
>
> - pr_info("ITS: allocated %d %s @%lx (psz %dK, shr %d)\n",
> - (int)(PAGE_ORDER_TO_SIZE(order) / entry_size),
> + pr_info("ITS: allocated %d %s @%lx (%s, psz %dK, shr %d)\n",
> + (int)(PAGE_ORDER_TO_SIZE(order) /
> + (indirect ? GITS_LEVEL1_ENTRY_SIZE : entry_size)),
> its_base_type_string[type],
> (unsigned long)virt_to_phys(base),
> + indirect ? "indirect" : "flat",
> psz / SZ_1K, (int)shr >> GITS_BASER_SHAREABILITY_SHIFT);
> }
>
> @@ -1149,6 +1193,46 @@ static struct its_device *its_find_device(struct its_node *its, u32 dev_id)
> return its_dev;
> }
>
> +static bool its_alloc_device_table(struct its_baser *baser, u32 dev_id)
> +{
> + u32 entry_size = GITS_BASER_ENTRY_SIZE(baser->val);
> + u64 *table = baser->base;
> + struct page *page;
> + u32 psz = SZ_64K;
> + u32 idx;
> +
> + /* Don't allow 'dev_id' that exceeds single, flat table limit */
> + if (!(baser->val & GITS_BASER_INDIRECT)) {
> + if (dev_id >= (PAGE_ORDER_TO_SIZE(baser->order) / entry_size))
> + return false;
> + return true;
> + }
> +
> + /* Compute Level1 table index & check if that exceeds table range */
> + idx = dev_id >> ilog2(baser->psz / entry_size);
> + if (idx >= (PAGE_ORDER_TO_SIZE(baser->order) / GITS_LEVEL1_ENTRY_SIZE))
> + return false;
> +
> + /* Allocate memory for Level2 table */
> + if (!table[idx]) {
> + page = alloc_pages_exact(psz, GFP_KERNEL | __GFP_ZERO);
> + if (!page)
> + return false;
> +
> + table[idx] = page_to_phys(page) | GITS_BASER_VALID;

Be careful, this must be written in LE format. If you're running a BE
kernel, you're in for a nasty surprise...

> +
> + /* Flush memory to PoC if hardware dosn't support coherency */
> + if (!(baser->val & GITS_BASER_SHAREABILITY_MASK)) {
> + __flush_dcache_area(table + idx, GITS_LEVEL1_ENTRY_SIZE);
> + __flush_dcache_area(page_address(page), psz);
> + }
> + /* Ensure updated table contents are visible to ITS hardware */
> + dsb(sy);
> + }
> +
> + return true;
> +}
> +
> static struct its_baser *its_get_baser(struct its_node *its, u32 type)
> {
> int i;
> @@ -1176,11 +1260,8 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
> int sz;
>
> baser = its_get_baser(its, GITS_BASER_TYPE_DEVICE);
> -
> - /* Don't allow 'dev_id' that exceeds single, flat table limit */
> if (baser) {
> - if (dev_id >= (PAGE_ORDER_TO_SIZE(baser->order) /
> - GITS_BASER_ENTRY_SIZE(baser->val)))
> + if (!its_alloc_device_table(baser, dev_id))
> return NULL;
> } else if (ilog2(dev_id) >= its->device_ids)
> return NULL;

Please fold these checks into its_alloc_device_table so that it takes
an its_node as a first parameter instead of an its_baser. That will make
the code a bit more readable.

> diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
> index 9e6fdd3..df404ea 100644
> --- a/include/linux/irqchip/arm-gic-v3.h
> +++ b/include/linux/irqchip/arm-gic-v3.h
> @@ -187,6 +187,7 @@
> #define GITS_TYPER_PTA (1UL << 19)
>
> #define GITS_CBASER_VALID (1UL << 63)
> +#define GITS_BASER_INDIRECT (1UL << 62)
> #define GITS_CBASER_nCnB (0UL << 59)
> #define GITS_CBASER_nC (1UL << 59)
> #define GITS_CBASER_RaWt (2UL << 59)
> @@ -238,6 +239,7 @@
> #define GITS_BASER_TYPE_RESERVED6 6
> #define GITS_BASER_TYPE_RESERVED7 7
>
> +#define GITS_LEVEL1_ENTRY_SIZE (8UL)
> /*
> * ITS commands
> */


Thanks,

M.
--
Jazz is not dead. It just smells funny.