Re: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum 23144
From: Marc Zyngier
Date: Mon Aug 24 2015 - 06:17:40 EST
Hi Robert,
Just came back from the Seattle madness, so picking up patches in
reverse order... ;-)
On 22/08/15 14:10, Robert Richter wrote:
> The patch below adds a workaround for gicv3 in a numa environment. It
> is on top of my recent gicv3 errata patch submission v4 and Ganapat's
> arm64 numa patches for devicetree v5.
>
> Please comment.
>
> Thanks,
>
> -Robert
>
>
>
> From c432820451a46b8d1e299b8bfbfcdcb3b75340e7 Mon Sep 17 00:00:00 2001
> From: Ganapatrao Kulkarni <gkulkarni@xxxxxxxxxxxxxxxxxx>
> Date: Wed, 19 Aug 2015 23:40:05 +0530
> Subject: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum
> 23144
>
> This implements a workaround for gicv3-its erratum 23144 applicable
> for Cavium's ThunderX multinode systems.
>
> The erratum fixes the hang of ITS SYNC command by avoiding inter node
> io and collections/cpu mapping. This fix is only applicable for
> Cavium's ThunderX dual-socket platforms.
Can you please elaborate on this? I can't see any reference to the SYNC
command there. Is that a case of an ITS not being able to route LPIs to
cores on another socket?
I really need more details to understand this patch (please use short
sentences, I'm still in a different time zone).
>
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@xxxxxxxxxxxxxxxxxx>
> [ rric: Reworked errata code, added helper functions, updated commit
> message. ]
>
> Signed-off-by: Robert Richter <rrichter@xxxxxxxxxx>
> ---
> arch/arm64/Kconfig | 14 +++++++++++
> drivers/irqchip/irq-gic-common.c | 5 ++--
> drivers/irqchip/irq-gic-v3-its.c | 54 ++++++++++++++++++++++++++++++++++------
> 3 files changed, 64 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3809187ed653..b92b7b70b29b 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -421,6 +421,20 @@ config ARM64_ERRATUM_845719
>
> If unsure, say Y.
>
> +config CAVIUM_ERRATUM_22375
> + bool "Cavium erratum 22375, 24313"
> + depends on NUMA
> + default y
> + help
> + Enable workaround for erratum 22375, 24313.
> +
> +config CAVIUM_ERRATUM_23144
> + bool "Cavium erratum 23144"
> + depends on NUMA
> + default y
> + help
> + Enable workaround for erratum 23144.
> +
> config CAVIUM_ERRATUM_23154
> bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
> depends on ARCH_THUNDER
> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
> index ee789b07f2d1..1dfce64dbdac 100644
> --- a/drivers/irqchip/irq-gic-common.c
> +++ b/drivers/irqchip/irq-gic-common.c
> @@ -24,11 +24,12 @@
> void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap,
> void *data)
> {
> - for (; cap->desc; cap++) {
> + for (; cap->init; cap++) {
> if (cap->iidr != (cap->mask & iidr))
> continue;
> cap->init(data);
> - pr_info("%s\n", cap->desc);
> + if (cap->desc)
> + pr_info("%s\n", cap->desc);
No. I really want to see what errata are applied when I look at a kernel
log.
> }
> }
>
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 4bb135721e72..666be39f13a9 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -43,7 +43,8 @@
> #include "irqchip.h"
>
> #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0)
> -#define ITS_FLAGS_CAVIUM_THUNDERX (1ULL << 1)
> +#define ITS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
> +#define ITS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
>
> #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
>
> @@ -76,6 +77,7 @@ struct its_node {
> struct list_head its_device_list;
> u64 flags;
> u32 ite_size;
> + int numa_node;
> };
>
> #define ITS_ITT_ALIGN SZ_256
> @@ -609,11 +611,18 @@ static void its_eoi_irq(struct irq_data *d)
> static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
> bool force)
> {
> - unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
> + unsigned int cpu;
> struct its_device *its_dev = irq_data_get_irq_chip_data(d);
> struct its_collection *target_col;
> u32 id = its_get_event_id(d);
>
> + if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144) {
> + cpu = cpumask_any_and(mask_val,
> + cpumask_of_node(its_dev->its->numa_node));
I suppose cpumask_of_node returns only the *online* cores of a given
node, right?
> + } else {
> + cpu = cpumask_any_and(mask_val, cpu_online_mask);
> + }
> +
> if (cpu >= nr_cpu_ids)
> return -EINVAL;
>
> @@ -845,7 +854,7 @@ static int its_alloc_tables(struct its_node *its)
> u64 typer;
> u32 ids;
>
> - if (its->flags & ITS_FLAGS_CAVIUM_THUNDERX) {
> + if (its->flags & ITS_WORKAROUND_CAVIUM_22375) {
> /*
> * erratum 22375: only alloc 8MB table size
> * erratum 24313: ignore memory access type
> @@ -1093,6 +1102,11 @@ static void its_cpu_init_lpis(void)
> dsb(sy);
> }
>
> +static inline int numa_node_id_early(void)
> +{
> + return MPIDR_AFFINITY_LEVEL(read_cpuid_mpidr(), 2);
> +}
> +
> static void its_cpu_init_collection(void)
> {
> struct its_node *its;
> @@ -1104,6 +1118,11 @@ static void its_cpu_init_collection(void)
> list_for_each_entry(its, &its_nodes, entry) {
> u64 target;
>
> + /* avoid cross node core and its mapping */
> + if ((its->flags & ITS_WORKAROUND_CAVIUM_23144) &&
> + its->numa_node != numa_node_id_early())
> + continue;
> +
Argh. This is horrible. You really need some topology bindings to
describe your system instead of hardcoding some random level of
affinity. The next time someone is going to come up with a similarly
broken system, they will have to reinvent that wheel.
> /*
> * We now have to bind each collection to its target
> * redistributor.
> @@ -1372,9 +1391,15 @@ static void its_irq_domain_activate(struct irq_domain *domain,
> {
> struct its_device *its_dev = irq_data_get_irq_chip_data(d);
> u32 event = its_get_event_id(d);
> + unsigned int cpu;
> +
> + if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144)
> + cpu = cpumask_first(cpumask_of_node(its_dev->its->numa_node));
> + else
> + cpu = cpumask_first(cpu_online_mask);
Looks like this can be factored with the code you've added in set_affinity.
>
> /* Bind the LPI to the first possible CPU */
> - its_dev->event_map.col_map[event] = cpumask_first(cpu_online_mask);
> + its_dev->event_map.col_map[event] = cpu;
>
> /* Map the GIC IRQ and event to the device */
> its_send_mapvi(its_dev, d->hwirq, event);
> @@ -1457,16 +1482,31 @@ static int its_force_quiescent(void __iomem *base)
> }
> }
>
> +static inline int its_get_node_thunderx(struct its_node *its)
> +{
> + return (its->phys_base >> 44) & 0x3;
Why 3? Is that because you have provision for 4 sockets or what?
> +}
> +
> static void its_enable_cavium_thunderx(void *data)
> {
> - struct its_node *its = data;
> + struct its_node __maybe_unused *its = data;
>
> - its->flags |= ITS_FLAGS_CAVIUM_THUNDERX;
> +#ifdef CONFIG_CAVIUM_ERRATUM_22375
> + its->flags |= ITS_WORKAROUND_CAVIUM_22375;
> + pr_info("ITS: Enabling workaround for 22375, 24313\n");
> +#endif
> +
> +#ifdef CONFIG_CAVIUM_ERRATUM_23144
> + if (num_possible_nodes() > 1) {
> + its->numa_node = its_get_node_thunderx(its);
I'd rather see numa_node being always initialized to something useful.
If you're adding numa support, why can't this be initialized via
standard topology bindings?
Also, you're mixing things coming from the address map and information
coming from the MPIDR. I must say this doesn't fill me with confidence.
> + its->flags |= ITS_WORKAROUND_CAVIUM_23144;
> + pr_info("ITS: Enabling workaround for 23144\n");
> + }
> +#endif
It would probably make sense to split these in two function, each with
its own erratum entry.
> }
>
> static const struct gic_capabilities its_errata[] = {
> {
> - .desc = "ITS: Cavium errata 22375, 24313",
> .iidr = 0xa100034c, /* ThunderX pass 1.x */
> .mask = 0xffff0fff,
> .init = its_enable_cavium_thunderx,
>
Thanks,
M.
--
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/