Re: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum 23144

From: Ganapatrao Kulkarni
Date: Mon Aug 24 2015 - 08:30:41 EST


Hi Marc,

thanks for the review comments.

On Mon, Aug 24, 2015 at 3:47 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
> Hi Robert,
>
> Just came back from the Seattle madness, so picking up patches in
> reverse order... ;-)
>
> On 22/08/15 14:10, Robert Richter wrote:
>> The patch below adds a workaround for gicv3 in a numa environment. It
>> is on top of my recent gicv3 errata patch submission v4 and Ganapat's
>> arm64 numa patches for devicetree v5.
>>
>> Please comment.
>>
>> Thanks,
>>
>> -Robert
>>
>>
>>
>> From c432820451a46b8d1e299b8bfbfcdcb3b75340e7 Mon Sep 17 00:00:00 2001
>> From: Ganapatrao Kulkarni <gkulkarni@xxxxxxxxxxxxxxxxxx>
>> Date: Wed, 19 Aug 2015 23:40:05 +0530
>> Subject: [PATCH] irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum
>> 23144
>>
>> This implements a workaround for gicv3-its erratum 23144 applicable
>> for Cavium's ThunderX multinode systems.
>>
>> The erratum fixes the hang of ITS SYNC command by avoiding inter node
>> io and collections/cpu mapping. This fix is only applicable for
>> Cavium's ThunderX dual-socket platforms.
>
> Can you please elaborate on this? I can't see any reference to the SYNC
> command there. Is that a case of an ITS not being able to route LPIs to
> cores on another socket?
we were seeing mapc command failing when we were mapping its of node0
with collections of node1(vice-versa).
we found sync was timing out, which is issued post mapc(also for mapvi
and movi).
Yes this errata limits the routing of inter-node LPIs.

>
> I really need more details to understand this patch (please use short
> sentences, I'm still in a different time zone).
>
>>
>> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@xxxxxxxxxxxxxxxxxx>
>> [ rric: Reworked errata code, added helper functions, updated commit
>> message. ]
>>
>> Signed-off-by: Robert Richter <rrichter@xxxxxxxxxx>
>> ---
>> arch/arm64/Kconfig | 14 +++++++++++
>> drivers/irqchip/irq-gic-common.c | 5 ++--
>> drivers/irqchip/irq-gic-v3-its.c | 54 ++++++++++++++++++++++++++++++++++------
>> 3 files changed, 64 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 3809187ed653..b92b7b70b29b 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -421,6 +421,20 @@ config ARM64_ERRATUM_845719
>>
>> If unsure, say Y.
>>
>> +config CAVIUM_ERRATUM_22375
>> + bool "Cavium erratum 22375, 24313"
>> + depends on NUMA
>> + default y
>> + help
>> + Enable workaround for erratum 22375, 24313.
>> +
>> +config CAVIUM_ERRATUM_23144
>> + bool "Cavium erratum 23144"
>> + depends on NUMA
>> + default y
>> + help
>> + Enable workaround for erratum 23144.
>> +
>> config CAVIUM_ERRATUM_23154
>> bool "Cavium erratum 23154: Access to ICC_IAR1_EL1 is not sync'ed"
>> depends on ARCH_THUNDER
>> diff --git a/drivers/irqchip/irq-gic-common.c b/drivers/irqchip/irq-gic-common.c
>> index ee789b07f2d1..1dfce64dbdac 100644
>> --- a/drivers/irqchip/irq-gic-common.c
>> +++ b/drivers/irqchip/irq-gic-common.c
>> @@ -24,11 +24,12 @@
>> void gic_check_capabilities(u32 iidr, const struct gic_capabilities *cap,
>> void *data)
>> {
>> - for (; cap->desc; cap++) {
>> + for (; cap->init; cap++) {
>> if (cap->iidr != (cap->mask & iidr))
>> continue;
>> cap->init(data);
>> - pr_info("%s\n", cap->desc);
>> + if (cap->desc)
>> + pr_info("%s\n", cap->desc);
>
> No. I really want to see what errata are applied when I look at a kernel
> log.
sorry, did not understood your comment, it is still printed using cap->desc.
>
>> }
>> }
>>
>> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
>> index 4bb135721e72..666be39f13a9 100644
>> --- a/drivers/irqchip/irq-gic-v3-its.c
>> +++ b/drivers/irqchip/irq-gic-v3-its.c
>> @@ -43,7 +43,8 @@
>> #include "irqchip.h"
>>
>> #define ITS_FLAGS_CMDQ_NEEDS_FLUSHING (1ULL << 0)
>> -#define ITS_FLAGS_CAVIUM_THUNDERX (1ULL << 1)
>> +#define ITS_WORKAROUND_CAVIUM_22375 (1ULL << 1)
>> +#define ITS_WORKAROUND_CAVIUM_23144 (1ULL << 2)
>>
>> #define RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING (1 << 0)
>>
>> @@ -76,6 +77,7 @@ struct its_node {
>> struct list_head its_device_list;
>> u64 flags;
>> u32 ite_size;
>> + int numa_node;
>> };
>>
>> #define ITS_ITT_ALIGN SZ_256
>> @@ -609,11 +611,18 @@ static void its_eoi_irq(struct irq_data *d)
>> static int its_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
>> bool force)
>> {
>> - unsigned int cpu = cpumask_any_and(mask_val, cpu_online_mask);
>> + unsigned int cpu;
>> struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>> struct its_collection *target_col;
>> u32 id = its_get_event_id(d);
>>
>> + if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144) {
>> + cpu = cpumask_any_and(mask_val,
>> + cpumask_of_node(its_dev->its->numa_node));
>
> I suppose cpumask_of_node returns only the *online* cores of a given
> node, right?
yes.
>
>> + } else {
>> + cpu = cpumask_any_and(mask_val, cpu_online_mask);
>> + }
>> +
>> if (cpu >= nr_cpu_ids)
>> return -EINVAL;
>>
>> @@ -845,7 +854,7 @@ static int its_alloc_tables(struct its_node *its)
>> u64 typer;
>> u32 ids;
>>
>> - if (its->flags & ITS_FLAGS_CAVIUM_THUNDERX) {
>> + if (its->flags & ITS_WORKAROUND_CAVIUM_22375) {
>> /*
>> * erratum 22375: only alloc 8MB table size
>> * erratum 24313: ignore memory access type
>> @@ -1093,6 +1102,11 @@ static void its_cpu_init_lpis(void)
>> dsb(sy);
>> }
>>
>> +static inline int numa_node_id_early(void)
>> +{
>> + return MPIDR_AFFINITY_LEVEL(read_cpuid_mpidr(), 2);
>> +}
>> +
>> static void its_cpu_init_collection(void)
>> {
>> struct its_node *its;
>> @@ -1104,6 +1118,11 @@ static void its_cpu_init_collection(void)
>> list_for_each_entry(its, &its_nodes, entry) {
>> u64 target;
>>
>> + /* avoid cross node core and its mapping */
>> + if ((its->flags & ITS_WORKAROUND_CAVIUM_23144) &&
>> + its->numa_node != numa_node_id_early())
>> + continue;
>> +
>
> Argh. This is horrible. You really need some topology bindings to
> describe your system instead of hardcoding some random level of
> affinity. The next time someone is going to come up with a similarly
> broken system, they will have to reinvent that wheel.
thanks for the suggestion, we will use cpu_topology[cpuid].cluster_id
instead of function numa_node_id_early()
>
>> /*
>> * We now have to bind each collection to its target
>> * redistributor.
>> @@ -1372,9 +1391,15 @@ static void its_irq_domain_activate(struct irq_domain *domain,
>> {
>> struct its_device *its_dev = irq_data_get_irq_chip_data(d);
>> u32 event = its_get_event_id(d);
>> + unsigned int cpu;
>> +
>> + if (its_dev->its->flags & ITS_WORKAROUND_CAVIUM_23144)
>> + cpu = cpumask_first(cpumask_of_node(its_dev->its->numa_node));
>> + else
>> + cpu = cpumask_first(cpu_online_mask);
>
> Looks like this can be factored with the code you've added in set_affinity.
we cant issue mapvi with cross-node mapping.
>
>>
>> /* Bind the LPI to the first possible CPU */
>> - its_dev->event_map.col_map[event] = cpumask_first(cpu_online_mask);
>> + its_dev->event_map.col_map[event] = cpu;
>>
>> /* Map the GIC IRQ and event to the device */
>> its_send_mapvi(its_dev, d->hwirq, event);
>> @@ -1457,16 +1482,31 @@ static int its_force_quiescent(void __iomem *base)
>> }
>> }
>>
>> +static inline int its_get_node_thunderx(struct its_node *its)
>> +{
>> + return (its->phys_base >> 44) & 0x3;
>
> Why 3? Is that because you have provision for 4 sockets or what?
yes.
>
>> +}
>> +
>> static void its_enable_cavium_thunderx(void *data)
>> {
>> - struct its_node *its = data;
>> + struct its_node __maybe_unused *its = data;
>>
>> - its->flags |= ITS_FLAGS_CAVIUM_THUNDERX;
>> +#ifdef CONFIG_CAVIUM_ERRATUM_22375
>> + its->flags |= ITS_WORKAROUND_CAVIUM_22375;
>> + pr_info("ITS: Enabling workaround for 22375, 24313\n");
>> +#endif
>> +
>> +#ifdef CONFIG_CAVIUM_ERRATUM_23144
>> + if (num_possible_nodes() > 1) {
>> + its->numa_node = its_get_node_thunderx(its);
>
> I'd rather see numa_node being always initialized to something useful.
> If you're adding numa support, why can't this be initialized via
> standard topology bindings?
IIUC, topology defines only cpu topology.
>
> Also, you're mixing things coming from the address map and information
> coming from the MPIDR. I must say this doesn't fill me with confidence.
>
>> + its->flags |= ITS_WORKAROUND_CAVIUM_23144;
>> + pr_info("ITS: Enabling workaround for 23144\n");
>> + }
>> +#endif
>
> It would probably make sense to split these in two function, each with
> its own erratum entry.
sure will do.
>
>> }
>>
>> static const struct gic_capabilities its_errata[] = {
>> {
>> - .desc = "ITS: Cavium errata 22375, 24313",
>> .iidr = 0xa100034c, /* ThunderX pass 1.x */
>> .mask = 0xffff0fff,
>> .init = its_enable_cavium_thunderx,
>>
>
> Thanks,
>
> M.
> --
> Jazz is not dead. It just smells funny...

thanks
Ganapat


>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/