Re: Kernel-managed IRQ affinity (cont)

From: Peter Xu
Date: Thu Dec 19 2019 - 13:09:25 EST


On Fri, Dec 20, 2019 at 12:11:15AM +0800, Ming Lei wrote:
> OK, please try the following patch:
>
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index 6c8512d3be88..0fbcbacd1b29 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -13,6 +13,7 @@ enum hk_flags {
>  	HK_FLAG_TICK		= (1 << 4),
>  	HK_FLAG_DOMAIN		= (1 << 5),
>  	HK_FLAG_WQ		= (1 << 6),
> +	HK_FLAG_MANAGED_IRQ	= (1 << 7),
>  };
>
> #ifdef CONFIG_CPU_ISOLATION
> diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
> index 1753486b440c..0a75a09cc4e8 100644
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -20,6 +20,7 @@
>  #include <linux/sched/task.h>
>  #include <uapi/linux/sched/types.h>
>  #include <linux/task_work.h>
> +#include <linux/sched/isolation.h>
>
>  #include "internals.h"
>
> @@ -212,12 +213,33 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  {
>  	struct irq_desc *desc = irq_data_to_desc(data);
>  	struct irq_chip *chip = irq_data_get_irq_chip(data);
> +	const struct cpumask *housekeeping_mask =
> +		housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
>  	int ret;
> +	cpumask_var_t tmp_mask;
>
>  	if (!chip || !chip->irq_set_affinity)
>  		return -EINVAL;
>
> -	ret = chip->irq_set_affinity(data, mask, force);
> +	if (!zalloc_cpumask_var(&tmp_mask, GFP_KERNEL))
> +		return -EINVAL;
> +
> +	/*
> +	 * Userspace can't change a managed IRQ's affinity, so make
> +	 * sure that an isolated CPU won't be selected as the effective
> +	 * CPU if this IRQ's affinity includes both isolated and
> +	 * housekeeping CPUs.
> +	 *
> +	 * This guarantees that an isolated CPU won't be interrupted
> +	 * by I/O submitted from a housekeeping CPU.
> +	 */
> +	if (irqd_affinity_is_managed(data) &&
> +	    cpumask_intersects(mask, housekeeping_mask))
> +		cpumask_and(tmp_mask, mask, housekeeping_mask);
> +	else
> +		cpumask_copy(tmp_mask, mask);
> +
> +	ret = chip->irq_set_affinity(data, tmp_mask, force);
>  	switch (ret) {
>  	case IRQ_SET_MASK_OK:
>  	case IRQ_SET_MASK_OK_DONE:
> @@ -229,6 +251,8 @@ int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  		ret = 0;
>  	}
>
> +	free_cpumask_var(tmp_mask);
> +
>  	return ret;
>  }
>
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 9fcb2a695a41..008d6ac2342b 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -163,6 +163,12 @@ static int __init housekeeping_isolcpus_setup(char *str)
>  			continue;
>  		}
>
> +		if (!strncmp(str, "managed_irq,", 12)) {
> +			str += 12;
> +			flags |= HK_FLAG_MANAGED_IRQ;
> +			continue;
> +		}
> +
>  		pr_warn("isolcpus: Error, unknown flag\n");
>  		return 0;
>  	}

Thanks for the quick patch. I'll test it after my current round of tests
finishes and report back. I expect this will work for us as long as it
works "functionally" :) (after all, it doesn't even need an RT
environment, since it's really about where to place certain IRQs). So
IMHO the more important question is whether such a solution would be
acceptable upstream.

Thanks,

--
Peter Xu