Re: [PATCH 0/3] warn and suppress irqflood

From: Pingfan Liu
Date: Thu Oct 29 2020 - 03:41:57 EST


On Wed, Oct 28, 2020 at 12:58:41PM +0100, Thomas Gleixner wrote:
> On Wed, Oct 28 2020 at 14:02, Pingfan Liu wrote:
> > On Tue, Oct 27, 2020 at 3:59 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> Also Liu's patch only works if:
> >>
> >> 1) CONFIG_IRQ_TIME_ACCOUNTING is enabled
> >
> > I wonder whether it can not be a default option or not by the following method:
> > DEFINE_STATIC_KEY_FALSE(irqtime_account), and enable it according to
> > a boot param.
>
> How so?
>
> config IRQ_TIME_ACCOUNTING
> depends on HAVE_IRQ_TIME_ACCOUNTING && !VIRT_CPU_ACCOUNTING_NATIVE
>
Look closely at the two config value:
-1. HAVE_IRQ_TIME_ACCOUNTING, it is selected by most of the popular arches, and
can be further relaxed.
It implies sched_clock() is fast enough for sampling. With current code, the
variable sched_clock_irqtime=0 can be used to turn off irqtime accounting on
some arches with slow sched_clock(). And it can be even better by using
DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime)
So the pre-requirement can be relaxed as "depends on !VIRT_CPU_ACCOUNTING_NATIVE"
In case that I can not express clearly, could you have a look at the demo patch?

That patch _assumes_ that irqtime accounting costs much and is not turned on by
default. If turned on, it will cost an extra jmp than current implement.
And I think it is critical to my [1/3] whether this assumption is reasonable.

-2. For VIRT_CPU_ACCOUNTING_NATIVE, it can only be selected by powerpc and ia64

In fact, I have a seperate patch for powerpc with
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE to utilize my [1/3].

---
diff --git a/init/Kconfig b/init/Kconfig
index c944691..16d168b 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -490,7 +490,7 @@ endchoice

config IRQ_TIME_ACCOUNTING
bool "Fine granularity task level IRQ time accounting"
- depends on HAVE_IRQ_TIME_ACCOUNTING && !VIRT_CPU_ACCOUNTING_NATIVE
+ depends on !VIRT_CPU_ACCOUNTING_NATIVE
help
Select this option to enable fine granularity task irq time
accounting. This is done by reading a timestamp on each
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5a55d23..3ab7e1d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -19,7 +19,7 @@
*/
DEFINE_PER_CPU(struct irqtime, cpu_irqtime);

-static int sched_clock_irqtime;
+DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);

void enable_sched_clock_irqtime(void)
{
@@ -49,13 +49,14 @@ static void irqtime_account_delta(struct irqtime *irqtime, u64 delta,
*/
void irqtime_account_irq(struct task_struct *curr)
{
- struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
+ struct irqtime *irqtime;
s64 delta;
int cpu;

- if (!sched_clock_irqtime)
+ if (static_branch_unlikely(&sched_clock_irqtime))
return;

+ irqtime = this_cpu_ptr(&cpu_irqtime);
cpu = smp_processor_id();
delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
irqtime->irq_start_time += delta;
@@ -84,16 +85,7 @@ static u64 irqtime_tick_accounted(u64 maxtime)
return delta;
}

-#else /* CONFIG_IRQ_TIME_ACCOUNTING */
-
-#define sched_clock_irqtime (0)
-
-static u64 irqtime_tick_accounted(u64 dummy)
-{
- return 0;
-}
-
-#endif /* !CONFIG_IRQ_TIME_ACCOUNTING */
+#endif

static inline void task_group_account_field(struct task_struct *p, int index,
u64 tmp)
@@ -475,7 +467,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
if (vtime_accounting_enabled_this_cpu())
return;

- if (sched_clock_irqtime) {
+ if (static_branch_unlikely(&sched_clock_irqtime))
irqtime_account_process_tick(p, user_tick, 1);
return;
}
@@ -504,7 +496,7 @@ void account_idle_ticks(unsigned long ticks)
{
u64 cputime, steal;

- if (sched_clock_irqtime) {
+ if (static_branch_unlikely(&sched_clock_irqtime))
irqtime_account_idle_ticks(ticks);
return;
}
--
2.7.5

[...]
> +
> +static int __init irqstorm_setup(char *arg)
> +{
> + int res = kstrtoul(arg, 0, &irqstorm_limit);
> +
> + if (!res) {
> + pr_info("Interrupt storm detector enabled. Limit=%lu / s\n",
> + irqstorm_limit);
> + }
> + return !!res;
> +}
> +__setup("irqstorm_limit", irqstorm_setup);

This configuration independent method looks appealing. And I am glad to have a try.

But irqstorm_limit may be a hard choice. Maybe by formula:
instruction-percpu-per-second / insn num of irq failed path ? It is hard to
estimate "instruction-percpu-per-second".

Thanks,
Pingfan