Re: [PATCH] livepatch: Avoid CPU hogging with cond_resched

From: David Vernet
Date: Mon Jan 10 2022 - 09:39:05 EST


Apologies all for the delayed response -- I was still on holiday last week.

Petr Mladek <pmladek@xxxxxxxx> wrote on Mon [2022-Jan-03 17:04:31 +0100]:
> > > It turns out that symbol lookups often take up the most CPU time when
> > > enabling and disabling a patch, and may hog the CPU and cause other tasks
> > > on that CPU's runqueue to starve -- even in paths where interrupts are
> > > enabled. For example, under certain workloads, enabling a KLP patch with
> > > many objects or functions may cause ksoftirqd to be starved, and thus for
> ^^^^^^^^^^^^^^^^^^^^^^^^^
> This suggests that a single kallsyms_on_each_symbol() is not a big
> problem. cond_resched() might be called unnecessarily often there.
> I wonder if it would be enough to add cond_resched() into the two
> loops calling klp_find_object_symbol().

In the initial version of the patch I was intending to send out, I actually
had the cond_resched() in klp_find_object_symbol(). Having it there did
appear to fix the ksoftirqd starvation issue, but I elected to put it in
kallsyms_on_each_symbol() after Chris (cc'd) suggested it because
cond_resched() is so lightweight, and it didn't affect the runtime for
livepatching in my experiments.

> That said, kallsyms_on_each_symbol() is a slow path and there might
> be many symbols. So, it might be the right place.

Yes, my thinking was that because it didn't seem to affect throughput, and
because kallsyms_on_each_symbol() could potentially cause the same issue if
it were ever called elsewhere, this was the correct place for it.
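
To make the two placements concrete, here is a rough user-space sketch. It
is not the kernel code: cond_resched_sketch(), walk_symbols() and
resolve_all() are hypothetical stand-ins, and sched_yield() only
approximates what cond_resched() does. It just counts how many reschedule
points each placement offers for the same set of lookups.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for cond_resched(): count the call, then yield. */
static unsigned long resched_points;

static void cond_resched_sketch(void)
{
	resched_points++;
	sched_yield();
}

/* Where the reschedule point lives (names are illustrative only). */
enum placement {
	PER_SYMBOL,	/* inside the symbol walk, once per symbol visited */
	PER_LOOKUP,	/* in the caller's loop, once per completed lookup */
};

/* Linear walk over a symbol table; returns 1 if 'want' is found. */
static int walk_symbols(const char *const *syms, size_t n,
			const char *want, enum placement where)
{
	assert(want != NULL);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(syms[i], want) == 0)
			return 1;
		if (where == PER_SYMBOL)
			cond_resched_sketch();
	}
	return 0;
}

/* Resolve every wanted symbol; returns how many were found. */
static size_t resolve_all(const char *const *syms, size_t n,
			  const char *const *wanted, size_t m,
			  enum placement where)
{
	size_t found = 0;

	for (size_t j = 0; j < m; j++) {
		found += (size_t)walk_symbols(syms, n, wanted[j], where);
		if (where == PER_LOOKUP)
			cond_resched_sketch();
	}
	return found;
}
```

With PER_LOOKUP the number of reschedule points scales with the number of
lookups, while with PER_SYMBOL it scales with the number of symbols
visited, so a single long walk from any other caller is covered as well.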