Re: [RFC PATCH 07/86] Revert "livepatch,sched: Add livepatch task switching to cond_resched()"
From: Josh Poimboeuf
Date: Thu Nov 09 2023 - 18:47:43 EST
On Thu, Nov 09, 2023 at 02:50:48PM -0800, Ankur Arora wrote:
> >> I guess I'm not fully understanding what the cond rescheds are for. But
> >> would an IPI to all CPUs setting NEED_RESCHED, fix it?
>
> Yeah. We could just temporarily toggle to full preemption, when
> NEED_RESCHED_LAZY is always upgraded to NEED_RESCHED which will
> then send IPIs.
>
> > If all livepatch arches had the ORC unwinder, yes.
> >
> > The problem is that frame pointer (and similar) unwinders can't reliably
> > unwind past an interrupt frame.
>
> Ah, I wonder if we could just disable the preempt_schedule_irq() path
> temporarily? Hooking into schedule() alongside something like this:
>
> @@ -379,7 +379,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
>
> void irqentry_exit_cond_resched(void)
> {
> - if (!preempt_count()) {
> + if (klp_cond_resched_disable() && !preempt_count()) {
>
> The problem would be tasks that don't go through any preemptible
> sections.
Let me back up a bit and explain what klp is trying to do.
When a livepatch is applied, klp needs to unwind all the tasks,
preferably within a reasonable amount of time.
We can't unwind task A from task B while task A is running, since task A
could be changing the stack during the unwind. So task A needs to be
blocked or asleep. The only exception to that is if the unwind happens
in the context of task A itself.
The problem we were seeing was CPU-bound kthreads (e.g., vhost_worker)
not getting patched within a reasonable amount of time. We fixed it by
hooking the klp unwind into cond_resched() so it can unwind from the
task itself.
It only worked because we had a non-preempted hook (because non-ORC
unwinders can't unwind reliably through preemption) which called klp to
unwind from the context of the task.
Without something to hook into, we have a problem. We could of course
hook into schedule(), but if the kthread never calls schedule() from a
non-preempted context then it still doesn't help.
--
Josh