Re: [RFC PATCH 0/2] kpatch: dynamic kernel patching

From: Ingo Molnar
Date: Mon May 05 2014 - 14:43:15 EST



* Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:

> On Mon, May 05, 2014 at 08:26:38AM -0500, Josh Poimboeuf wrote:
> > On Mon, May 05, 2014 at 10:55:37AM +0200, Ingo Molnar wrote:
> > >
> > > * Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > >
> > > > [...]
> > > >
> > > > kpatch checks the backtraces of all tasks in stop_machine() to
> > > > ensure that no instances of the old function are running when the
> > > > new function is applied. I think the biggest downside of this
> > > > approach is that stop_machine() has to idle all other CPUs during
> > > > the patching process, so it inserts a small amount of latency (a few
> > > > ms on an idle system).
> > >
> > > When live patching the kernel, how about achieving an even 'cleaner'
> > > state for all tasks in the system: to freeze all tasks, as the suspend
> > > and hibernation code (and kexec) does, via freeze_processes()?
> > >
> > > That means no tasks in the system have any real kernel execution
> > > state, and there's also no problem with long-sleeping tasks, as
> > > freeze_processes() is supposed to be fast as well.
> > >
> > > I.e. go for the most conservative live patching state first, and relax
> > > it only once the initial model is upstream and is working robustly.
> >
> > I had considered doing this before, but the problem I found is
> > that many kernel threads are unfreezable. So we wouldn't be able
> > to check whether it's safe to replace any functions in use by those
> > kernel threads.
>
> OTOH many kernel threads are parkable, which achieves a similar
> desired behaviour: the kernel threads then aren't running.
>
> And in fact we could implement freezing on top of park for kthreads.
>
> But unfortunately there are still quite a few of them which don't
> support parking.
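
The backtrace check described in the quoted text (kpatch walking every task's stack inside stop_machine() so that no instance of an old function is still executing) can be sketched in userspace C. This is a minimal illustration under stated assumptions, not kpatch's code: `struct old_func`, `struct task_trace`, `addr_in_func()` and `check_stacks_safe()` are hypothetical names, and each task's stack is assumed to be available as a plain array of return addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: a function about to be replaced, as [addr, addr + size). */
struct old_func {
    const char *name;
    uintptr_t addr;
    size_t size;
};

/* Hypothetical: one task's stack trace as a list of return addresses. */
struct task_trace {
    int pid;
    const uintptr_t *entries;
    size_t nr_entries;
};

/* True if addr lies inside the old function's text. */
static bool addr_in_func(uintptr_t addr, const struct old_func *f)
{
    return addr >= f->addr && addr < f->addr + f->size;
}

/*
 * Returns 0 if no task has any old function on its stack, or -1 after
 * reporting the first task found executing one. The kernel-side check
 * runs while stop_machine() keeps the other CPUs idle, so the traces
 * cannot change while they are inspected; here they are static data.
 */
static int check_stacks_safe(const struct task_trace *tasks, size_t nr_tasks,
                             const struct old_func *funcs, size_t nr_funcs)
{
    assert(nr_funcs == 0 || funcs != NULL);

    for (size_t t = 0; t < nr_tasks; t++)
        for (size_t e = 0; e < tasks[t].nr_entries; e++)
            for (size_t f = 0; f < nr_funcs; f++)
                if (addr_in_func(tasks[t].entries[e], &funcs[f])) {
                    fprintf(stderr, "pid %d is in %s, refusing to patch\n",
                            tasks[t].pid, funcs[f].name);
                    return -1;
                }
    return 0;
}
```

Note that this only proves that no old function is on any stack; it says nothing about tasks whose kernel state is not captured by a stack walk, which is the gap the freezing proposal targets.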

Well, if distros are moving towards live patching (and they are!),
then it looks rather necessary to me that something as scary as
swapping out live kernel instructions for substantially different code
should be as safe as possible first, and only then fast.

If a kernel refuses to patch with certain threads running, that will
drive those kernel threads to get fixed. It's a deterministic,
recoverable, reportable bug situation, so fixing it should be fast.
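
The refuse-and-report policy above, combined with the observation in the quoted text that parked kthreads are not running, might be sketched as a simple gate. This is a userspace illustration under assumptions: the task model, the `TS_*` states and `patch_gate()` are hypothetical, not kernel APIs; in the kernel the freezing itself would come from freeze_processes() and the parking from the kthread park mechanism.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-task state as seen by a patching core. */
enum task_state { TS_RUNNING, TS_FROZEN, TS_PARKED };

struct task_info {
    int pid;
    const char *comm;
    bool is_kthread;
    enum task_state state;
};

/*
 * Conservative gate: patching proceeds only if every task is frozen, or
 * is a kernel thread that has been parked. Every offending task is
 * reported (not just the first), so the log names exactly which threads
 * still need freezer or park support.
 * Returns 0 if patching may proceed, -EBUSY otherwise.
 */
static int patch_gate(const struct task_info *tasks, size_t n)
{
    int ret = 0;

    assert(tasks != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const struct task_info *t = &tasks[i];
        bool quiescent = t->state == TS_FROZEN ||
                         (t->is_kthread && t->state == TS_PARKED);

        if (!quiescent) {
            fprintf(stderr, "refusing to patch: %s (pid %d) is neither frozen nor parked\n",
                    t->comm, t->pid);
            ret = -EBUSY;
        }
    }
    return ret;
}
```

Failing with an error rather than patching anyway keeps the outcome deterministic and recoverable: nothing is modified, and the report points at the specific thread to fix.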

We learned these robustness lessons the hard way with kprobes and
ftrace dynamic code patching... which are utterly simple compared to
live kernel patching!

Thanks,

Ingo