Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())
From: Ingo Molnar
Date: Sat Feb 21 2015 - 13:30:16 EST
* Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> On Fri, Feb 20, 2015 at 10:46:13PM +0100, Vojtech Pavlik wrote:
> > On Fri, Feb 20, 2015 at 08:49:01PM +0100, Ingo Molnar wrote:
> >
> > > I.e. it's in essence the strong stop-all atomic
> > > patching model of 'kpatch', combined with the
> > > reliable avoidance of kernel stacks that 'kgraft'
> > > uses.
> >
> > > That should be the starting point, because it's the
> > > most reliable method.
> >
> > In the consistency models discussion, this was marked
> > the "LEAVE_KERNEL+SWITCH_KERNEL" model. It's indeed the
> > strongest model of all, but also comes at the highest
> > cost in terms of impact on running tasks. It's so high
> > (the interruption may be seconds or more) that it was
> > deemed not worth implementing.
>
> Yeah, this is way too disruptive to the user.
>
> Even the comparatively tiny latency caused by kpatch's
> use of stop_machine() was considered unacceptable by
> some.

Unreliable, unrobust patching is even more disruptive...

What I think makes it long term fragile is that we combine
two unrobust, unlikely mechanisms: the chance that a task
just happens to execute a patched function, with the chance
that debug information is unreliable.

For example tracing patching got debugged to a fair degree
because we rely on the patching for actual tracing
functionality. Even with that relatively robust usage model
we had our crises ...

I just don't see how a stack backtrace based live patching
method can become robust in the long run.

> Plus a lot of processes would see EINTR, causing more
> havoc.

Parking threads safely in user mode does not require the
propagation of syscall interruption to user-space.

(It does have some other requirements, such as making all
syscalls interruptible by a 'special' signalling method
that only live patching triggers - even syscalls that are
uninterruptible under the normal ABI, such as sys_sync().)
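
As a rough user-space analogy only (this is not the 'special'
signalling method described above, and read_across_signal() is a
hypothetical helper name), the existing SA_RESTART flag of sigaction()
already shows a syscall being interrupted and then restarted without
the caller seeing EINTR. The sketch assumes a POSIX system and a
blocking read() on a pipe:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Counts handler invocations; sig_atomic_t is safe to touch in a handler. */
static volatile sig_atomic_t handler_runs;

static void on_sigusr1(int signo)
{
	(void)signo;
	handler_runs++;
}

/*
 * Hypothetical helper: block in read() on an empty pipe while a child
 * sends SIGUSR1 and, a little later, writes one byte.
 * Returns read()'s result; on -1, errno is the one read() left behind.
 */
ssize_t read_across_signal(int use_restart)
{
	struct timespec delay = { 0, 200 * 1000 * 1000 };	/* 200 ms */
	struct sigaction sa;
	int fds[2], saved_errno;
	char byte;
	ssize_t n;
	pid_t child;

	handler_runs = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = use_restart ? SA_RESTART : 0;
	if (sigaction(SIGUSR1, &sa, NULL) == -1 || pipe(fds) == -1)
		return -1;

	child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		close(fds[0]);
		nanosleep(&delay, NULL);	/* give the parent time to block */
		kill(getppid(), SIGUSR1);	/* interrupt the parent */
		nanosleep(&delay, NULL);
		if (write(fds[1], "x", 1) != 1)	/* data for the restarted read() */
			_exit(1);
		_exit(0);
	}

	close(fds[1]);
	n = read(fds[0], &byte, 1);	/* signal arrives while we wait here */
	saved_errno = errno;
	close(fds[0]);
	waitpid(child, NULL, 0);
	errno = saved_errno;
	return n;
}
```

Calling read_across_signal(1) should return 1: the handler runs, the
kernel restarts read(), and the caller never observes the
interruption. Calling it with 0 will typically return -1 with errno
set to EINTR, although that depends on the signal landing while the
parent is already blocked in read().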

On the other hand, if it's too slow, people will work on
improving signal propagation latencies: making syscalls
more readily interruptible and more seamlessly restartable
has various other advantages beyond live kernel patching.

I.e. it's a win-win scenario and will improve various areas
of the kernel in terms of syscall interruptibility
latencies.

Thanks,
Ingo