live kernel upgrades (was: live kernel patching design)

From: Ingo Molnar
Date: Sun Feb 22 2015 - 04:46:52 EST

Next message: Jonathan Cameron: "Re: [PATCH v2] iio: common: ssp_sensors: Protect PM-only functions to kill warning"
Previous message: Michael S. Tsirkin: "Re: [PATCH 3/3] vhost_net: fix virtio_net header endianness"
In reply to: Jiri Kosina: "Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call())"
Next in thread: Ingo Molnar: "Re: live kernel upgrades (was: live kernel patching design)"

* Ingo Molnar <mingo@xxxxxxxxxx> wrote:

> Anyway, let me try to reboot this discussion back to
> technological details by summing up my arguments in
> another mail.

So here's how I see the kGraft and kpatch series. To not
put too fine a point on it, I think they are fundamentally
misguided in both implementation and in design, which turns
them into an (unwilling) extended arm of the security
theater:

- kGraft creates a 'mixed' state where old kernel
functions and new kernel functions are allowed to
co-exist; furthermore, there's currently no guarantee
that the patching gets done within a bounded amount of
time.

- kpatch uses kernel stack backtraces to determine whether
a task is executing a function or not - which IMO is
fundamentally fragile as kernel stack backtraces are
'debug info' and are maintained and created as such:
we've had long lasting stack backtrace bugs which would
now be turned into 'potentially patching a live
function' type of functional (and hard to debug) bugs.
I didn't see much effort that tries to turn this
equation around and makes kernel stacktraces more
robust.

- the whole 'consistency model' talk both projects employ
reminds me of how we grew 'security modules': where
people running various mediocre projects would in the
end not seek to create a superior upstream project, but
would seek the 'consensus' in the form of cross-acking
each others' patches as long as their own code got
upstream as well ...

I'm not blaming Linus for giving in to allowing security
modules: they might be the right model for such a hard
to define and in good part psychological discipline as
'security', but I sure don't see the necessity of doing
that for 'live kernel patching'.
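
A minimal sketch of the stack-check idea attributed to kpatch above, written as a toy Python model rather than kernel code. The `Task` class, the helper names and the frame names are assumptions made for illustration only; the point is that the safety decision is only as good as the backtrace it reads:

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # Stand-in for a kernel stack backtrace, outermost frame first.
    # The frame names used below are illustrative labels only.
    stack: list = field(default_factory=list)


def blocking_tasks(tasks, patched_funcs):
    """Names of tasks whose backtrace contains a function to be patched."""
    patched = set(patched_funcs)
    return [t.name for t in tasks if patched.intersection(t.stack)]


def safe_to_patch(tasks, patched_funcs):
    """Patching is considered safe only if no task is inside a patched function."""
    return not blocking_tasks(tasks, patched_funcs)


# Accurate backtraces: 'worker' is inside vfs_read, so patching must wait.
accurate = [
    Task("worker", ["syscall_entry", "vfs_read", "fs_read_helper"]),
    Task("idle", ["cpu_idle"]),
]

# Same situation, but the unwinder silently dropped the vfs_read frame.
truncated = [
    Task("worker", ["syscall_entry", "fs_read_helper"]),
    Task("idle", ["cpu_idle"]),
]

print(blocking_tasks(accurate, ["vfs_read"]))   # ['worker']
print(safe_to_patch(accurate, ["vfs_read"]))    # False
print(safe_to_patch(truncated, ["vfs_read"]))   # True, although the task is still in vfs_read
```

In this model a single missing frame turns a cosmetic backtrace bug into a wrong "safe to patch" answer, which is the failure mode the paragraph on kpatch worries about.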

More importantly, both kGraft and kpatch are pretty limited
in what kinds of updates they allow, and neither kGraft nor
kpatch has any clear path towards applying more complex
fixes to kernel images that I can see: kGraft can only
apply the simplest of fixes where both versions of a
function are interchangeable, and kpatch is only marginally
better at that - and that's pretty fundamental to both
projects!

I think all of these problems could be resolved by shooting
for the moon instead:

- work towards allowing arbitrary live kernel upgrades!

not just 'live kernel patches'.

Work towards the goal of full live kernel upgrades between
any two versions of a kernel that supports live kernel
upgrades (and that doesn't have fatal bugs in the kernel
upgrade support code requiring a hard system restart).

Arbitrary live kernel upgrades could be achieved by
starting with the 'simple method' I outlined in earlier
mails, using some of the methods that kpatch and kGraft are
both utilizing or planning to utilize:

- implement user task and kthread parking to get the
kernel into quiescent state.

- implement (optional, thus ABI-compatible)
system call interruptibility and restartability
support.

- implement task state and (limited) device state
snapshotting support

- implement live kernel upgrades by:

- snapshotting all system state transparently

- fast-rebooting into the new kernel image without
shutting down and rebooting user-space, i.e. _much_
faster than a regular reboot.

- restoring system state transparently within the new
kernel image and resuming system workloads where
they were left.
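
The sequence above (park, snapshot, fast-reboot, restore) can be sketched as a toy Python model. Everything here is hypothetical: the `Kernel` and `Task` classes, the function names and the contents of the per-task state are assumptions chosen for illustration, and the "reboot" is just the construction of a new object:

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Task:
    pid: int
    state: dict          # stand-in for registers, open files, sockets, ...
    parked: bool = False


@dataclass
class Kernel:
    version: str
    tasks: dict = field(default_factory=dict)


def park_all(kernel):
    """Bring the kernel into a quiescent state by parking every task."""
    for task in kernel.tasks.values():
        task.parked = True


def snapshot(kernel):
    """Capture all task state; only valid once everything is parked."""
    if not all(t.parked for t in kernel.tasks.values()):
        raise RuntimeError("cannot snapshot: kernel is not quiescent")
    return copy.deepcopy({pid: t.state for pid, t in kernel.tasks.items()})


def boot_and_restore(new_version, snap):
    """'Fast-reboot' into the new image, restore state, then resume tasks."""
    new_kernel = Kernel(new_version)
    for pid, state in snap.items():
        new_kernel.tasks[pid] = Task(pid, copy.deepcopy(state), parked=True)
    for task in new_kernel.tasks.values():
        task.parked = False
    return new_kernel


def live_upgrade(old_kernel, new_version):
    park_all(old_kernel)
    snap = snapshot(old_kernel)
    return boot_and_restore(new_version, snap)


old = Kernel("v3.20", {1: Task(1, {"pc": 100, "fds": [0, 1, 2]})})
new = live_upgrade(old, "v3.21")
print(new.version, new.tasks[1].state)
```

The model only captures the ordering constraint (no snapshot before quiescence, no resume before restore); the hard parts named in the list, such as device state and in-flight system calls, are deliberately left out.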

Even complex external state like TCP socket state and
graphics state can be preserved over an upgrade. As far as
the user is concerned, nothing happened but a brief pause -
and he's now running a v3.21 kernel, not v3.20.

Obviously one of the simplest utilizations of live kernel
upgrades would be to apply simple security fixes to
production systems. But that's just a very simple
application of a much broader capability.

Note that if done right, then the time to perform a live
kernel upgrade on a typical system could be brought to well
below 10 seconds system stoppage time: adequate to the vast
majority of installations.

For special installations or well optimized hardware the
latency could possibly be brought below 1 second stoppage
time.

This 'live kernel upgrades' approach would have various
advantages:

- it brings together various disciplines working towards
shared goals:

- the boot time reduction folks
- the checkpoint/restore folks
- the hibernation folks
- the suspend/resume and power management folks
- the live patching folks (you)
- the syscall latency reduction folks

if so many disciplines are working together then maybe
something really good and long term maintainable can
crystallize out of that effort.

- it ignores the security theater that treats security
fixes as a separate, disproportionately more important
class of fixes and instead allows arbitrarily complex
changes over live kernel upgrades.

- there's no need to 'engineer' live patches separately,
there's no need to review them and their usage sites
for live patching relevant side effects. Just create a
'better' kernel as defined by users of that kernel:

- in the enterprise distro space create a more stable
kernel and allow transparent upgrades into it.

- in the desktop distro space create a kernel that
will contain fixes and support for latest hardware.

- etc.

there's the need to engineer c/r and device state
support, but that's a much more concentrated and
specific field with many usecases beyond live
kernel upgrades.

We have many of the building blocks in place and have them
available:

- the freezer code already attempts to park/unpark
threads transparently; that could be fixed/extended.

- hibernation, regular suspend/resume and in general
power management has in essence already implemented
most building blocks needed to enumerate and
checkpoint/restore device state that otherwise gets
lost in a shutdown/reboot cycle.

- c/r patches started user state enumeration and
checkpoint/restore logic
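
As one concrete, user-visible face of the freezer mentioned above, the cgroup v1 freezer controller exposes a `freezer.state` file that accepts `FROZEN` and `THAWED`. The sketch below is a minimal helper around that file; the function name, the timeout values and the cgroup path in the usage comment are assumptions, and it is not a claim about how an in-kernel upgrade path would drive the freezer:

```python
import os
import time


def set_freezer_state(cgroup_dir, state, timeout=5.0, poll=0.05):
    """Write FROZEN or THAWED to freezer.state and wait until it is reported.

    cgroup_dir is the directory of a cgroup v1 freezer cgroup. Freezing is
    asynchronous, so the file may read FREEZING for a while before FROZEN.
    """
    if state not in ("FROZEN", "THAWED"):
        raise ValueError("state must be 'FROZEN' or 'THAWED'")
    path = os.path.join(cgroup_dir, "freezer.state")
    with open(path, "w") as f:
        f.write(state)
    deadline = time.monotonic() + timeout
    while True:
        with open(path) as f:
            current = f.read().strip()
        if current == state:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} still reports {current!r}")
        time.sleep(poll)


# Usage (hypothetical path, needs root and a mounted v1 freezer hierarchy):
#   set_freezer_state("/sys/fs/cgroup/freezer/mygroup", "FROZEN")
#   ... inspect or checkpoint the parked tasks ...
#   set_freezer_state("/sys/fs/cgroup/freezer/mygroup", "THAWED")
```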

A feature like arbitrary live kernel upgrades would be well
worth the pain and would be worth the complications, and
it's actually very feasible technically.

As for the goals of the current live kernel patching
projects, "being able to apply only the simplest of live
patches", which would in my opinion mostly serve the
security theater: they are not forward looking enough, and
in that sense they could even be counterproductive.

Thanks,

Ingo
