Re: [PATCH 1/3] sched: add sched_task_call()

From: Josh Poimboeuf
Date: Fri Feb 20 2015 - 11:12:57 EST

On Fri, Feb 20, 2015 at 10:50:03AM +0100, Ingo Molnar wrote:
> * Jiri Kosina <jkosina@xxxxxxx> wrote:
> > Alright, so to sum it up:
> >
> > - current stack dumping (even looking at /proc/<pid>/stack) is not
> > guaranteed to yield "correct" results in case the task is running at the
> > time the stack is being examined
> Don't even _think_ about trying to base something as
> dangerous as live patching the kernel image on the concept
> of:
> 'We can make all stack backtraces reliably correct all
> the time, with no false positives, with no false
> negatives, 100% of the time, and quickly discover and
> fix bugs in that'.
> It's not going to happen:
> - The correctness of stacktraces partially depends on
> tooling and we don't control those.

What tooling are you referring to?

Sure, we rely on the compiler to produce the correct assembly for
putting frame pointers on the stack. But that's pretty straightforward.
The kernel already relies on the compiler to do plenty of things which
are much more complex than that.

> - More importantly, there's no strong force that ensures
> we can rely on stack backtraces: correcting bad stack
> traces depends on people hitting those functions and
> situations that generate them, seeing a bad stack trace,
> noticing that it's weird and correcting whatever code or
> tooling quirk causes the stack entry to be incorrect.
> Essentially unlike other kernel code which breaks stuff if
> it's incorrect, there's no _functional_ dependence on stack
> traces, so live patching would be the first (and pretty
> much only) thing that breaks on bad stack traces ...

First, there are several things we do to avoid anomalies:

- we don't patch asm functions
- we only walk the stack of sleeping tasks

Second, currently the stack unwinding code is rather crude and doesn't
do much in the way of error handling. There are several error
conditions we could easily check for programmatically:

- make sure it starts from a __schedule() call at the top (I've only
proposed walking the stacks of sleeping tasks)
- make sure we actually reach the bottom of the stack
- make sure each stack frame's return address is immediately after a
call instruction

If any of these checks fail, we can either delay the patching of the
task until later or we can cancel the patching process, with no harm
done. Either way we can WARN the user so that we get reports of these

> If you think you can make something like dwarf annotations
> work reliably to base kernel live patching on that,
> reconsider.

No, we would rely on CONFIG_FRAME_POINTER.

> Even with frame pointer backtraces can go bad sometimes, I
> wouldn't base live patching even on _that_,

Other than hand-coded assembly, can you provide more details about how
frame pointer stack traces can go bad?

> and that's a very simple concept with a performance cost that most
> distros don't want to pay.

Hm, CONFIG_FRAME_POINTER is enabled on all the distros I use.

> So if your design is based on being able to discover 'live'
> functions in the kernel stack dump of all tasks in the
> system, I think you need a serious reboot of the whole
> approach and get rid of that fragility before any of that
> functionality gets upstream!

Just to clarify a couple of things:

1. Despite the email subject, this discussion is really about another
RFC patch set [1]. It hasn't been merged, and there's still a lot of
ongoing discussion about it.

2. We don't dump the stack of _all_ tasks. Only sleeping ones.


To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at