Re: [PATCH 1/3] sched: add sched_task_call()
From: Peter Zijlstra
Date: Tue Feb 17 2015 - 13:15:54 EST
On Tue, Feb 17, 2015 at 08:12:11AM -0600, Josh Poimboeuf wrote:
> On Tue, Feb 17, 2015 at 10:24:50AM +0100, Peter Zijlstra wrote:
> > So far stack unwinding has basically been a best effort debug output
> > kind of thing, you're wanting to make the integrity of the kernel depend
> > on it.
> > You require an absolute 100% correctness of the stack unwinder -- where
> > today it is; as stated above; a best effort debug output thing.
> > That is a _big_ change.
> True, this does seem to be the first attempt to rely on the correctness
> of the stack walking code.
> > Has this been properly considered; has all the asm of the relevant
> > architectures been audited? Are you planning on maintaining that level
> > of audit for all new patches?
> I agree, the unwinder needs to be 100% correct. Currently we only
> support x86_64. Luckily, in general, stacks are very well defined and
> walking the stack of a sleeping task should be straightforward. I don't
> think it would be too hard to ensure the stack unwinder is right for
> other architectures as we port them.
I would not be too sure about that; the x86 frame-pointer thing is not
universal. IIRC some architectures would need some form of in-kernel
DWARF unwinder for this to work.
And I'm assuming you're hard relying on CONFIG_FRAME_POINTER here,
because otherwise x86 stacks are a mess too.
So even with CONFIG_FRAME_POINTER, things like copy_to/from_user, which
are implemented in asm, don't honour that. So if one of those faults and
the fault handler sleeps, you'll miss, say, your copy_to_user() frame in
the trace.
Then again, those asm functions don't have function trace bits either,
so you can't patch them to begin with I suppose.
Here's to hoping you don't have to..
> > Because the way you propose to do things, we'll end up with silent but
> > deadly fail if the unwinder is less than 100% correct. No way to easily
> > debug that, no warns, just silent corruption.
> That's a good point. There's definitely room for additional error
> checking in the x86 stack walking code. A couple of ideas:
> - make sure it starts from a __schedule() call at the top
> - make sure we actually reach the bottom of the stack
> - make sure each stack frame's return address is immediately after a
> call instruction
> If we hit any of those errors, we can bail out, unregister with ftrace
> and restore the system to its original state.
And then hope you can get a better trace next time around? Or will you
fall-back to an alternative method of patching?
> > Are you really really sure you want to go do this?
> Basically, yes. We've had a lot of conversations about many different
> variations of how to do this over the past year, and this is by far the
> best approach we've come up with.
For some reason I'm thinking jikos is going to disagree with you on
that.
I'm further thinking we don't actually need 2 (or more) different means
of live patching in the kernel. So you all had better sit down (again)
and come up with something you all agree on.
> FWIW, we've been doing something similar with kpatch and stop_machine()
> over the last 1+ years, and have run into zero problems with that
> approach.
Could be you've just been lucky...