Re: [GIT pull] sched/core for v5.16-rc1

From: Peter Zijlstra
Date: Tue Nov 02 2021 - 04:41:40 EST


On Mon, Nov 01, 2021 at 02:27:49PM -0700, Linus Torvalds wrote:
> On Mon, Nov 1, 2021 at 2:01 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Unwinders that need locks because they can do bad things if they are
> > working on unstable data are EVIL and WRONG.
>
> Note that this is fundamental: if you can fool an unwinder to do
> something bad just because the data isn't stable, then the unwinder is
> truly horrendously buggy, and not usable.

From what I've been led to believe, quite a few of our arch unwinders
seem to fall in that category. They're mostly only happy when unwinding
self and don't have many guardrails on otherwise.

> It could be a user process doing bad things to the user stack frame
> from another thread when profiling is enabled.

Most of the unwinders seem to only care about the kernel stack. Not the
user stack.

> It could be debug code unwinding without locks for random reasons.
>
> So I really don't like "take a lock for unwinding". It's a pretty bad
> bug if the lock is required.

Fair enough; the x86 unwinder is pretty robust in this regard, but it
seems to be one of few :/
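
To make "guardrails" concrete, here is a rough userspace sketch of a
frame-pointer walk that stays harmless on unstable data. This is not the
x86 unwinder's code; guarded_unwind() and struct frame are names made up
for illustration, and it assumes a downward-growing stack whose frames
are laid out as [saved fp][return address]:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame layout for this sketch: [saved fp][return address]. */
struct frame {
	uintptr_t next_fp;
	uintptr_t ret_addr;
};

/*
 * Walk a frame-pointer chain inside [lo, hi) without trusting its contents.
 * Returns the number of return addresses stored in out[].
 */
static size_t guarded_unwind(uintptr_t lo, uintptr_t hi, uintptr_t fp,
			     uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (n < max) {
		/* The frame must be aligned and fit entirely inside the stack. */
		if (fp % sizeof(uintptr_t))
			break;
		if (fp < lo || fp > hi || hi - fp < sizeof(struct frame))
			break;

		/* Read each word once, so what we validate is what we use. */
		const volatile struct frame *f = (const volatile struct frame *)fp;
		uintptr_t next = f->next_fp;
		uintptr_t ret = f->ret_addr;

		out[n++] = ret;

		/* Callers live at higher addresses; this also rules out loops. */
		if (next <= fp)
			break;
		fp = next;
	}
	return n;
}
```

With checks like these a racing writer can make the trace wrong, but it
cannot steer the walk outside the stack or keep it spinning, since every
step either moves strictly upward or stops, and the output is capped at
max entries.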

> The "Link" in the commit also is entirely useless, pointing back to
> the emailed submission of the patch, rather than any useful discussion
> about why the patch happened.

So the initial discussion started here:

https://lkml.kernel.org/r/20210923233105.4045080-1-keescook@xxxxxxxxxxxx

A later thread that might also be of interest is:

https://lkml.kernel.org/r/YWgyy+KvNLQ7eMIV@xxxxxxxxxxxxxxxxxxxxx

Also, an even later thread proposes to push that lock into more stack
unwinding functions (anything doing remote unwinds):

https://lkml.kernel.org/r/20211022150933.883959987@xxxxxxxxxxxxx

But it seems to me you're thinking that's fundamentally buggered and
people should instead invest in fixing their unwinders already.


Now, as is, this stuff is user exposed through /proc/$pid/{wchan,stack}
and as such I think it *can* do with a few extra guardrails in generic
code. OTOH, /proc/$pid/stack is root only.

Also, the remote stack-trace code is hooked into bpf (because
kitchen-sink) and while I didn't look too hard, I can imagine it could
be used to trigger crashes on our less robust architectures if prodded
just right.

Should I care about all this from a generic code PoV, or simply let the
architectures that got it 'wrong' deal with it?