Re: [PATCH 08/12] x86/dumpstack: Pin the target stack in save_stack_trace_tsk()

From: Ingo Molnar
Date: Thu Sep 15 2016 - 02:38:19 EST

* Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> This will prevent a crash if the target task dies before or while
> dumping its stack once we start freeing task stacks early.
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
> arch/x86/kernel/stacktrace.c | 5 +++++
> 1 file changed, 5 insertions(+)
> diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c
> index 4738f5e0f2ab..b3f32fbe3ba4 100644
> --- a/arch/x86/kernel/stacktrace.c
> +++ b/arch/x86/kernel/stacktrace.c
> @@ -79,9 +79,14 @@ void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace)
> void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace)
> {
> + if (!try_get_task_stack(tsk))
> + return;
> +
> dump_trace(tsk, NULL, NULL, 0, &save_stack_ops_nosched, trace);
> if (trace->nr_entries < trace->max_entries)
> trace->entries[trace->nr_entries++] = ULONG_MAX;
> +
> + put_task_stack(tsk);
> }
> EXPORT_SYMBOL_GPL(save_stack_trace_tsk);

So I very much like the first half of the series, the thread_info merge into
task_struct (yay!) and I am in the process of applying those bits to -tip.

So the above, not (yet) fully correct patch is in preparation for a later change
you are doing in this series:

[PATCH 11/12] sched: Free the stack early if CONFIG_THREAD_INFO_IN_TASK

But I am having serious second thoughts about the fact that we now have to
sprinkle non-obvious try_get_task_stack()/put_task_stack() pairs into debugging
code that never really needed anything like that. These pairs are easy to get
wrong and easy to miss - and if we do miss one, the result is slightly buggy,
very unpleasant, racy, crashy kernel behavior.

Can we just ... not do the delayed freeing?

I mean, the cache hotness arguments don't look overly convincing to me: if we
just stop using a piece of formerly cache-hot memory, then those cache lines
will eventually decay naturally in whatever LRU scheme the CPU is using and
will be reused without much harm done. It's just "another day in CPU land" that
happens all the time. Yes, we could reduce the latency of the freeing and thus
slightly improve the locality of other bits ... but is it really measurable or
noticeable in any fashion? It's not like task/thread fork/clone/exit is _that_
much of a fast path.

And then there's also the whole CONFIG_THREAD_INFO_IN_TASK #ifdeffery spreading.

I.e. I just don't see the justification for this kind of object lifetime
asymmetry and hard-to-debug fragility.

Am I missing something?