Re: [PATCH 1/3] sched: add sched_task_call()

From: Josh Poimboeuf
Date: Fri Feb 20 2015 - 12:06:22 EST

On Fri, Feb 20, 2015 at 09:49:32AM +0100, Jiri Kosina wrote:
> Alright, so to sum it up:
> - current stack dumping (even looking at /proc/<pid>/stack) is not
> guaranteed to yield "correct" results in case the task is running at the
> time the stack is being examined
> - the only fool-proof way is to send IPI-NMI to all CPUs, and synchronize
> the handlers between each other (to make sure that reschedule doesn't
> happen in between on some CPU and other task doesn't start running in
> the interim).
> The NMI handler dumps its current stack in case it's running in context
> of the process whose stack is to be dumped. Otherwise, one of the NMI
> handlers looks up the required task_struct, and dumps it if it's not
> running on any CPU
> - For live patching use-case, the stack has to be analyzed (and decision
> on what to do based on the analysis) in the NMI handler itself,
> otherwise it gets racy again
> Converting /proc/<pid>/stack to this mechanism seems like a correct thing
> to do in any case, as it's slow path anyway.
> The original intent seemed to have been to make this fast path for the
> live patching case, but that's probably not possible, so it seems like the
> price that will have to be paid for being able to finish live-patching of
> CPU-bound processes is the cost of IPI-NMI broadcast.

Hm, syncing IPI's among CPUs sounds pretty disruptive.

This is really two different issues, so I'll separate them:

1. /proc/<pid>/stack for running tasks

I haven't heard anybody demanding that /proc/<pid>/stack should actually
print the stack for running tasks. My suggestion was just that we avoid
the possibility of printing garbage.

Today's behavior for a running task is (usually):

# cat /proc/802/stack
[<ffffffffffffffff>] 0xffffffffffffffff

How about, when we detect a running task, just always show that?
That would give us today's behavior, except without occasionally
printing garbage, while avoiding all the overhead of syncing IPI's.
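
As a rough userspace model of that suggestion (not kernel code; struct
stack_model_task, model_stack_dump() and STACK_SENTINEL are made-up names
for illustration), the dump path would check the running state first and
never walk a live stack:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel entry reported for a running task (same value as the output above). */
#define STACK_SENTINEL UINT64_C(0xffffffffffffffff)

/* Hypothetical stand-in for the task state the dump needs. */
struct stack_model_task {
	bool running;           /* is the task currently on a CPU? */
	const uint64_t *saved;  /* frames recorded while the task was not running */
	size_t nr_saved;
};

/*
 * Fill 'out' with at most 'max' entries and return how many were written.
 * A running task yields only the sentinel; its stack is never walked,
 * so no garbage can be printed and no IPI is needed.
 */
size_t model_stack_dump(const struct stack_model_task *t,
			uint64_t *out, size_t max)
{
	size_t i;

	if (max == 0)
		return 0;

	if (t->running) {
		out[0] = STACK_SENTINEL;
		return 1;
	}

	for (i = 0; i < t->nr_saved && i < max; i++)
		out[i] = t->saved[i];

	return i;
}
```

The model assumes the running check and the stack walk cannot race with
a wakeup; a real implementation would have to guarantee that separately.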

2. live patching of running tasks

I don't see why we would need to sync IPI's to patch CPU-bound
processes. Why not use context tracking or the TIF_USERSPACE flag like
I mentioned before?
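
One possible reading of that idea, sketched as a userspace model (not
kernel code; struct patch_task, enum patch_ctx and patch_try_switch()
are hypothetical names, and the details of the earlier proposal are
assumed rather than taken from it): a task known to be in userspace is
not executing any kernel function, so it could be switched to the
patched code without examining its stack.

```c
#include <stdbool.h>

/* Hypothetical per-task record of where the task is executing. */
enum patch_ctx { PATCH_CTX_KERNEL, PATCH_CTX_USER };

struct patch_task {
	enum patch_ctx ctx;  /* assumed to be maintained on kernel entry/exit */
	bool patched;        /* has the task switched to the new code? */
};

/*
 * Try to switch one task to the patched code.
 * Returns true if the task is (now) patched, false if it must be
 * retried later because it is still inside the kernel.
 */
bool patch_try_switch(struct patch_task *t)
{
	if (t->patched)
		return true;

	if (t->ctx == PATCH_CTX_USER) {
		/* No kernel frames can be live, so no stack check is needed. */
		t->patched = true;
		return true;
	}

	return false;
}
```

The model leaves out the hard part: the task may enter the kernel right
after the check, so the real flag would have to be updated and tested
atomically with respect to kernel entry.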
