Re: Perf hotplug lockup in v4.9-rc8
From: Peter Zijlstra
Date: Wed Dec 07 2016 - 13:35:05 EST
On Wed, Dec 07, 2016 at 05:53:47PM +0000, Mark Rutland wrote:
> On Wed, Dec 07, 2016 at 01:52:17PM +0000, Mark Rutland wrote:
> > Hi all
> >
> > Jeremy noticed a kernel lockup on arm64 when the perf tool was used in
> > parallel with hotplug, which I've reproduced on arm64 and x86(-64) with
> > v4.9-rc8. In both cases I'm using defconfig; I've tried enabling lockdep
> > but it was silent for arm64 and x86.
>
> It looks like we're trying to install a task-bound event into a context
> where task_cpu(ctx->task) is dead, and thus the cpu_function_call() in
> perf_install_in_context() fails. We retry repeatedly.
>
> On !PREEMPT (as with x86 defconfig), we manage to prevent the hotplug
> machinery from making progress, and this turns into a livelock.
>
> On PREEMPT (as with arm64 defconfig), I'm somewhat lost.
So the problem is that even with PREEMPT we can hit a blocked task
that has a 'dead' cpu.
We'll spin until either the task wakes up or the CPU does, either can
take a very long time.
How exactly your test-case triggers this, all it executes is 'true' and
that really shouldn't block much, is a mystery still.
In any case, the below cures things, but it is, as the comment says,
horrific.
I'm yet to come up with a better patch that doesn't violate scheduler
internals.
---
kernel/events/core.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6ee1febdf6ff..6faf4b03396e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1203,6 +1203,11 @@ static void put_ctx(struct perf_event_context *ctx)
* perf_event_context::lock
* perf_event::mmap_mutex
* mmap_sem
+ *
+ * task_struct::pi_lock
+ * rq::lock
+ * perf_event_context::lock
+ *
*/
static struct perf_event_context *
perf_event_ctx_lock_nested(struct perf_event *event, int nesting)
@@ -2352,6 +2357,28 @@ perf_install_in_context(struct perf_event_context *ctx,
return;
}
raw_spin_unlock_irq(&ctx->lock);
+
+ raw_spin_lock_irq(&task->pi_lock);
+ if (!(task->state == TASK_RUNNING || task->state == TASK_WAKING)) {
+ /*
+ * XXX horrific hack...
+ */
+ raw_spin_lock(&ctx->lock);
+ if (task != ctx->task) {
+ raw_spin_unlock(&ctx->lock);
+ raw_spin_unlock_irq(&task->pi_lock);
+ goto again;
+ }
+
+ add_event_to_ctx(event, ctx);
+ raw_spin_unlock(&ctx->lock);
+ raw_spin_unlock_irq(&task->pi_lock);
+ return;
+ }
+ raw_spin_unlock_irq(&task->pi_lock);
+
+ cond_resched();
+
/*
* Since !ctx->is_active doesn't mean anything, we must IPI
* unconditionally.