Re: use-after-free in __perf_install_in_context

From: Dmitry Vyukov
Date: Tue Dec 15 2015 - 08:09:40 EST


On Thu, Dec 10, 2015 at 8:57 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, Dec 08, 2015 at 08:14:58PM +0100, Dmitry Vyukov wrote:
>> Tested with your patches.
>> The additional WARNING does not fire.
>> For the rcu stacks, I had to change two more 2's to TRACK_NR and also
>> move the stack recording from call_rcu to __call_rcu, but now it is working.
>> Two reports with indirect stack:
>
> Ah nice, and sorry for the oversights. Obviously one can add this
> indirect marker to __queue_work() as well, to also cover the workqueue
> offloading that some other sites do.
>
>
>> [<ffffffff81845bd4>] __asan_report_load8_noabort+0x54/0x70 mm/kasan/report.c:295
>> [<ffffffff814b25f9>] __lock_acquire+0x4e99/0x5100 kernel/locking/lockdep.c:3092
>> [<ffffffff814b517d>] lock_acquire+0x19d/0x3f0 kernel/locking/lockdep.c:3585
>> [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
>> [<ffffffff86a89591>] _raw_spin_lock+0x31/0x40 kernel/locking/spinlock.c:151
>> [< inline >] perf_ctx_lock kernel/events/core.c:351
>
> So this here takes both cpuctx->ctx.lock and cpuctx->task_ctx->lock,
> and it is the latter that is pointing out into space.
>
>> [<ffffffff816e15d9>] __perf_install_in_context+0x109/0xa00 kernel/events/core.c:2083
>> [<ffffffff816cb7ba>] remote_function+0x14a/0x200 kernel/events/core.c:74
>> [<ffffffff81572787>] generic_exec_single+0x2a7/0x490 kernel/smp.c:156
>> [<ffffffff81573350>] smp_call_function_single+0x200/0x310 kernel/smp.c:300
>> [<ffffffff816c9bd3>] task_function_call+0x123/0x160 kernel/events/core.c:101
>> [<ffffffff816d1bf1>] perf_install_in_context+0x201/0x340 kernel/events/core.c:2164
>> [<ffffffff816f62e5>] SYSC_perf_event_open+0x1465/0x21a0 kernel/events/core.c:8546
>> [<ffffffff816ff449>] SyS_perf_event_open+0x39/0x50 kernel/events/core.c:8242
>> [<ffffffff86a8a3b6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185
>
>
> Now I managed to produce a related WARN:
>
> [ 2338.884942] ------------[ cut here ]------------
> [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
> [ 2338.900504] Modules linked in:
> [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
> [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [ 2338.923071] ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
> [ 2338.931382] ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
> [ 2338.939678] 0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
> [ 2338.947987] Call Trace:
> [ 2338.950726] [<ffffffff815c680c>] dump_stack+0x4e/0x82
> [ 2338.956474] [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
> [ 2338.963195] [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
> [ 2338.969720] [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
> [ 2338.976244] [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
> [ 2338.982575] [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
> [ 2338.988810] [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
> [ 2338.995339] [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
> [ 2339.001669] [<ffffffff8121e297>] search_binary_handler+0x97/0x200
> [ 2339.008581] [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
> [ 2339.016072] [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
> [ 2339.021819] [<ffffffff819fc165>] stub_execve+0x5/0x5
> [ 2339.027469] [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
> [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
>
>
> Which is an existing WARN_ON_ONCE() indicating that cpuctx->task_ctx is
> not what we expected it to be.
>
> The below patch appears to have cured it (it's so far managed to run
> longer than previous attempts, but I'm sure it'll explode the moment
> I've sent this email).
>
> I'm not sure I can explain your problem with this, but I figure it's
> worth a try.
>
>
> ---
> kernel/events/core.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index c3d61b92d805..d5293325d8c5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3136,15 +3136,16 @@ static int event_enable_on_exec(struct perf_event *event,
> * Enable all of a task's events that have been marked enable-on-exec.
> * This expects task == current.
> */
> -static void perf_event_enable_on_exec(struct perf_event_context *ctx)
> +static void perf_event_enable_on_exec(int ctxn)
> {
> - struct perf_event_context *clone_ctx = NULL;
> + struct perf_event_context *ctx, *clone_ctx = NULL;
> struct perf_event *event;
> unsigned long flags;
> int enabled = 0;
> int ret;
>
> local_irq_save(flags);
> + ctx = current->perf_event_ctxp[ctxn];
> if (!ctx || !ctx->nr_events)
> goto out;
>
> @@ -3187,17 +3188,11 @@ static void perf_event_enable_on_exec(struct perf_event_context *ctx)
>
> void perf_event_exec(void)
> {
> - struct perf_event_context *ctx;
> int ctxn;
>
> rcu_read_lock();
> - for_each_task_context_nr(ctxn) {
> - ctx = current->perf_event_ctxp[ctxn];
> - if (!ctx)
> - continue;
> -
> - perf_event_enable_on_exec(ctx);
> - }
> + for_each_task_context_nr(ctxn)
> + perf_event_enable_on_exec(ctxn);
> rcu_read_unlock();
> }


Hi Peter,

I've applied your patch and have not seen the crashes since then. But
I've started seeing lots of stalls/hangs; see the other email titled
"perf: stalls in perf_install_in_context/perf_remove_from_context". I
am able to reproduce the stalls on stock
6764e5ebd5c62236d082f9ae030674467d0b2779 without any patches on top,
so the stalls are not related to your patch.
I am not sure why I started seeing the stalls only now. Either it is a
recent change in the tree, or the fuzzer figured out how to trigger
the stall and started doing it frequently, or your fix for the crashes
allowed the fuzzer to drill deeper into perf... or maybe I just missed
the stall reports among the numerous crash reports. Anyway, testing is
inconclusive for now. Please take a look at the stall issue.

Thanks
--