Re: [PATCH v2] tracing: Add trace_trigger kernel command line option

From: Ross Zwisler
Date: Thu Dec 08 2022 - 19:19:32 EST


On Thu, Dec 08, 2022 at 06:39:45PM -0500, Steven Rostedt wrote:
> On Thu, 8 Dec 2022 15:27:07 -0700
> Ross Zwisler <zwisler@xxxxxxxxxx> wrote:
>
> > > +#ifdef CONFIG_HIST_TRIGGERS
> >
> > Can you help me understand why this is only available if
> > CONFIG_HIST_TRIGGERS is selected in the kernel config? AFAICT this
> > code doesn't depend on the histogram code, and the run-time selection
> > of triggers is usable without CONFIG_HIST_TRIGGERS.
>
> Good catch!
>
> I got confused, and only saw that as "CONFIG_TRIGGERS" and wasn't thinking
> that config was just for histogram triggers :-p
>
> Care to send a patch to fix it?

Sure, happy to.

One more question: I was playing with this code using examples from

https://www.kernel.org/doc/html/latest/trace/events.html

and when I tried to create a command line trigger to gather a snapshot:

trace_trigger="sched_switch.snapshot:1 if prev_state == 2"

it hits an oops:

[ 0.178179] traps: PANIC: double fault, error_code: 0xffffffffa0a02040
[ 0.178187] BUG: unable to handle page fault for address: fffffe000000b000
[ 0.178188] #PF: supervisor read access in kernel mode
[ 0.178189] #PF: error_code(0x0000) - not-present page
[ 0.178191] PGD 23ffc3067 P4D 23ffc3067 PUD 23ffc1067 PMD 23ffc0067 PTE 0
[ 0.178193] Oops: 0000 [#1] PREEMPT SMP PTI
[ 0.178195] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc7+ #4
[ 0.178197] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 0.178198] RIP: 0010:__die_header+0x1e/0x7d
[ 0.178204] Code: 00 00 31 c0 89 43 50 e9 30 ff 49 ff 0f 1f 44 00 00 8b 05 bf 3c 64 01 49 89 fa 85 c0 75 0e 48 c7 c7 60 f0 bc a1 b9 2a 00 01
[ 0.178206] RSP: 0000:fffffe000000aec0 EFLAGS: 00010046
[ 0.178207] RAX: 0000000000000000 RBX: ffffffffa0f5aaa8 RCX: 0000000000000002
[ 0.178208] RDX: ffffffffa0a02040 RSI: fffffe000000b000 RDI: ffffffffa1bcf100
[ 0.178209] RBP: fffffe000000af60 R08: ffffffffa125b280 R09: 6666666666666666
[ 0.178209] R10: ffffffffa0f5aaa8 R11: 203a65646f635f72 R12: ffffffffa0a02040
[ 0.178210] R13: ffffffffa0a02040 R14: 0000000000000000 R15: 0000000000000000
[ 0.178212] FS: 0000000000000000(0000) GS:ffff93ef77c00000(0000) knlGS:0000000000000000
[ 0.178213] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.178214] CR2: fffffe000000b000 CR3: 000000004500a000 CR4: 00000000000006b0
[ 0.178217] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.178218] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.178219] Call Trace:
[ 0.178223] <#DF>
[ 0.178224] __die+0x17/0x29
[ 0.178227] die+0x2a/0x50
[ 0.178232] ? zen_untrain_ret+0x1/0x1
[ 0.178237] exc_double_fault+0x173/0x180
[ 0.178242] asm_exc_double_fault+0x1f/0x30
[ 0.178248] WARNING: stack recursion on stack type 5
[ 0.178249] WARNING: can't access registers at asm_exc_double_fault+0x1f/0x30
[ 0.178253] </#DF>
[ 0.178253] <TASK>
[ 0.178254] ? _raw_spin_unlock_irq+0x11/0x40
[ 0.178256] ? wait_for_completion+0x7e/0x160
[ 0.178258] ? ring_buffer_resize+0x320/0x450
[ 0.178262] ? resize_buffer_duplicate_size+0x38/0xe0
[ 0.178264] ? tracing_alloc_snapshot_instance+0x23/0x40
[ 0.178266] ? register_snapshot_trigger+0x16/0x40
[ 0.178269] ? event_trigger_parse+0x113/0x160
[ 0.178272] ? trigger_process_regex+0xb8/0x100
[ 0.178274] ? __trace_early_add_events+0xb8/0x140
[ 0.178275] ? trace_event_init+0xcc/0x2dd
[ 0.178278] ? start_kernel+0x4a9/0x713
[ 0.178281] ? secondary_startup_64_no_verify+0xce/0xdb
[ 0.178285] </TASK>
[ 0.178285] Modules linked in:
[ 0.178286] CR2: fffffe000000b000
[ 0.220408] ---[ end trace 0000000000000000 ]---
[ 0.220409] RIP: 0010:__die_header+0x1e/0x7d
[ 0.220411] Code: 00 00 31 c0 89 43 50 e9 30 ff 49 ff 0f 1f 44 00 00 8b 05 bf 3c 64 01 49 89 fa 85 c0 75 0e 48 c7 c7 60 f0 bc a1 b9 2a 00 01
[ 0.220412] RSP: 0000:fffffe000000aec0 EFLAGS: 00010046
[ 0.220414] RAX: 0000000000000000 RBX: ffffffffa0f5aaa8 RCX: 0000000000000002
[ 0.220414] RDX: ffffffffa0a02040 RSI: fffffe000000b000 RDI: ffffffffa1bcf100
[ 0.220415] RBP: fffffe000000af60 R08: ffffffffa125b280 R09: 6666666666666666
[ 0.220416] R10: ffffffffa0f5aaa8 R11: 203a65646f635f72 R12: ffffffffa0a02040
[ 0.220416] R13: ffffffffa0a02040 R14: 0000000000000000 R15: 0000000000000000
[ 0.220417] FS: 0000000000000000(0000) GS:ffff93ef77c00000(0000) knlGS:0000000000000000
[ 0.220418] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.220419] CR2: fffffe000000b000 CR3: 000000004500a000 CR4: 00000000000006b0
[ 0.220421] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.220422] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 0.220423] Kernel panic - not syncing: Fatal exception in interrupt

I think that this is because the code to allocate the snapshot buffer uses
workqueues (ring_buffer_resize() calls schedule_work_on() then
wait_for_completion()), but at this point during the init process the
workqueues are up enough that we can enqueue entries, but they are not yet
doing work.

start_kernel() {
...

/*
* Allow workqueue creation and work item queueing/cancelling
* early. Work item execution depends on kthreads and starts after
* workqueue_init().
*/
workqueue_init_early();
...

/* Trace events are available after this */
trace_init(); // here is where we try and allocate the snapshot

...

arch_call_rest_init();
rest_init()
kernel_init()
kernel_init_freeable()
workqueue_init()
}

I'm guessing the best we can do here is just disallow snapshot triggers via
the command line option, so that others don't cut themselves on this sharp
corner? Other ideas?

Thanks,
- Ross