Re: sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine)
From: Steven Rostedt
Date: Fri Jul 23 2021 - 08:41:33 EST
On Fri, 23 Jul 2021 13:53:41 +0200
Stefan Metzmacher <metze@xxxxxxxxx> wrote:
> Hi Steve,
>
> >>> Assuming this does fix your issue, I sent out a real patch with the
> >>> explanation of what happened in the change log, so that you can see why
> >>> that change was your issue.
> >>
> >> Yes, it does the trick, thanks very much!
> >
> > Can I get a "Tested-by" from you?
>
> Sure!
Thanks.
>
> Can you check if the static_key_disable() and static_key_enable()
> calls are correctly ordered compared to rcu_assign_pointer()
> and explain why they are, as I not really understand that it's different
> from tracepoint_update_call vs. rcu_assign_pointer
The order doesn't even matter. I'm assuming you are talking about the
static_key_disable/enable with respect to enabling the tracepoint.
The reason it doesn't matter is because the rcu_assign_pointer is
updating an array of elements that hold both the function to call and
the data it needs. Inside the tracepoint loop where the callbacks are
called, in the iterator code (not the static call), you will see:
it_func_ptr = \
rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
if (it_func_ptr) { \
do { \
it_func = READ_ONCE((it_func_ptr)->func); \
__data = (it_func_ptr)->data; \
((void(*)(void *, proto))(it_func))(__data, args); \
} while ((++it_func_ptr)->func); \
}
What that does is to get either the old array, or the new array and
places that array into it_func_ptr. Each element of this array contains
the callback (in .func) and the callback's data (in .data).
The enabling or disabling of the tracepoint doesn't really make a
difference with respect to the order of updating the funcs array.
That's because the users of this will either see the old array, the new
array, or NULL, in that it_func_ptr. This is how RCU works.
The bug we hit was because the static_call was updated separately from
the array. That makes it more imperative that you get the order
correct. As my email stated, with static_calls we have this:
it_func_ptr = \
rcu_dereference_raw((&__tracepoint_##name)->funcs); \
if (it_func_ptr) { \
__data = (it_func_ptr)->data; \
static_call(tp_func_##name)(__data, args); \
}
Where the issue is that the static_call which chooses which callback to
make, is updated asynchronously from the update of the array.
>
> >> Now I can finally use:
> >>
> >> trace-cmd record -e all -P $(pidof io_uring-cp-forever)
> >>
> >> But that doesn't include the iou-wrk-* threads
> >> and the '-c' option seems to only work with forking.
> >
> > Have you tried it? It should work for threads as well. It hooks to the
> > sched_process_fork tracepoint, which should be triggered even when a new
> > thread is created.
> >
> > Or do you mean that you want that process and all its threads too that are
> > already running? I could probably have it try to add it via the /proc file
> > system in that case.
> >
> > Can you start the task via trace-cmd?
> >
> > trace-cmd record -e all -F -c io_uring-cp-forever ...
>
> I think that would work, but I typically want to analyze a process
> that is already running.
>
> >> Is there a way to specify "trace *all* threads of the given pid"?
> >> (Note the threads are comming and going, so it's not possible to
> >> specifiy -P more than once)
> >
> > Right, although, you could append tasks manually to the set_event_pid file
> > from another terminal after starting trace-cmd. Once a pid is added to that
> > file, all children it makes will also be added. That could be a work around
> > until we have trace-cmd do it.
>
> Sure, but it will always be racy.
Not really. Matters what you mean by "racy". You wont be able to
"instantaneously" enable a process and all its threads, but you can
capture all of them after a given point. As you are attaching to a
process already running, you already missed events because you were not
yet tracing. But once you have a thread, you will always have it.
>
> With children, does new threads also count as children or only fork() children?
New threads. It's the kernel point of view of a task, which does not
differentiate processes from threads. Everything that can be scheduled
on a CPU is called a "task". How a task interacts with other tasks is
what defines it being a "thread" or a "process".
>
> > Care to write a bugzilla report for this feature?
> >
> > https://bugzilla.kernel.org/buglist.cgi?component=Trace-cmd%2FKernelshark&list_id=1088173
>
> I may do later...
>
Thanks,
-- Steve