Re: [PATCH] ftrace: Add missing check for existing hwlat thread

From: Erica Bugden
Date: Fri Aug 03 2018 - 04:46:38 EST


On Wed, 2018-08-01 at 15:40 -0400, Steven Rostedt wrote:
> On Wed, 1 Aug 2018 12:45:54 +0200
> Erica Bugden <erica.bugden@xxxxxxxxxxxxx> wrote:
>
> > The hwlat tracer uses a kernel thread to measure latencies. The function
> > that creates this kernel thread, start_kthread(), can be called when the
> > tracer is initialized and when the tracer is explicitly enabled.
> > start_kthread() does not check if there is an existing hwlat kernel
> > thread and will create a new one each time it is called.
> >
> > This causes the reference to the previous thread to be lost. Without the
> > thread reference, the old kernel thread becomes unstoppable and
> > continues to use CPU time even after the hwlat tracer has been disabled.
> > This problem can be observed when a system is booted with tracing
> > enabled and the hwlat tracer is configured like this:
> >
> > echo hwlat > current_tracer; echo 1 > tracing_on
> >
> > Add the missing check for an existing kernel thread in start_kthread()
> > to prevent this problem. This function and the rest of the hwlat kernel
> > thread setup and teardown are already serialized because they are called
> > through the tracer core code with trace_type_lock held.
> >
> > Signed-off-by: Erica Bugden <erica.bugden@xxxxxxxxxxxxx>
> > ---
> >  kernel/trace/trace_hwlat.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> > index d7c8e4e..2d9d36d 100644
> > --- a/kernel/trace/trace_hwlat.c
> > +++ b/kernel/trace/trace_hwlat.c
> > @@ -354,6 +354,9 @@ static int start_kthread(struct trace_array *tr)
> >  	struct task_struct *kthread;
> >  	int next_cpu;
> >  
> > +	if (hwlat_kthread)
> > +		return 0;
> > +
>
> This looks like it is treating the symptom and not the disease.
>
> >  	/* Just pick the first CPU on first iteration */
> >  	current_mask = &save_cpumask;
> >  	get_online_cpus();
>
> Can you try this patch?

I tested the patch below and it also fixes the problem.
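
For anyone following the thread, here is a rough userspace sketch of why
the second "echo 1 > tracing_on" strands a thread, and why skipping writes
that do not change the state avoids it. It is only a model under stated
assumptions: current_worker stands in for hwlat_kthread, and start_worker(),
stop_worker(), the two write handlers and the replay helper are hypothetical
names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct worker { int id; };

static struct worker *current_worker;	/* plays the role of hwlat_kthread */
static int workers_created;
static int workers_running;
static bool tracing_on;

static void start_worker(void)
{
	struct worker *w = malloc(sizeof(*w));

	if (!w)
		return;
	w->id = ++workers_created;
	workers_running++;
	current_worker = w;	/* overwrites, and so loses, any earlier reference */
}

static void stop_worker(void)
{
	if (!current_worker)
		return;
	assert(workers_running > 0);
	free(current_worker);
	current_worker = NULL;
	workers_running--;
}

/* Unguarded handler: every write of 1 starts another worker. */
static void write_tracing_on_unguarded(int val)
{
	if (val) {
		tracing_on = true;
		start_worker();
	} else {
		tracing_on = false;
		stop_worker();
	}
}

/* Guarded handler: a write that does not change the state is a no-op. */
static void write_tracing_on_guarded(int val)
{
	if (!!val == tracing_on)
		return;
	write_tracing_on_unguarded(val);
}

/* Forget all state between runs (a stranded worker is deliberately leaked). */
static void reset_model(void)
{
	current_worker = NULL;
	workers_created = 0;
	workers_running = 0;
	tracing_on = false;
}

/* Replays "echo 1; echo 1; echo 0" and returns how many workers are stranded. */
static int stranded_after_double_enable(void (*write_fn)(int))
{
	reset_model();
	write_fn(1);
	write_fn(1);
	write_fn(0);
	return workers_running;
}
```

Passing write_tracing_on_unguarded to stranded_after_double_enable() leaves
one worker that can no longer be stopped, while write_tracing_on_guarded
leaves none. In this model the guard sits in the write handler, so the
start callback is never reached for a redundant enable.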

>
> -- Steve
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 823687997b01..15862044db05 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -7628,7 +7628,9 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
>  
>  	if (buffer) {
>  		mutex_lock(&trace_types_lock);
> -		if (val) {
> +		if (!!val == tracer_tracing_is_on(tr)) {
> +			val = 0; /* do nothing */
> +		} else if (val) {
>  			tracer_tracing_on(tr);
>  			if (tr->current_trace->start)
>  				tr->current_trace->start(tr);