Re: [PATCH] ftrace: Add missing check for existing hwlat thread

From: Thomas Gleixner
Date: Wed Aug 01 2018 - 15:59:45 EST


On Wed, 1 Aug 2018, Steven Rostedt wrote:

> On Wed, 1 Aug 2018 12:45:54 +0200
> Erica Bugden <erica.bugden@xxxxxxxxxxxxx> wrote:
>
> > The hwlat tracer uses a kernel thread to measure latencies. The function
> > that creates this kernel thread, start_kthread(), can be called when the
> > tracer is initialized and when the tracer is explicitly enabled.
> > start_kthread() does not check if there is an existing hwlat kernel
> > thread and will create a new one each time it is called.
> >
> > This causes the reference to the previous thread to be lost. Without the
> > thread reference, the old kernel thread becomes unstoppable and
> > continues to use CPU time even after the hwlat tracer has been disabled.
> > This problem can be observed when a system is booted with tracing
> > enabled and the hwlat tracer is configured like this:
> >
> > echo hwlat > current_tracer; echo 1 > tracing_on
> >
> > Add the missing check for an existing kernel thread in start_kthread()
> > to prevent this problem. This function and the rest of the hwlat kernel
> > thread setup and teardown are already serialized because they are called
> > through the tracer core code with trace_types_lock held.
> >
> > Signed-off-by: Erica Bugden <erica.bugden@xxxxxxxxxxxxx>
> > ---
> > kernel/trace/trace_hwlat.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
> > index d7c8e4e..2d9d36d 100644
> > --- a/kernel/trace/trace_hwlat.c
> > +++ b/kernel/trace/trace_hwlat.c
> > @@ -354,6 +354,9 @@ static int start_kthread(struct trace_array *tr)
> >  	struct task_struct *kthread;
> >  	int next_cpu;
> >
> > +	if (hwlat_kthread)
> > +		return 0;
> > +
>
> This looks like it is treating the symptom and not the disease.

My bad. We looked at the other instances of tracers and they all have
protection against this kind of call sequence ...


> >  	/* Just pick the first CPU on first iteration */
> >  	current_mask = &save_cpumask;
> >  	get_online_cpus();
>
> Can you try this patch?
>
> -- Steve
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 823687997b01..15862044db05 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -7628,7 +7628,9 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
>
>  	if (buffer) {
>  		mutex_lock(&trace_types_lock);
> -		if (val) {
> +		if (!!val == tracer_tracing_is_on(tr)) {
> +			val = 0; /* do nothing */
> +		} else if (val) {
>  			tracer_tracing_on(tr);
>  			if (tr->current_trace->start)
>  				tr->current_trace->start(tr);
>
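
For illustration only, the call sequence discussed above can be modelled in
userspace. This is a minimal sketch, not kernel code: every name in it
(model_tracer, model_start, model_write_guarded, ...) is hypothetical, and
plain counters stand in for kthread creation and teardown. It assumes, as the
changelog describes, that the tracer's start path runs both when the tracer is
selected with tracing already on and when 1 is written to tracing_on.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_tracer {
	bool tracing_on;	/* stands in for tracer_tracing_is_on() */
	bool selected;		/* a tracer with start/stop callbacks is current */
	int threads_started;	/* counts calls that would create a kthread */
	int threads_stopped;	/* counts calls that would stop a kthread */
};

/* Models a ->start() callback that has no guard of its own. */
static void model_start(struct model_tracer *t)
{
	t->threads_started++;
}

/* Models the matching ->stop() callback. */
static void model_stop(struct model_tracer *t)
{
	t->threads_stopped++;
}

/* Models "echo hwlat > current_tracer": init starts if tracing is on. */
static void model_select_tracer(struct model_tracer *t)
{
	t->selected = true;
	if (t->tracing_on)
		model_start(t);
}

/* Models the write handler before the change: every write acts. */
static void model_write_unguarded(struct model_tracer *t, unsigned long val)
{
	t->tracing_on = (val != 0);
	if (!t->selected)
		return;
	if (val)
		model_start(t);
	else
		model_stop(t);
}

/* Models the proposed check: act only on a real state transition. */
static void model_write_guarded(struct model_tracer *t, unsigned long val)
{
	if (!!val == t->tracing_on)
		return;		/* state unchanged, do nothing */
	model_write_unguarded(t, val);
}
```

Starting from tracing_on = true, calling model_select_tracer() and then
model_write_unguarded() with 1 leaves threads_started at 2, which corresponds
to the lost thread reference. The same sequence through model_write_guarded()
leaves it at 1, because the redundant write is ignored.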