Re: [RFC PATCH -tip ] tracing: make a snapshot feature availablefrom userspace.

From: Hiraku Toyooka
Date: Wed Jul 04 2012 - 06:50:24 EST


Hello, Steven,

I've sent below RFC patch, but still have no responses. This patch can
be applied to current tip tree.

If you have time, could you give any comment about this patch?

Best regards,
Hiraku Toyooka


(2012/06/05 21:06), Hiraku Toyooka wrote:
Hello,

This patch provides new debugfs interfaces for taking a snapshot
in ftrace.

Ftrace is available as a flight recorder. When a kernel panic
occurs, we can get trace data from the dumped kernel image.

But in case of some systems, non-critical errors (such as
application's errors) are difficult to handle because of following
reasons.

- When the system has to continue to run, it can't stop tracing.
- Even then, failure analysis is necessary to prevent the same
errors.

So, I propose making the ftrace's snapshot feature available from
userspace to solve above dilemma.
(See also
https://events.linuxfoundation.org/images/stories/pdf/lcjp2012_toyooka.pdf)

This enables to preserve trace buffer at a particular point in time
without stopping tracing. When a snapshot is taken, ftrace swaps the
current buffer with a spare buffer which is prepared in advance. This
means that the tracing itself continues on the spare buffer.

Currently, ftrace has a snapshot feature available from kernel space
and some tracers (e.g. irqsoff) are using it. This patch allows users
to use the same snapshot feature via debugfs.

I added two debugfs files in "tracing" directory.

snapshot_enabled:

This is used to set or display whether the snapshot is
enabled. Echo 1 into this file to prepare a spare buffer
or 0 to shrink it. So, the memory for the spare buffer
will be consumed only when this knob is set.

snapshot_pipe:

This is used to take a snapshot and to read the output
of the snapshot (max_tr). Echo 1 into this file to take a
snapshot. Reads from this file is the same as the
"trace_pipe" file.

Here is an example of using the snapshot feature.

# echo nop > current_tracer
# echo 1 > snapshot_enabled
# echo 1 > events/sched/enable
[...]
# echo 1 > snapshot_pipe
# cat snapshot_pipe
bash-3352 [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 p
id=28 prio=0 success=1 target_cpu=006
bash-3352 [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 p
id=32 prio=0 success=1 target_cpu=007
bash-3352 [001] d... 18440.883935: sched_switch: prev_comm=bash pre
v_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_
prio=0
[...]


Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@xxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Rob Landley <rob@xxxxxxxxxxx>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
Cc: linux-doc@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---

Documentation/trace/ftrace.txt | 47 +++++++++++
kernel/trace/trace.c | 152 +++++++++++++++++++++++++++++++++++++
kernel/trace/trace.h | 8 ++
kernel/trace/trace_irqsoff.c | 3 +
kernel/trace/trace_sched_wakeup.c | 2 5 files changed, 210 insertions(+), 2 deletions(-)

diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 6f51fed..df9fa13 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1842,6 +1842,53 @@ an error.
# cat buffer_size_kb
85
+
+Snapshot
+--------
+If CONFIG_TRACER_MAX_TRACE is set, the (generic) snapshot
+feature is available in all tracers except for the special
+tracers which use a snapshot inside themselves(such as "irqsoff"
+or "wakeup").
+
+This enables to preserve trace buffer at a particular point in
+time without stopping tracing. When a snapshot is taken, ftrace
+swaps the current buffer with a spare buffer which is prepared
+in advance. This means that the tracing itself continues on the
+spare buffer.
+
+Following debugfs files in "tracing" directory are related with
+this feature.
+
+ snapshot_enabled:
+
+ This is used to set or display whether the snapshot is
+ enabled. Echo 1 into this file to prepare a spare buffer
+ or 0 to shrink it. So, the memory for the spare buffer
+ will be consumed only when this knob is set.
+
+ snapshot_pipe:
+
+ This is used to take a snapshot and to read the output
+ of the snapshot. Echo 1 into this file to take a
+ snapshot. Reads from this file is the same as the
+ "trace_pipe" file (described above "The File System"
+ section), so that both reads from the snapshot and
+ tracing are executable in parallel.
+
+Here is an example of using the snapshot feature.
+
+ # echo nop > current_tracer
+ # echo 1 > snapshot_enabled
+ # echo 1 > events/sched/enable
+ [...]
+ # echo 1 > snapshot_pipe
+ # cat snapshot_pipe
+ bash-3352 [001] dN.. 18440.883932: sched_wakeup: comm=migration/6 pid=28 prio=0 success=1 target_cpu=006
+ bash-3352 [001] dN.. 18440.883933: sched_wakeup: comm=migration/7 pid=32 prio=0 success=1 target_cpu=007
+ bash-3352 [001] d... 18440.883935: sched_switch: prev_comm=bash prev_pid=3352 prev_prio=120 prev_state=R ==> next_comm=migration/1 next_pid=8 next_prio=0
+[...]
+
+
-----------
More details can be found in the source code, in the
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 68032c6..8e722c3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -3266,7 +3266,8 @@ tracing_max_lat_write(struct file *filp, const char __user *ubuf,
return cnt;
}
-static int tracing_open_pipe(struct inode *inode, struct file *filp)
+static int __tracing_open_pipe(struct inode *inode, struct file *filp,
+ struct trace_array *tr)
{
long cpu_file = (long) inode->i_private;
struct trace_iterator *iter;
@@ -3308,7 +3309,7 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
iter->cpu_file = cpu_file;
- iter->tr = &global_trace;
+ iter->tr = tr;
mutex_init(&iter->mutex);
filp->private_data = iter;
@@ -3327,6 +3328,11 @@ fail:
return ret;
}
+static int tracing_open_pipe(struct inode *inode, struct file *filp)
+{
+ return __tracing_open_pipe(inode, filp, &global_trace);
+}
+
static int tracing_release_pipe(struct inode *inode, struct file *file)
{
struct trace_iterator *iter = file->private_data;
@@ -4008,6 +4014,122 @@ static int tracing_clock_open(struct inode *inode, struct file *file)
return single_open(file, tracing_clock_show, NULL);
}
+#ifdef CONFIG_TRACER_MAX_TRACE
+static int tracing_open_snapshot_pipe(struct inode *inode, struct file *filp)
+{
+ return __tracing_open_pipe(inode, filp, &max_tr);
+}
+
+static ssize_t
+tracing_write_snapshot_pipe(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ unsigned long val = 0;
+ int ret;
+
+ ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+ if (ret)
+ return ret;
+
+ mutex_lock(&trace_types_lock);
+
+ /* If current tracer's use_max_tr == 0, we prevent taking a snapshot */
+ if (!current_trace->use_max_tr) {
+ ret = -EBUSY;
+ goto out;
+ }
+ if (val) {
+ unsigned long flags;
+ local_irq_save(flags);
+ update_max_tr(&global_trace, current, raw_smp_processor_id());
+ local_irq_restore(flags);
+ }
+out:
+ mutex_unlock(&trace_types_lock);
+
+ if (!ret) {
+ *ppos += cnt;
+ ret = cnt;
+ }
+
+ return ret;
+}
+
+static ssize_t
+tracing_snapshot_ctrl_read(struct file *filp, char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ char buf[64];
+ int r;
+
+ r = sprintf(buf, "%d\n", current_trace->use_max_tr);
+ return simple_read_from_buffer(ubuf, cnt, ppos, buf, r);
+}
+
+static ssize_t
+tracing_snapshot_ctrl_write(struct file *filp, const char __user *ubuf,
+ size_t cnt, loff_t *ppos)
+{
+ unsigned long val;
+ int ret;
+
+ ret = kstrtoul_from_user(ubuf, cnt, 10, &val);
+ if (ret)
+ return ret;
+
+ val = !!val;
+
+ mutex_lock(&trace_types_lock);
+ tracing_stop();
+ arch_spin_lock(&ftrace_max_lock);
+
+ /* When always_use_max_tr == 1, we can't toggle use_max_tr. */
+ if (current_trace->always_use_max_tr) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ if (!(current_trace->use_max_tr ^ val))
+ goto out;
+
+ if (val) {
+ int cpu;
+ for_each_tracing_cpu(cpu) {
+ ret = ring_buffer_resize(max_tr.buffer,
+ global_trace.data[cpu]->entries,
+ cpu);
+ if (ret < 0)
+ break;
+ max_tr.data[cpu]->entries =
+ global_trace.data[cpu]->entries;
+ }
+ if (ret < 0) {
+ ring_buffer_resize(max_tr.buffer, 1,
+ RING_BUFFER_ALL_CPUS);
+ set_buffer_entries(&max_tr, 1);
+ ret = -ENOMEM;
+ goto out;
+ }
+ current_trace->use_max_tr = 1;
+ } else {
+ /* shrink buffer for saving memory */
+ ring_buffer_resize(max_tr.buffer, 1, RING_BUFFER_ALL_CPUS);
+ set_buffer_entries(&max_tr, 1);
+ current_trace->use_max_tr = 0;
+ }
+out:
+ arch_spin_unlock(&ftrace_max_lock);
+ tracing_start();
+ mutex_unlock(&trace_types_lock);
+
+ if (!ret) {
+ *ppos += cnt;
+ ret = cnt;
+ }
+ return ret;
+}
+#endif /* CONFIG_TRACER_MAX_TRACE */
+
static const struct file_operations tracing_max_lat_fops = {
.open = tracing_open_generic,
.read = tracing_max_lat_read,
@@ -4071,6 +4193,25 @@ static const struct file_operations trace_clock_fops = {
.write = tracing_clock_write,
};
+#ifdef CONFIG_TRACER_MAX_TRACE
+static const struct file_operations tracing_snapshot_pipe_fops = {
+ .open = tracing_open_snapshot_pipe,
+ .poll = tracing_poll_pipe,
+ .read = tracing_read_pipe,
+ .write = tracing_write_snapshot_pipe,
+ .splice_read = tracing_splice_read_pipe,
+ .release = tracing_release_pipe,
+ .llseek = no_llseek,
+};
+
+static const struct file_operations tracing_snapshot_ctrl_fops = {
+ .open = tracing_open_generic,
+ .read = tracing_snapshot_ctrl_read,
+ .write = tracing_snapshot_ctrl_write,
+ .llseek = generic_file_llseek,
+};
+#endif /* CONFIG_TRACER_MAX_TRACE */
+
struct ftrace_buffer_info {
struct trace_array *tr;
void *spare;
@@ -4819,6 +4960,13 @@ static __init int tracer_init_debugfs(void)
#ifdef CONFIG_TRACER_MAX_TRACE
trace_create_file("tracing_max_latency", 0644, d_tracer,
&tracing_max_latency, &tracing_max_lat_fops);
+
+ trace_create_file("snapshot_pipe", 0644, d_tracer,
+ (void *) TRACE_PIPE_ALL_CPU,
+ &tracing_snapshot_pipe_fops);
+
+ trace_create_file("snapshot_enabled", 0644, d_tracer,
+ NULL, &tracing_snapshot_ctrl_fops);
#endif
trace_create_file("tracing_thresh", 0644, d_tracer,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 5aec220..602657a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -254,6 +254,8 @@ struct tracer_flags {
* @print_line: callback that prints a trace
* @set_flag: signals one of your private flags changed (trace_options file)
* @flags: your private flags
+ * @use_max_tr: show whether the snapshot feature is available on this tracer
+ * @always_use_max_tr: show that max_tr is always used on this tracer
*/
struct tracer {
const char *name;
@@ -286,7 +288,13 @@ struct tracer {
struct tracer *next;
struct tracer_flags *flags;
int print_max;
+ /* Dynamically toggled via "snapshot_enabled" debugfs file */
int use_max_tr;
+ /*
+ * If this value is 1, this tracer always uses max_tr and "use_max_tr"
+ * can't be toggled.
+ */
+ int always_use_max_tr;
};
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 99d20e9..37cdb75 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -614,6 +614,7 @@ static struct tracer irqsoff_tracer __read_mostly =
.open = irqsoff_trace_open,
.close = irqsoff_trace_close,
.use_max_tr = 1,
+ .always_use_max_tr = 1,
};
# define register_irqsoff(trace) register_tracer(&trace)
#else
@@ -647,6 +648,7 @@ static struct tracer preemptoff_tracer __read_mostly =
.open = irqsoff_trace_open,
.close = irqsoff_trace_close,
.use_max_tr = 1,
+ .always_use_max_tr = 1,
};
# define register_preemptoff(trace) register_tracer(&trace)
#else
@@ -682,6 +684,7 @@ static struct tracer preemptirqsoff_tracer __read_mostly =
.open = irqsoff_trace_open,
.close = irqsoff_trace_close,
.use_max_tr = 1,
+ .always_use_max_tr = 1,
};
# define register_preemptirqsoff(trace) register_tracer(&trace)
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index ff791ea..b6e91a0 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -599,6 +599,7 @@ static struct tracer wakeup_tracer __read_mostly =
.open = wakeup_trace_open,
.close = wakeup_trace_close,
.use_max_tr = 1,
+ .always_use_max_tr = 1,
};
static struct tracer wakeup_rt_tracer __read_mostly =
@@ -620,6 +621,7 @@ static struct tracer wakeup_rt_tracer __read_mostly =
.open = wakeup_trace_open,
.close = wakeup_trace_close,
.use_max_tr = 1,
+ .always_use_max_tr = 1,
};
__init static int init_wakeup_tracer(void)


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/