Re: [RFC] Re: [PATCH v3 1/2] tracing/syscalls: Rename variable 'nr' to '__syscall_nr'

From: Steven Rostedt
Date: Fri Feb 26 2016 - 13:23:12 EST


On Fri, 26 Feb 2016 10:57:13 -0300
Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:

> Em Fri, Feb 26, 2016 at 10:14:06PM +0900, Taeung Song escreveu:
> > There is a problem about duplicated variable name i.e.
> >
> > # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
> > name: sys_enter_io_getevents
> > ID: 739
> > format:
>
> Steven, what do you think?
>
> Should we break this ABI while disambiguating the 'nr' field, using
> '__syscall_nr' in an attempt to use a name that is unlikely to be used
> by a real syscall argument name?
>
> If we stand by published ABIs, we should keep it written in stone and
> state that the first 'nr' means '__syscall_nr' while keeping it as-is,
> the change for 'perf trace' in that case is to do nothing, it works
> as-is, we just have to fix the python binding to do that rename.

ABIs only matter if they break something, and people complain. Linus
has been somewhat accepting of us fixing those tools that break and we
push out the fixes. If an ABI breaks in the forest and nobody is around
to complain about it, did it really break?

I would say, let's make the change and fix perf. If people complain, we
send them the fixes for their tools. If they need the distros to have
the fixes, then let the change be reverted, and we wait till the
distros have the update (this may take a few years), then re-submit.

This worked for me when I got rid of the padding that was in every
trace event. The change was reverted, I fixed the tools that broke,
waited until all the major distros had the updates, and resubmitted the
change. Linus took it.


>
> Perhaps we can live with that, to avoid having three different cases:
> !nr, nr and __syscall_nr.

We could do this as well. Want me to add something to event-parse?
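
To make the parser-side option concrete, here is a rough sketch of how a
tool could apply the "first 'nr' means '__syscall_nr'" rule without any
kernel change. The helper demo_field_alias() is hypothetical and is not
an existing event-parse function. It assumes the tool already has the
field names of one event in format-file order, and it only renames the
first 'nr' when a later field reuses the name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of event-parse: return the name a tool
 * should use for field 'idx', given all field names of one event in the
 * order they appear in the format file.
 */
static const char *demo_field_alias(const char *const *names, size_t count,
				    size_t idx)
{
	size_t i;

	if (idx >= count)
		return NULL;
	if (strcmp(names[idx], "nr") != 0)
		return names[idx];

	/* An earlier "nr" exists, so this one is the syscall argument. */
	for (i = 0; i < idx; i++)
		if (strcmp(names[i], "nr") == 0)
			return names[idx];

	/* First "nr": alias it only if a later field reuses the name. */
	for (i = idx + 1; i < count; i++)
		if (strcmp(names[i], "nr") == 0)
			return "__syscall_nr";

	return names[idx];
}
```

With the sys_enter_io_getevents fields quoted below, index 0 would map
to "__syscall_nr" and the later 'nr' argument would keep its name.
Events without a clash are left untouched, so tools would still see
plain 'nr' there.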

>
> Ingo, Peter, have you guys followed this case?
>
> Summary: Some tracepoints have multiple fields with the same name, 'nr',
> the first one is a unique syscall ID, the other is a syscall
> argument:
>
> [root@jouet ~]# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
> name: sys_enter_io_getevents
> ID: 747
> format:
> field:unsigned short common_type; offset:0; size:2; signed:0;
> field:unsigned char common_flags; offset:2; size:1; signed:0;
> field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
> field:int common_pid; offset:4; size:4; signed:1;
>
> field:int nr; offset:8; size:4; signed:1;
> field:aio_context_t ctx_id; offset:16; size:8; signed:0;
> field:long min_nr; offset:24; size:8; signed:0;
> field:long nr; offset:32; size:8; signed:0;
> field:struct io_event * events; offset:40; size:8; signed:0;
> field:struct timespec * timeout; offset:48; size:8; signed:0;
>
> print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
> [root@jouet ~]#
>

BTW, here's a less intrusive change, because honestly, I hate the
kernel structure having underscores in the name.
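
The idea is to decouple the struct member from the name exported in the
format file. A standalone sketch of that idea follows. struct
demo_trace_enter, struct demo_field and DEMO_FIELD are made-up stand-ins
for illustration, not the kernel definitions, and the struct layout is
arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the kernel's syscall enter record. */
struct demo_trace_enter {
	unsigned long long ent;	/* placeholder for the common header */
	int nr;			/* the member keeps its short name */
	unsigned long args[6];
};

/* Hypothetical description of one exported field. */
struct demo_field {
	const char *type;
	const char *name;
	size_t offset;
	size_t size;
};

/*
 * 'field' picks the struct member used for offset and size;
 * 'name' is only stringified and is what user space would see.
 */
#define DEMO_FIELD(type, field, name)					\
	{ #type, #name, offsetof(struct demo_trace_enter, field),	\
	  sizeof(((struct demo_trace_enter *)0)->field) }

static const struct demo_field demo_nr = DEMO_FIELD(int, nr, __syscall_nr);
```

Here demo_nr.name is "__syscall_nr" while the offset and size still come
from the 'nr' member, so nothing inside the structure has to be renamed.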

This could be signed off by Taeung Song and myself.

-- Steve

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 0655afbea83f..d1663083d903 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -186,11 +186,11 @@ print_syscall_exit(struct trace_iterator *iter, int flags,

extern char *__bad_type_size(void);

-#define SYSCALL_FIELD(type, name) \
- sizeof(type) != sizeof(trace.name) ? \
+#define SYSCALL_FIELD(type, field, name) \
+ sizeof(type) != sizeof(trace.field) ? \
__bad_type_size() : \
- #type, #name, offsetof(typeof(trace), name), \
- sizeof(trace.name), is_signed_type(type)
+ #type, #name, offsetof(typeof(trace), field), \
+ sizeof(trace.field), is_signed_type(type)

static int __init
__set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -261,7 +261,8 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
int i;
int offset = offsetof(typeof(trace), args);

- ret = trace_define_field(call, SYSCALL_FIELD(int, nr), FILTER_OTHER);
+ ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
+ FILTER_OTHER);
if (ret)
return ret;

@@ -281,11 +282,12 @@ static int __init syscall_exit_define_fields(struct trace_event_call *call)
struct syscall_trace_exit trace;
int ret;

- ret = trace_define_field(call, SYSCALL_FIELD(int, nr), FILTER_OTHER);
+ ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
+ FILTER_OTHER);
if (ret)
return ret;

- ret = trace_define_field(call, SYSCALL_FIELD(long, ret),
+ ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
FILTER_OTHER);

return ret;