Re: [RFC] Re: [PATCH v3 1/2] tracing/syscalls: Rename variable 'nr' to '__syscall_nr'

From: Taeung Song
Date: Sat Feb 27 2016 - 11:10:56 EST


Hi, Steven

On 02/27/2016 03:23 AM, Steven Rostedt wrote:
On Fri, 26 Feb 2016 10:57:13 -0300
Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:

Em Fri, Feb 26, 2016 at 10:14:06PM +0900, Taeung Song escreveu:
There is a problem about duplicated variable name i.e.

# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
name: sys_enter_io_getevents
ID: 739
format:

Steven, what do you think?

Should we break this ABI while disambiguating the 'nr' field, using
'__syscall_nr' in an attempt to use a name that is unlikely to be used
by a real syscall argument name?

If we stand by published ABIs, we should keep it written in stone and
state that the first 'nr' means '__syscall_nr' while keeping it as-is,
the change for 'perf trace' in that case is to do nothing, it work
as-is, we have just to fix the python binding to do that rename.

ABIs only matter if they break something, and people complain. Linus
has been somewhat accepting of us fixing those tools that break and we
push out the fixes. If an ABI breaks in the forest and nobody is around
to complain about it, did it really break?

I would say, lets make the change and fix perf. If people complain, we
send them the fixes for their tools. If they need the distros to have
the fixes, then let the change be reverted, and we wait till the
distros have the update (this may take a few years), then re-submit.

This worked for me to get rid of padding that was in every trace event.
The change was reverted, I fixed the tools that broke, waited till all
the major distros had the updates. And resubmitted the change. Linus
took it.



Perhaps we can live with that, to avoid having three different cases:
!nr, nr and __syscall_nr.

We could, do this as well. Want me to add something to event-parse?


Ingo, Peter, have you guys followed this case?

Summary: Some tracepoint have multiple fields with the same name, 'nr',
the first one is a unique syscall ID, the other is a syscall
argument:

[root@jouet ~]# cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
name: sys_enter_io_getevents
ID: 747
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;

field:int nr; offset:8; size:4; signed:1;
field:aio_context_t ctx_id; offset:16; size:8; signed:0;
field:long min_nr; offset:24; size:8; signed:0;
field:long nr; offset:32; size:8; signed:0;
field:struct io_event * events; offset:40; size:8; signed:0;
field:struct timespec * timeout; offset:48; size:8; signed:0;

print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
[root@jouet ~]#


BTW, here's a less intrusive change, because honestly, I hate the
kernel structure having underscores in the name.

This could be signed off by Taeung Song and myself.

-- Steve

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 0655afbea83f..d1663083d903 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -186,11 +186,11 @@ print_syscall_exit(struct trace_iterator *iter, int flags,

extern char *__bad_type_size(void);

-#define SYSCALL_FIELD(type, name) \
- sizeof(type) != sizeof(trace.name) ? \
+#define SYSCALL_FIELD(type, field, name) \
+ sizeof(type) != sizeof(trace.field) ? \
__bad_type_size() : \
- #type, #name, offsetof(typeof(trace), name), \
- sizeof(trace.name), is_signed_type(type)
+ #type, #name, offsetof(typeof(trace), field), \
+ sizeof(trace.field), is_signed_type(type)

static int __init
__set_enter_print_fmt(struct syscall_metadata *entry, char *buf, int len)
@@ -261,7 +261,8 @@ static int __init syscall_enter_define_fields(struct trace_event_call *call)
int i;
int offset = offsetof(typeof(trace), args);

- ret = trace_define_field(call, SYSCALL_FIELD(int, nr), FILTER_OTHER);
+ ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
+ FILTER_OTHER);
if (ret)
return ret;

@@ -281,11 +282,12 @@ static int __init syscall_exit_define_fields(struct trace_event_call *call)
struct syscall_trace_exit trace;
int ret;

- ret = trace_define_field(call, SYSCALL_FIELD(int, nr), FILTER_OTHER);
+ ret = trace_define_field(call, SYSCALL_FIELD(int, nr, __syscall_nr),
+ FILTER_OTHER);
if (ret)
return ret;

- ret = trace_define_field(call, SYSCALL_FIELD(long, ret),
+ ret = trace_define_field(call, SYSCALL_FIELD(long, ret, ret),
FILTER_OTHER);

return ret;


Would you mean to avoid struct syscall_trace_enter or _exit
has '__syscall_nr' variable ?

So not including a portion of this patch([PATCH v3 1/2] tracing/syscalls: ...)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8414fa4..98b3c66 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -88,13 +88,13 @@ enum trace_type {
*/
struct syscall_trace_enter {
struct trace_entry ent;
- int nr;
+ int __syscall_nr;
unsigned long args[];
};

struct syscall_trace_exit {
struct trace_entry ent;
- int nr;
+ int __syscall_nr;
long ret;
};


I got it :-)
In conclusion, output of format (/sys/kernel/debug/tracing/events/syscalls/*/format)
has __syscall_nr for syscall number but the kernel structure
'syscall_trace_enter' and 'syscall_trace_exit' have not __syscall_nr variable.
Is it right ?

I'll resend modified patch after testing new patch you suggest soon.

Thanks,
Taeung