[RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)

From: Dmitry Safonov
Date: Thu Aug 25 2016 - 22:01:36 EST


This patch set is purely an RFC and is not meant to be applied.
For the RFC stage it builds only on x86_64.

In a mail thread Oleg suggested that it would be worth introducing vm_file
for vdso mappings, since uprobes currently cannot be placed on vDSO VMAs [1].
In this patch set I introduce an in-kernel filesystem for vdso files.
With the patches applied, the vDSO VMA has an inode and is just a private
file mapping:
7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0 [vvar]
7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18 [vdso]
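
The difference is visible from user space: a non-zero inode in the maps line
means the VMA is file-backed. A minimal sketch of that check, assuming the
usual /proc/<pid>/maps field layout; parse_maps_line() and is_file_backed()
are names made up for this illustration and are not part of the patches:

```python
from typing import NamedTuple


class MapsEntry(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    name: str


def parse_maps_line(line: str) -> MapsEntry:
    """Parse one line of /proc/<pid>/maps (hypothetical helper)."""
    # The sixth field (pathname or [tag]) is optional and may contain spaces.
    fields = line.split(None, 5)
    addr_range, perms, offset, dev, inode = fields[:5]
    name = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr_range.split("-"))
    return MapsEntry(start, end, perms, int(offset, 16), dev, int(inode), name)


def is_file_backed(entry: MapsEntry) -> bool:
    # Mappings without a backing file report inode 0.
    return entry.inode != 0


if __name__ == "__main__":
    # The two lines quoted above, taken from a kernel with the patches applied.
    for line in (
        "7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0 [vvar]",
        "7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18 [vdso]",
    ):
        e = parse_maps_line(line)
        kind = "file-backed" if is_file_backed(e) else "no inode"
        print(e.name, hex(e.end - e.start), e.dev, e.inode, kind)
```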

Then I introduce an interface in uprobe_events to insert uprobes into the vdso.
FWIW:
[~]# cd kernel/linux
[linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
Symbol table '.dynsym' contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
     2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12 clock_gettime@@LINUX_2.6
     3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12 __vdso_gettimeofday@@LINUX_2.6
     4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12 gettimeofday@@LINUX_2.6
     5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12 __vdso_time@@LINUX_2.6
     6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
     7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12 __vdso_clock_gettime@@LINUX_2.6
     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
     9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12 __vdso_getcpu@@LINUX_2.6
    10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12 getcpu@@LINUX_2.6
[~]# cd /sys/kernel/debug/tracing/
[tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
[tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
[tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
[tracing]# echo 1 > events/uprobes/enable
[tracing]# su test # it has UID=1001
[tracing]$ date
Thu Aug 25 17:19:29 MSK 2016
[tracing]$ exit
[tracing]# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 175/175   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
            bash-11560 [001] d...   316.470236: time: (0x7ffcacebae30)
            bash-11560 [001] d...   316.471436: gettimeofday: (0x7ffcacebac50)
            bash-11560 [001] d...   316.477550: time: (0x7ffcacebae30)
            bash-11560 [001] d...   316.477655: time: (0x7ffcacebae30)
          mktemp-11568 [001] d...   316.479589: gettimeofday: (0x7ffc603f0c50)
            date-11571 [001] d...   316.481890: clock_gettime: (0x7ffec9db58d0)
[...]
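
In the session above the probe offsets were copied by hand from the readelf
output, and each address in the trace is the per-process vdso base plus that
offset. A rough sketch of both relations follows. The ':vdso:/64:' syntax is
only what this RFC proposes, and vdso_probe_spec() and vdso_base() are
illustrative names that do not exist in the patches. It assumes the vdso is
mapped page-aligned with 4K pages:

```python
PAGE_SIZE = 4096  # assumption: x86_64 with 4K base pages

# Offsets taken from the readelf output above.
VDSO64_SYMS = {
    "clock_gettime": 0x8D0,
    "gettimeofday": 0xC50,
    "time": 0xE30,
}


def vdso_probe_spec(event: str, offset: int, blob: str = "64") -> str:
    """Format one uprobe_events line in the syntax proposed by this RFC."""
    return f"p:{event} :vdso:/{blob}:{offset:#x}"


def vdso_base(hit_addr: int, offset: int) -> int:
    """Recover the vdso load address from a probe hit address."""
    base = hit_addr - offset
    if base % PAGE_SIZE:
        raise ValueError("hit address does not match the symbol offset")
    return base


if __name__ == "__main__":
    for name, off in VDSO64_SYMS.items():
        print(vdso_probe_spec(name, off))
    # bash-11560 hit 'time' and 'gettimeofday' in the trace above;
    # both hits should resolve to the same base within one process.
    print(hex(vdso_base(0x7FFCACEBAE30, VDSO64_SYMS["time"])))
    print(hex(vdso_base(0x7FFCACEBAC50, VDSO64_SYMS["gettimeofday"])))
```

The base differs between bash, mktemp and date because the vdso is placed
at a different address in each process.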

If this approach is considered acceptable, I will prepare a better version
that fixes the following things:
o put vdsofs in generic fs/* dir
o support other archs and vdso blobs
o remove BUG_ON()'s and UID==1001 check
o remove extern's and use headers only
o refactor code in create_trace_uprobe()
o add some state to (struct trace_uprobe), so that, e.g., `cat uprobe_events`
prints those uprobes as vdso-based
o document this interface in Documentation/trace/uprobetracer.txt
o prepare a nice patch set?

So, opinions? Is it worth adding something like this?

[1]: https://lkml.org/lkml/2016/7/12/346

Dmitry Safonov (3):
  x86/vdso: create vdso file, use it for mapping
  uprobe: drop isdigit() check in create_trace_uprobe
  uprobe: add vdso support

Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: x86@xxxxxxxxxx
Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>

arch/x86/entry/vdso/vma.c | 148 ++++++++++++++++++++++++++++++++++++++++++--
kernel/trace/trace_uprobe.c | 50 +++++++++++----
2 files changed, 180 insertions(+), 18 deletions(-)

--
2.9.0