Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)

From: H. Peter Anvin
Date: Thu Aug 25 2016 - 20:46:19 EST


On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454c46@xxxxxxxxx> wrote:
>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@xxxxxxxxx>:
>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
><dsafonov@xxxxxxxxxxxxx> wrote:
>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>Also for RFC time it builds only on x86_64.
>>>
>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>vm_file
>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>[1].
>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>After patches vDSO VMA now has inode and is just a private file
>>>mapping:
>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>> [vvar]
>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>> [vdso]
>>>
>>>Then I introduce interface in uprobe_events to insert uprobes in
>vdso.
>>>FWIW:
>>> [~]# cd kernel/linux
>>> [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>Symbol table '.dynsym' contains 11 entries:
>>> Num: Value Size Type Bind Vis Ndx Name
>>> 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
>>> 1: 0000000000000470 0 SECTION LOCAL DEFAULT 8
>>>2: 00000000000008d0 885 FUNC WEAK DEFAULT 12
>>>clock_gettime@@LINUX_2.6
>>>3: 0000000000000c50 472 FUNC GLOBAL DEFAULT 12
>>>__vdso_gettimeofday@@LINUX_2.6
>>>4: 0000000000000c50 472 FUNC WEAK DEFAULT 12
>>>gettimeofday@@LINUX_2.6
>>>5: 0000000000000e30 21 FUNC GLOBAL DEFAULT 12
>>>__vdso_time@@LINUX_2.6
>>> 6: 0000000000000e30 21 FUNC WEAK DEFAULT 12
>time@@LINUX_2.6
>>>7: 00000000000008d0 885 FUNC GLOBAL DEFAULT 12
>>>__vdso_clock_gettime@@LINUX_2.6
>>> 8: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LINUX_2.6
>>>9: 0000000000000e50 41 FUNC GLOBAL DEFAULT 12
>>>__vdso_getcpu@@LINUX_2.6
>>>10: 0000000000000e50 41 FUNC WEAK DEFAULT 12
>>>getcpu@@LINUX_2.6
>>> [~]# cd /sys/kernel/debug/tracing/
>>> [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>> [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>> [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>> [tracing]# echo 1 > events/uprobes/enable
>>> [tracing]# su test # it has UID=1001
>>> [tracing]$ date
>>> Thu Aug 25 17:19:29 MSK 2016
>>> [tracing]$ exit
>>> [tracing]# cat trace
>>> # tracer: nop
>>> #
>>> # entries-in-buffer/entries-written: 175/175 #P:4
>>> #
>>> # _-----=> irqs-off
>>> # / _----=> need-resched
>>> # | / _---=> hardirq/softirq
>>> # || / _--=> preempt-depth
>>> # ||| / delay
>>> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
>>> # | | | |||| | |
>>> bash-11560 [001] d... 316.470236: time:
>(0x7ffcacebae30)
>>> bash-11560 [001] d... 316.471436: gettimeofday:
>(0x7ffcacebac50)
>>> bash-11560 [001] d... 316.477550: time:
>(0x7ffcacebae30)
>>> bash-11560 [001] d... 316.477655: time:
>(0x7ffcacebae30)
>>> mktemp-11568 [001] d... 316.479589: gettimeofday:
>(0x7ffc603f0c50)
>>> date-11571 [001] d... 316.481890: clock_gettime:
>(0x7ffec9db58d0)
>>>[...]
>>>
>>>If this approach will be decided as fine, I will prepare a better
>>>version,
>>>fixing the following things:
>>>o put vdsofs in generic fs/* dir
>>>o support other archs and vdso blobs
>>>o remove BUG_ON()'s and UID==1001 check
>>>o remove extern's and use headers only
>>>o refactor code in create_trace_uprobe()
>>>o add some state to (struct trace_uprobe), so i.e., `cat
>uprobe_events`
>>>will
>>> print those uprobes as vdso-based
>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>o prepare nice patches set?
>>>
>>>So, opinions? Is it worth to add something like this?
>>>
>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>
>>>Dmitry Safonov (3):
>>> x86/vdso: create vdso file, use it for mapping
>>> uprobe: drop isdigit() check in create_trace_uprobe
>>> uprobe: add vdso support
>>>
>>>Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
>>>Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>>Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
>>>Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>>Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>>>Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>>>Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>>>Cc: x86@xxxxxxxxxx
>>>Cc: Dmitry Safonov <0x7f454c46@xxxxxxxxx>
>>>
>>>arch/x86/entry/vdso/vma.c | 148
>>>++++++++++++++++++++++++++++++++++++++++++--
>>> kernel/trace/trace_uprobe.c | 50 +++++++++++----
>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>
>> I think there is a lot to be said for this idea. However, a private
>mapping is definitely wrong for the vvar data; for the vdso code it
>could be considered either way I suppose.
>
>Thanks on your reply.
>As you could see, I preserved pure mapping of pfn for vvar:
>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
> [vvar]
>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
> [vdso]
>(no inode number).
>I also think it would be useless to do the same to vvar as it
>has just data and there is no point in probing it.

Well, it would things like mremap() just work and so on. Let's get rid of special cases if we are.
--
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.