[PATCH 15/15] tracing: Add 'hist' trigger Documentation

From: Tom Zanussi
Date: Mon Mar 02 2015 - 11:02:31 EST


Add documentation and usage examples for 'hist' triggers.

Signed-off-by: Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx>
---
Documentation/trace/events.txt | 870 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 870 insertions(+)

diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
index 75d25a1..0e7a27f 100644
--- a/Documentation/trace/events.txt
+++ b/Documentation/trace/events.txt
@@ -494,3 +494,873 @@ The following commands are supported:

Note that there can be only one traceon or traceoff trigger per
triggering event.
+
+- hist
+
+ This command aggregates event hits into a hash table keyed on a
+ trace event format fields (or stacktrace) and a set of running
+ totals derived from one or more trace event format fields and/or
+ event counts (hitcount).
+
+ The format of a hist trigger is as follows:
+
+ hist:keys=<field1>:values=<field1[,field2,...]>
+ [:size=#entries][:sort=field1][:pause][:continue]
+ [:clear] [if <filter>]
+
+ When a matching event is hit, an entry is added to a hash table
+ using the key(s) and value(s) named. Keys and values correspond to
+ fields in the event's format description. Values must correspond to
+ numeric fields - on an event hit, the value(s) will be added to a
+ sum kept for that field. The special string 'hitcount' can be used
+ in place of an explicit value field - this is simply a count of
+ event hits. Keys can be any field, or the special string
+ 'stacktrace', which will use the event's kernel stacktrace as the
+ key. The keywords 'keys' or 'key' can be used to specify keys, and
+ the keyworks 'values', 'vals', or 'val' can be used to specify
+ values. For the time being, only a single key can be used -
+ compound keys aren't yet supported.
+
+ 'hist' triggers add a 'hist' file to each event's subdirectory.
+ Reading the 'hist' file for the event will dump the hash table in
+ its entirety to stdout. By default, numeric fields are displayed as
+ base-10 integers. This can be modified by appending any of the
+ following modifiers to the field name:
+
+ .hex display a number as a hex value
+ .sym display an address as a symbol
+ .syscall display a syscall id as a system call name
+ .execname display a common_pid as a program name
+
+ A typical usage scenario would be the following to enable a hist
+ trigger, read its current contents, and then turn it off:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_rx/hist
+
+ # echo '!hist:keys=skbaddr.hex:vals=len' > \
+ /sys/kernel/debug/tracing/events/net/netif_rx/trigger
+
+ The trigger file itself can be read to show the details of the
+ currently attached hist trigger. This information is also displayed
+ at the top of the 'hist' file when read.
+
+ By default, the size of the hash table is 2048 entries. The 'size'
+ param can be used to specify more or fewer than that. The units are
+ in terms of hashtable entries - if a run uses more entries than
+ specified, the results will show the number of 'drops', the number
+ of hits that were ignored. The size should be a power of 2 between
+ 128 and 131072 (any non- power-of-2 number specified will be rounded
+ up).
+
+ The 'sort' param can be used to specify a value field to sort on.
+ The default if unspecified is 'hitcount' and the default sort order
+ is 'ascending'. To sort in the opposite direction, append
+ .descending' to the sort key.
+
+ The 'pause' param can be used to pause an existing hist trigger or
+ to start a hist trigger but not log any events until told to do so.
+ 'continue' or 'cont' can be used to start or restart a paused hist
+ trigger.
+
+ The 'clear' param will clear the contents of a running hist trigger
+ and leave its current paused/active state.
+
+- enable_hist/disable_hist
+
+ The enable_hist and disable_hist triggers can be used to have one
+ event conditionally start and stop another event's already-attached
+ hist trigger. Any number of enable_hist and disable_hist triggers
+ can be attached to a given event, allowing that event to kick off
+ and stop aggregations on a host of other events.
+
+ The format is very similar to the enable/disable_event triggers:
+
+ enable_hist:<system>:<event>[:count]
+ disable_hist:<system>:<event>[:count]
+
+ Instead of enabling or disabling the tracing of the target event
+ into the trace buffer as the enable/disable_event triggers do, the
+ enable/disable_hist triggers enable or disable the aggregation of
+ the target event into a hash table.
+
+ A typical usage scenario for the enable_hist/disable_hist triggers
+ would be to first set up a paused hist trigger on some event,
+ followed by an enable_hist/disable_hist pair that turns the hist
+ aggregation on and off when conditions of interest are hit:
+
+ # echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+
+ # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+
+ The above sets up an initially paused hist trigger which is unpaused
+ and starts aggregating events when a given program is executed, and
+ which stops aggregating when the process exits and the hist trigger
+ is paused again.
+
+ The examples below provide a more concrete illustration of the
+ concepts and typical usage patterns discussed above.
+
+
+6.2 'hist' trigger examples
+---------------------------
+
+ The first set of examples creates aggregations using the kmalloc
+ event. The fields that can be used for the hist trigger are listed
+ in the kmalloc event's format file:
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/format
+ name: kmalloc
+ ID: 374
+ format:
+ field:unsigned short common_type; offset:0; size:2; signed:0;
+ field:unsigned char common_flags; offset:2; size:1; signed:0;
+ field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
+ field:int common_pid; offset:4; size:4; signed:1;
+
+ field:unsigned long call_site; offset:8; size:8; signed:0;
+ field:const void * ptr; offset:16; size:8; signed:0;
+ field:size_t bytes_req; offset:24; size:8; signed:0;
+ field:size_t bytes_alloc; offset:32; size:8; signed:0;
+ field:gfp_t gfp_flags; offset:40; size:4; signed:0;
+
+ We'll start by creating a hist trigger that generates a simple table
+ that lists the total number of bytes requested for each function in
+ the kernel that made one or more calls to kmalloc:
+
+ # echo 'hist:key=call_site:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ This tells the tracing system to create a 'hist' trigger using the
+ call_site field of the kmalloc event as the key for the table, which
+ just means that each unique call_site address will have an entry
+ created for it in the table. The 'val=bytes_req' parameter tells
+ the hist trigger that for each unique entry (call_site) in the
+ table, it should keep a running total of the number of bytes
+ requested by that call_site.
+
+ We'll let it run for awhile and then dump the contents of the 'hist'
+ file in the kmalloc event's subdirectory (for readability, a number
+ of entries have been omitted):
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ trigger info: hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ call_site: 18446744071581750326 hitcount: 1 bytes_req: 24
+ call_site: 18446744071583151255 hitcount: 1 bytes_req: 32
+ call_site: 18446744071582443167 hitcount: 1 bytes_req: 264
+ call_site: 18446744072104099935 hitcount: 2 bytes_req: 464
+ call_site: 18446744071579323550 hitcount: 3 bytes_req: 168
+ call_site: 18446744071580558850 hitcount: 4 bytes_req: 65536
+ .
+ .
+ .
+ call_site: 18446744072101362738 hitcount: 4115 bytes_req: 362120
+ call_site: 18446744072103235682 hitcount: 4115 bytes_req: 427960
+ call_site: 18446744072102962280 hitcount: 4342 bytes_req: 1637584
+ call_site: 18446744072102962328 hitcount: 4973 bytes_req: 360624
+ call_site: 18446744072101400062 hitcount: 6465 bytes_req: 258600
+ call_site: 18446744071582063959 hitcount: 7818 bytes_req: 15636
+ call_site: 18446744072102968908 hitcount: 9315 bytes_req: 8912400
+ call_site: 18446744072103122419 hitcount: 9315 bytes_req: 819720
+ call_site: 18446744072101402850 hitcount: 10240 bytes_req: 573440
+ call_site: 18446744072099471334 hitcount: 23820 bytes_req: 476512
+
+ Totals:
+ Hits: 109976
+ Entries: 76
+ Dropped: 0
+
+ The output displays a line for each entry, beginning with the key
+ specified in the trigger, followed by the value(s) also specified in
+ the trigger. At the beginning of the output is a line that displays
+ the trigger info, which can also be displayed by reading the
+ 'trigger' file:
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+ hist:keys=call_site:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ At the end of the output are a few lines that display the overall
+ totals for the run. The 'Hits' field shows the total number of
+ times the event trigger was hit, the 'Entries' fields shows the
+ total number of used entries in the hash table, and the 'Dropped'
+ field shows the number of hits that were dropped because the number
+ of used entries for the run exceeded the maximum number of entries
+ allowed for the table (normally 0, but if not a hint that you may
+ want to increase the size of the table using the 'size' param).
+
+ Notice in the above output that there's an extra field, 'hitcount',
+ that wasn't specified in the trigger. Also notice that in the
+ trigger info a param,'sort=hitcount', which wasn't specified in the
+ trigger either. The reason is that every trigger implicitly keeps a
+ count of the total number of hits attributed to a given entry,
+ called the 'hitcount', and that in the absence of a user-specified
+ sort param, the hitcount is used as the default sort field.
+
+ The value 'hitcount' can be used in place of an explicit value in
+ the 'values' param if you don't really need to have any particular
+ field summed and are mainly interested in hit frequencies.
+
+ To turn the hist trigger off, simply call up the trigger in command
+ history and re-execute it with a '!' prepended:
+
+ # echo '!hist:key=call_site:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ Finally, notice that the call_site as displayed in the output above
+ isn't really very useful. It's an address, but normally addresses
+ are displayed in hex. To have a numeric field displayed as hex
+ values, simply append '.hex' to the field name in the trigger:
+
+ # echo 'hist:key=call_site.hex:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ trigger info: hist:keys=call_site.hex:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ call_site: ffffffffa04e85cb hitcount: 1 bytes_req: 511
+ call_site: ffffffff810a66a9 hitcount: 1 bytes_req: 1024
+ call_site: ffffffff8152db82 hitcount: 1 bytes_req: 8
+ call_site: ffffffffa008829e hitcount: 1 bytes_req: 7
+ call_site: ffffffffa0087d6a hitcount: 1 bytes_req: 7
+ call_site: ffffffffa02eb7e1 hitcount: 1 bytes_req: 433
+ call_site: ffffffff8152cbde hitcount: 1 bytes_req: 192
+ call_site: ffffffff811a2602 hitcount: 2 bytes_req: 32768
+ .
+ .
+ .
+ call_site: ffffffffa0419062 hitcount: 724 bytes_req: 75296
+ call_site: ffffffffa024fc32 hitcount: 724 bytes_req: 63712
+ call_site: ffffffffa03d6468 hitcount: 852 bytes_req: 416736
+ call_site: ffffffffa03d6498 hitcount: 944 bytes_req: 70984
+ call_site: ffffffff81311d57 hitcount: 961 bytes_req: 1922
+ call_site: ffffffffa0258dfe hitcount: 1429 bytes_req: 57160
+ call_site: ffffffffa03fd5f3 hitcount: 1796 bytes_req: 158048
+ call_site: ffffffffa03d7e4c hitcount: 1796 bytes_req: 2206848
+ call_site: ffffffffa02598e2 hitcount: 3196 bytes_req: 178976
+ call_site: ffffffffa0081fe6 hitcount: 9529 bytes_req: 190584
+
+ Totals:
+ Hits: 24542
+ Entries: 60
+ Dropped: 0
+
+ Even that's only marginally more useful - while hex values do look
+ more like addresses, what users are typically more interested in
+ when looking at text addresses are the corresponding symbols
+ instead. To have an address displayed as symbolic value instead,
+ simply append '.sym' to the field name in the trigger:
+
+ # echo 'hist:key=call_site.sym:val=bytes_req' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=hitcount:size=2048 [active]
+
+ call_site: [ffffffff8152db82] usb_control_msg hitcount: 1 bytes_req: 8
+ call_site: [ffffffffa04e85cb] __ieee80211_start_scan hitcount: 1 bytes_req: 511
+ call_site: [ffffffff815d7f6b] sk_prot_alloc hitcount: 1 bytes_req: 976
+ call_site: [ffffffff8152cbde] usb_alloc_urb hitcount: 1 bytes_req: 192
+ call_site: [ffffffff815d2c1b] sock_alloc_inode hitcount: 1 bytes_req: 64
+ call_site: [ffffffffa02eb7e1] nl80211_trigger_scan hitcount: 1 bytes_req: 433
+ call_site: [ffffffffa0087d6a] hidraw_report_event hitcount: 1 bytes_req: 7
+ call_site: [ffffffff810a66a9] syslog_print_all hitcount: 1 bytes_req: 1024
+ call_site: [ffffffffa008829e] hidraw_send_report hitcount: 1 bytes_req: 7
+ call_site: [ffffffff811f1316] mounts_open_common hitcount: 2 bytes_req: 368
+ call_site: [ffffffff811d0825] alloc_fdtable hitcount: 2 bytes_req: 96
+ call_site: [ffffffffa042106f] __intel_framebuffer_create hitcount: 3 bytes_req: 432
+ .
+ .
+ .
+ call_site: [ffffffff8136b079] sg_kmalloc hitcount: 461 bytes_req: 153696
+ call_site: [ffffffffa03dbc5e] i915_gem_obj_lookup_or_create_vma hitcount: 462 bytes_req: 92400
+ call_site: [ffffffffa03e213a] i915_gem_object_get_pages_gtt hitcount: 462 bytes_req: 7392
+ call_site: [ffffffffa03d6498] i915_gem_do_execbuffer.isra.22 hitcount: 697 bytes_req: 52064
+ call_site: [ffffffff81311d57] apparmor_file_alloc_security hitcount: 926 bytes_req: 1852
+ call_site: [ffffffffa0258dfe] drm_vma_node_allow hitcount: 1050 bytes_req: 42000
+ call_site: [ffffffffa03fd5f3] intel_ring_begin hitcount: 1116 bytes_req: 98208
+ call_site: [ffffffffa03d7e4c] i915_gem_execbuffer2 hitcount: 1116 bytes_req: 956984
+ call_site: [ffffffffa02598e2] drm_modeset_lock_crtc hitcount: 2055 bytes_req: 115080
+ call_site: [ffffffffa0081fe6] hid_report_raw_event hitcount: 6433 bytes_req: 128664
+
+ Totals:
+ Hits: 17384
+ Entries: 62
+ Dropped: 0
+
+ Because the default sort key above is 'hitcount', the above shows a
+ the list of call_sites by increasing hitcount, so that at the bottom
+ we see the functions that made the most kmalloc calls during the
+ run. If instead we we wanted to see the top kmalloc callers in
+ terms of the number of bytes requested rather than the number of
+ calls, and we wanted the top caller to appear at the top, we can use
+ the 'sort' param, along with the 'descending' modifier:
+
+ # echo 'hist:key=call_site.sym:val=bytes_req:sort=bytes_req.descending' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ trigger info: hist:keys=call_site.sym:vals=bytes_req:sort=bytes_req:size=2048 [active]
+
+ call_site: [ffffffffa03d7e4c] i915_gem_execbuffer2 hitcount: 682 bytes_req: 766360
+ call_site: [ffffffff811d6c3b] seq_buf_alloc hitcount: 46 bytes_req: 186176
+ call_site: [ffffffffa03d6468] i915_gem_do_execbuffer.isra.22 hitcount: 377 bytes_req: 142808
+ call_site: [ffffffffa02598e2] drm_modeset_lock_crtc hitcount: 1107 bytes_req: 61992
+ call_site: [ffffffffa03fd5f3] intel_ring_begin hitcount: 682 bytes_req: 60016
+ call_site: [ffffffffa0081fe6] hid_report_raw_event hitcount: 2993 bytes_req: 59864
+ call_site: [ffffffff8136b079] sg_kmalloc hitcount: 206 bytes_req: 48096
+ call_site: [ffffffffa03dbc5e] i915_gem_obj_lookup_or_create_vma hitcount: 206 bytes_req: 41200
+ call_site: [ffffffff811a2602] __kmalloc hitcount: 2 bytes_req: 32768
+ call_site: [ffffffffa0419062] intel_crtc_page_flip hitcount: 305 bytes_req: 31720
+ call_site: [ffffffffa024fc32] drm_mode_page_flip_ioctl hitcount: 305 bytes_req: 26840
+ call_site: [ffffffffa0258dfe] drm_vma_node_allow hitcount: 603 bytes_req: 24120
+ call_site: [ffffffffa03d6498] i915_gem_do_execbuffer.isra.22 hitcount: 305 bytes_req: 23072
+ call_site: [ffffffff812775da] ext4_find_extent hitcount: 79 bytes_req: 7584
+ .
+ .
+ .
+ call_site: [ffffffff811f3b07] inotify_handle_event hitcount: 2 bytes_req: 96
+ call_site: [ffffffff811bfb6e] vfs_rename hitcount: 6 bytes_req: 66
+ call_site: [ffffffff815d2c1b] sock_alloc_inode hitcount: 1 bytes_req: 64
+ call_site: [ffffffff811d0825] alloc_fdtable hitcount: 1 bytes_req: 48
+ call_site: [ffffffff81220376] proc_self_follow_link hitcount: 2 bytes_req: 24
+ call_site: [ffffffff8112e1ec] hist_show hitcount: 2 bytes_req: 16
+ call_site: [ffffffff8112e1d0] hist_show hitcount: 2 bytes_req: 16
+ call_site: [ffffffff8152db82] usb_control_msg hitcount: 1 bytes_req: 8
+ call_site: [ffffffffa008829e] hidraw_send_report hitcount: 1 bytes_req: 7
+ call_site: [ffffffffa0087d6a] hidraw_report_event hitcount: 1 bytes_req: 7
+
+ Totals:
+ Hits: 9331
+ Entries: 57
+ Dropped: 0
+
+ We can also add multiple fields to the 'values' param. For example,
+ we might want to see the total number of bytes allocated alongside
+ bytes requested, and display the result sorted by bytes allocated in
+ a descending order:
+
+ # echo 'hist:keys=call_site.sym:values=bytes_req,bytes_alloc:sort=bytes_alloc.descending' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ trigger info: hist:keys=call_site.sym:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
+
+ call_site: [ffffffffa03d7e4c] i915_gem_execbuffer2 hitcount: 4427 bytes_req: 4190032 bytes_alloc: 5967744
+ call_site: [ffffffff811d6c3b] seq_buf_alloc hitcount: 427 bytes_req: 1706768 bytes_alloc: 1765376
+ call_site: [ffffffffa03d6468] i915_gem_do_execbuffer.isra.22 hitcount: 1978 bytes_req: 765936 bytes_alloc: 1434304
+ call_site: [ffffffffa03fd5f3] intel_ring_begin hitcount: 4427 bytes_req: 389576 bytes_alloc: 424992
+ call_site: [ffffffff8136b079] sg_kmalloc hitcount: 993 bytes_req: 305824 bytes_alloc: 376576
+ call_site: [ffffffffa02598e2] drm_modeset_lock_crtc hitcount: 4319 bytes_req: 241864 bytes_alloc: 276416
+ call_site: [ffffffffa03dbc5e] i915_gem_obj_lookup_or_create_vma hitcount: 993 bytes_req: 198600 bytes_alloc: 254208
+ call_site: [ffffffffa0419062] intel_crtc_page_flip hitcount: 1806 bytes_req: 187824 bytes_alloc: 231168
+ call_site: [ffffffff81424d78] __tty_buffer_request_room hitcount: 61 bytes_req: 128928 bytes_alloc: 229376
+ call_site: [ffffffffa0081fe6] hid_report_raw_event hitcount: 9550 bytes_req: 191024 bytes_alloc: 210160
+ call_site: [ffffffffa03d6498] i915_gem_do_execbuffer.isra.22 hitcount: 2449 bytes_req: 184720 bytes_alloc: 209216
+ call_site: [ffffffffa0258dfe] drm_vma_node_allow hitcount: 3256 bytes_req: 130240 bytes_alloc: 208384
+ .
+ .
+ .
+ call_site: [ffffffffa04e85cb] __ieee80211_start_scan hitcount: 3 bytes_req: 1533 bytes_alloc: 1536
+ call_site: [ffffffffa02eb7e1] nl80211_trigger_scan hitcount: 3 bytes_req: 1299 bytes_alloc: 1536
+ call_site: [ffffffff8152cbde] usb_alloc_urb hitcount: 6 bytes_req: 1152 bytes_alloc: 1152
+ call_site: [ffffffff811f3b07] inotify_handle_event hitcount: 14 bytes_req: 744 bytes_alloc: 896
+ call_site: [ffffffff81206b81] load_elf_binary hitcount: 24 bytes_req: 672 bytes_alloc: 768
+ call_site: [ffffffff811bfb6e] vfs_rename hitcount: 38 bytes_req: 436 bytes_alloc: 640
+ call_site: [ffffffff81220376] proc_self_follow_link hitcount: 24 bytes_req: 288 bytes_alloc: 384
+ call_site: [ffffffff815d2c1b] sock_alloc_inode hitcount: 5 bytes_req: 320 bytes_alloc: 320
+ call_site: [ffffffff811d0825] alloc_fdtable hitcount: 4 bytes_req: 192 bytes_alloc: 256
+ call_site: [ffffffffa04ec05f] ieee80211_start_tx_ba_session hitcount: 1 bytes_req: 232 bytes_alloc: 256
+ call_site: [ffffffffa0244701] drm_vm_open_locked hitcount: 5 bytes_req: 160 bytes_alloc: 160
+ call_site: [ffffffff8112e1d0] hist_show hitcount: 10 bytes_req: 80 bytes_alloc: 80
+ call_site: [ffffffff8112e1ec] hist_show hitcount: 10 bytes_req: 80 bytes_alloc: 80
+ call_site: [ffffffff81074c9e] kthread_create_on_node hitcount: 1 bytes_req: 56 bytes_alloc: 64
+ call_site: [ffffffff8152db82] usb_control_msg hitcount: 6 bytes_req: 48 bytes_alloc: 48
+ call_site: [ffffffffa008829e] hidraw_send_report hitcount: 6 bytes_req: 42 bytes_alloc: 48
+ call_site: [ffffffffa0087d6a] hidraw_report_event hitcount: 6 bytes_req: 42 bytes_alloc: 48
+
+ Totals:
+ Hits: 45576
+ Entries: 66
+ Dropped: 0
+
+ Finally, to finish off our kmalloc example, instead of simply having
+ the hist trigger display symbolic call_sites, we can have the hist
+ trigger additionally display the complete set of kernel stack traces
+ that led to each call_sites. To do that, we simply use the special
+ value 'stacktrace' for the key param:
+
+ # echo 'hist:keys=stacktrace:values=bytes_req,bytes_alloc:sort=bytes_alloc' > \
+ /sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
+
+ The above trigger will use the kernel stack trace in effect when an
+ event is triggered as the key for the hash table. This allows the
+ enumeration of every kernel callpath that led up to a particular
+ event, along with a running total of any of the event fields for
+ that event. Here we tally bytes requested and bytes allocated for
+ every callpath in the system that led up to a kmalloc (in this case
+ every callpath to a kmalloc for a kernel compile):
+
+ # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/hist
+ trigger info: hist:keys=stacktrace:vals=bytes_req,bytes_alloc:sort=bytes_alloc:size=2048 [active]
+
+ stacktrace:
+ __kmalloc_track_caller+0x10b/0x190
+ kstrdup+0x35/0x70
+ __kernfs_new_node+0x34/0xf0
+ kernfs_new_node+0x26/0x50
+ kernfs_create_dir_ns+0x29/0x80
+ sysfs_create_dir_ns+0x40/0xa0
+ kobject_add_internal+0xbd/0x3a0
+ kobject_add+0x60/0xb0
+ device_add+0xff/0x5a0
+ device_register+0x1e/0x30
+ usb_create_ep_devs+0x81/0xd0
+ usb_new_device+0x357/0x410
+ hub_event+0xd68/0x11a0
+ process_one_work+0x149/0x3d0
+ worker_thread+0x121/0x4a0
+ kthread+0xd2/0xf0
+ hitcount: 1 bytes_req: 6 bytes_alloc: 8
+ stacktrace:
+ kmem_cache_alloc_trace+0xfb/0x160
+ usb_control_msg+0x42/0x110
+ hub_port_status+0x84/0x120
+ hub_port_reset+0x263/0x430
+ hub_port_init+0x77/0xbb0
+ hub_event+0x944/0x11a0
+ process_one_work+0x149/0x3d0
+ worker_thread+0x121/0x4a0
+ kthread+0xd2/0xf0
+ ret_from_fork+0x7c/0xb0
+ hitcount: 1 bytes_req: 8 bytes_alloc: 8
+ .
+ .
+ .
+ stacktrace:
+ __kmalloc+0x11b/0x1a0
+ load_elf_phdrs+0x76/0xa0
+ load_elf_binary+0x102/0x1780
+ search_binary_handler+0x97/0x1d0
+ do_execveat_common.isra.32+0x551/0x6e0
+ SyS_execve+0x3a/0x50
+ stub_execve+0x69/0xa0
+ hitcount: 131791 bytes_req: 67822944 bytes_alloc: 80279552
+ stacktrace:
+ __kmalloc+0x11b/0x1a0
+ i915_gem_execbuffer2+0x6c/0x2c0 [i915]
+ drm_ioctl+0x1a4/0x630 [drm]
+ do_vfs_ioctl+0x2f0/0x4f0
+ SyS_ioctl+0x81/0xa0
+ system_call_fastpath+0x12/0x17
+ hitcount: 164540 bytes_req: 134464456 bytes_alloc: 167693376
+ stacktrace:
+ kmem_cache_alloc_trace+0xfb/0x160
+ apparmor_file_alloc_security+0x27/0x40
+ security_file_alloc+0x16/0x20
+ get_empty_filp+0x93/0x1c0
+ path_openat+0x31/0x620
+ do_filp_open+0x3a/0x90
+ do_sys_open+0x128/0x220
+ SyS_open+0x1e/0x20
+ system_call_fastpath+0x12/0x17
+ hitcount: 22118413 bytes_req: 44236888 bytes_alloc: 176947328
+ stacktrace:
+ __kmalloc+0x11b/0x1a0
+ seq_buf_alloc+0x1b/0x50
+ seq_read+0x2cc/0x370
+ proc_reg_read+0x3d/0x80
+ __vfs_read+0x18/0x50
+ vfs_read+0x86/0x140
+ SyS_read+0x46/0xb0
+ system_call_fastpath+0x12/0x17
+ hitcount: 66168 bytes_req: 271015944 bytes_alloc: 271015952
+
+ Totals:
+ Hits: 26780801
+ Entries: 633
+ Dropped: 0
+
+ If you key a hist trigger on pid, for example, to gather and display
+ sorted totals for each process, you can use the special .execname
+ modifier to display the executable names for the processes in the
+ table rather than raw pids. The example below keeps a per-process
+ sum of total bytes read:
+
+ # echo 'hist:key=common_pid.execname:val=count:sort=count.descending' > \
+ /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
+
+ # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/hist
+ trigger info: hist:keys=common_pid.execname:vals=count:sort=count:size=2048 [active]
+
+ common_pid: Xorg [18446612132314219843] hitcount: 166 count: 159744
+ common_pid: gnome-terminal [18446612132314221635] hitcount: 44 count: 113321
+ common_pid: compiz [18446612132314221491] hitcount: 13 count: 81968
+ common_pid: bash [18446612132314231931] hitcount: 3 count: 66369
+ common_pid: gdbus [18446612132314221135] hitcount: 91 count: 1456
+ common_pid: acpid [18446612132314219701] hitcount: 48 count: 1152
+ common_pid: ibus-daemon [18446612132314221129] hitcount: 71 count: 1136
+ common_pid: gdbus [18446612132314221638] hitcount: 42 count: 672
+ common_pid: gdbus [18446612132314221345] hitcount: 42 count: 672
+ common_pid: ibus-engine-sim [18446612132314221344] hitcount: 14 count: 224
+ common_pid: gdbus [18446612132314221262] hitcount: 12 count: 192
+ common_pid: gdbus [18446612132314221249] hitcount: 6 count: 96
+ common_pid: ibus-ui-gtk3 [18446612132314221245] hitcount: 6 count: 96
+ common_pid: gdbus [18446612132314221504] hitcount: 5 count: 80
+ common_pid: unity-panel-ser [18446612132314221243] hitcount: 3 count: 48
+ common_pid: postgres [18446612132314220339] hitcount: 1 count: 16
+ common_pid: gdbus [18446612132314221395] hitcount: 1 count: 16
+ common_pid: rtkit-daemon [18446612132314220509] hitcount: 1 count: 8
+ common_pid: upstart-dbus-br [18446612132314221100] hitcount: 4 count: 4
+ common_pid: bash [18446612132314221924] hitcount: 4 count: 4
+ common_pid: firefox [18446612132314222258] hitcount: 1 count: 1
+
+ Totals:
+ Hits: 578
+ Entries: 21
+ Dropped: 0
+
+
+ Similarly, if you key a hist trigger on syscall id, for example to
+ gather and display a list of systemwide syscall hits, you can use
+ the special .syscall modifier to display the syscall names rather
+ than raw ids. The example below keeps a running total of syscall
+ counts for the system during the run:
+
+ # echo 'hist:key=id.syscall:val=hitcount' > \
+ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger
+
+ # cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist
+ trigger info: hist:keys=id.syscall:vals=:sort=hitcount:size=2048 [active]
+
+ id: sys_fadvise64 hitcount: 1
+ id: sys_bind hitcount: 1
+ id: sys_fsync hitcount: 1
+ id: sys_getsockname hitcount: 1
+ id: sys_readlink hitcount: 3
+ id: sys_set_tid_address hitcount: 4
+ .
+ .
+ .
+ id: sys_mprotect hitcount: 57
+ id: sys_waitid hitcount: 64
+ id: sys_rt_sigaction hitcount: 121
+ id: sys_inotify_add_watch hitcount: 234
+ id: sys_mmap hitcount: 258
+ id: sys_nanosleep hitcount: 310
+ id: sys_close hitcount: 323
+ id: sys_rt_sigprocmask hitcount: 3520
+ id: sys_futex hitcount: 4819
+ id: sys_write hitcount: 8907
+ id: sys_setitimer hitcount: 39039
+ id: sys_writev hitcount: 64528
+ id: sys_poll hitcount: 77493
+ id: sys_recvmsg hitcount: 168240
+ id: sys_ioctl hitcount: 448946
+
+ Totals:
+ Hits: 869971
+ Entries: 75
+ Dropped: 0
+
+ The next example uses a string field as the hash key and
+ demonstrates how you can manually pause and continue a hist trigger.
+ In this example, we'll aggregate fork counts and don't expect a
+ large number of entries in the hash table, so we'll drop it to a
+ much smaller number, say 256:
+
+ # echo 'hist:key=child_comm:val=hitcount:size=256' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ trigger info: hist:keys=child_comm:vals=:sort=hitcount:size=256 [active]
+
+ child_comm: gdbus hitcount: 1
+ child_comm: smbd hitcount: 1
+ child_comm: ibus-daemon hitcount: 1
+ child_comm: Cache2 I/O hitcount: 1
+ child_comm: dconf worker hitcount: 1
+ child_comm: postgres hitcount: 1
+ child_comm: unity-panel-ser hitcount: 2
+ child_comm: compiz hitcount: 3
+ child_comm: bash hitcount: 3
+ child_comm: firefox hitcount: 60
+
+ Totals:
+ Hits: 74
+ Entries: 10
+ Dropped: 0
+
+ # echo 'hist:key=child_comm:val=hitcount:size=256:pause' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ If we want to pause the hist trigger, we can simply append :pause to
+ the command that started the trigger. Notice that the trigger info
+ displays as [paused]:
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ trigger info: hist:keys=child_comm:vals=:sort=hitcount:size=256 [paused]
+
+ child_comm: dconf worker hitcount: 1
+ child_comm: gdbus hitcount: 1
+ child_comm: Cache2 I/O hitcount: 1
+ child_comm: ibus-daemon hitcount: 1
+ child_comm: unity-panel-ser hitcount: 2
+ child_comm: smbd hitcount: 2
+ child_comm: postgres hitcount: 3
+ child_comm: compiz hitcount: 3
+ child_comm: bash hitcount: 3
+ child_comm: emacs hitcount: 8
+ child_comm: firefox hitcount: 60
+
+ Totals:
+ Hits: 85
+ Entries: 11
+ Dropped: 0
+
+ To manually continue having the trigger aggregate events, append
+ :cont instead. Notice that the trigger info displays as [active]
+ again, and the data has changed:
+
+ # echo 'hist:key=child_comm:val=hitcount:size=256:cont' >
+ /sys/kernel/debug/tracing/events/sched/sched_process_fork/trigger
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_fork/hist
+ trigger info: hist:keys=child_comm:vals=:sort=hitcount:size=256 [active]
+
+ child_comm: Cache2 I/O hitcount: 1
+ child_comm: ibus-daemon hitcount: 1
+ child_comm: kthreadd hitcount: 1
+ child_comm: gdbus hitcount: 1
+ child_comm: dconf worker hitcount: 1
+ child_comm: unity-panel-ser hitcount: 2
+ child_comm: smbd hitcount: 2
+ child_comm: compiz hitcount: 3
+ child_comm: postgres hitcount: 4
+ child_comm: bash hitcount: 4
+ child_comm: emacs hitcount: 8
+ child_comm: firefox hitcount: 60
+
+ Totals:
+ Hits: 88
+ Entries: 12
+ Dropped: 0
+
+ The previous example showed how to start and stop a hist trigger by
+ appending 'pause' and 'continue' to the hist trigger command. A
+ hist trigger can also be started in a paused state by initially
+ starting the trigger with ':pause' appended. This allows you to
+ start the trigger only when you're ready to start collecting data
+ and not before. For example, start the trigger in a paused state,
+ then unpause it and do something you want to measure, then pause the
+ trigger when done.
+
+ Of course, doing this manually can be difficult and error-prone, but
+ it is possible to automatically start and stop a hist trigger based
+ on some condition, via the enable_hist and disable_hist triggers.
+
+ For example, suppose we wanted to take a look at the relative
+ weights in terms of skb length for each callpath that leads to a
+ netif_receieve_skb event when downloading a decent-sized file using
+ wget.
+
+ First we set up an initially paused stacktrace trigger on the
+ netif_receive_skb event:
+
+ # echo 'hist:key=stacktrace:vals=len:pause' > \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ Next, we set up an 'enable_hist' trigger on the sched_process_exec
+ event, with an 'if filename==/usr/bin/wget' filter. The effect of
+ this new trigger is that it will 'unpause' the hist trigger we just
+ set up on netif_receive_skb if and only if it sees a
+ sched_proces_exec event with a filename of '/usr/bin/wget'. When
+ that happens, all netif_receive_skb events are aggregated into a
+ hash table keyed on stacktrace:
+
+ # echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+
+ The aggregation continues until the netif_receive_skb is paused
+ again, which is what the following disable_hist event does by
+ creating a similar setup on the sched_process_exit event, using the
+ filter 'comm==wget':
+
+ # echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+
+ Whenever a process exits and the comm field of the disable_hist
+ trigger filter matches 'comm==wget', the netif_receive_skb hist
+ trigger is disabled.
+
+ The overall effect is that netif_received_skb events are aggregated
+ into the hash table for only the duration of the wget. Executing a
+ wget command and then listing the 'hist' file will display the
+ output generated by the wget command:
+
+ $ wget https://www.kernel.org/pub/linux/kernel/v3.x/patch-3.19.xz
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
+ trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+
+ stacktrace:
+ __netif_receive_skb_core+0x4ad/0x780
+ __netif_receive_skb+0x18/0x60
+ process_backlog+0xa8/0x150
+ net_rx_action+0x15d/0x340
+ __do_softirq+0x114/0x2c0
+ do_softirq_own_stack+0x1c/0x30
+ do_softirq+0x65/0x70
+ __local_bh_enable_ip+0xb5/0xc0
+ ip_finish_output+0x1f4/0x810
+ ip_output+0x68/0xa0
+ ip_local_out_sk+0x30/0x40
+ ip_send_skb+0x1a/0x50
+ udp_send_skb+0x173/0x2a0
+ udp_sendmsg+0x2bf/0x9f0
+ inet_sendmsg+0x63/0xb0
+ do_sock_sendmsg+0x8c/0x100
+ hitcount: 4 len: 388
+ stacktrace:
+ __netif_receive_skb_core+0x4ad/0x780
+ __netif_receive_skb+0x18/0x60
+ netif_receive_skb_internal+0x23/0x90
+ napi_gro_receive+0xc8/0x100
+ ieee80211_deliver_skb+0xba/0x250 [mac80211]
+ ieee80211_rx_handlers+0xcc5/0x2240 [mac80211]
+ ieee80211_prepare_and_rx_handle+0x4e7/0xc30 [mac80211]
+ ieee80211_rx+0x31d/0x920 [mac80211]
+ iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
+ iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
+ iwl_pcie_irq_handler+0xe1c/0x12c0 [iwlwifi]
+ irq_thread_fn+0x20/0x50
+ irq_thread+0x11f/0x150
+ kthread+0xd2/0xf0
+ ret_from_fork+0x7c/0xb0
+ hitcount: 35 len: 17415
+ stacktrace:
+ __netif_receive_skb_core+0x4ad/0x780
+ __netif_receive_skb+0x18/0x60
+ netif_receive_skb_internal+0x23/0x90
+ napi_gro_complete+0xa4/0xe0
+ dev_gro_receive+0x233/0x360
+ napi_gro_receive+0x30/0x100
+ ieee80211_deliver_skb+0xba/0x250 [mac80211]
+ ieee80211_rx_handlers+0xcc5/0x2240 [mac80211]
+ ieee80211_prepare_and_rx_handle+0x4e7/0xc30 [mac80211]
+ ieee80211_rx+0x31d/0x920 [mac80211]
+ iwlagn_rx_reply_rx+0x3db/0x6f0 [iwldvm]
+ iwl_rx_dispatch+0x8e/0xf0 [iwldvm]
+ iwl_pcie_irq_handler+0xe1c/0x12c0 [iwlwifi]
+ irq_thread_fn+0x20/0x50
+ irq_thread+0x11f/0x150
+ kthread+0xd2/0xf0
+ hitcount: 155 len: 636342
+ stacktrace:
+ __netif_receive_skb_core+0x4ad/0x780
+ __netif_receive_skb+0x18/0x60
+ netif_receive_skb_internal+0x23/0x90
+ napi_gro_complete+0xa4/0xe0
+ napi_gro_flush+0x6d/0x90
+ iwl_pcie_irq_handler+0x90f/0x12c0 [iwlwifi]
+ irq_thread_fn+0x20/0x50
+ irq_thread+0x11f/0x150
+ kthread+0xd2/0xf0
+ ret_from_fork+0x7c/0xb0
+ hitcount: 1013 len: 5531908
+
+ Totals:
+ Hits: 1207
+ Entries: 4
+ Dropped: 0
+
+ The above shows all the netif_receive_skb callpaths and their total
+ lengths for the duration of the wget command.
+
+ The 'clear' hist trigger param can be used to clear the hash table.
+ Suppose we wanted to try another run of the previous example but
+ this time also wanted to see the complete list of events that went
+ into the histogram. In order to avoid having to set everything up
+ again, we can just clear the histogram first:
+
+ # echo 'hist:key=stacktrace:vals=len:clear' > \
+ /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
+
+ Just to verify that it is in fact cleared, here's what we now see in
+ the hist file:
+
+ # cat /sys/kernel/debug/tracing/events/net/netif_receive_skb/hist
+ trigger info: hist:keys=stacktrace:vals=len:sort=hitcount:size=2048 [paused]
+
+ Totals:
+ Hits: 0
+ Entries: 0
+ Dropped: 0
+
+ Since we want to see the detailed list of every netif_receive_skb
+ event occurring during the new run, which are in fact same events
+ being aggregated into the hash table, we add some additional
+ 'enable_event' events the triggering sched_process_exec and
+ sched_process_exit events as such:
+
+ # echo 'enable_event:net:netif_receive_skb if filename==/usr/bin/wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+
+ # echo 'disable_event:net:netif_receive_skb if comm==wget' > \
+ /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+
+ If you read the trigger files for the sched_process_exec and
+ sched_process_exit triggers, you should see two triggers for each:
+ one enabling/disabling the hist aggregation and the other
+ enabling/disabling the logging of events:
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
+ enable_event:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
+ enable_hist:net:netif_receive_skb:unlimited if filename==/usr/bin/wget
+
+ # cat /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger
+ enable_event:net:netif_receive_skb:unlimited if comm==wget
+ disable_hist:net:netif_receive_skb:unlimited if comm==wget
+
+ In other words, whever either of the sched_process_exec or
+ sched_process_exit events is hit and matches 'wget', it enables or
+ disables both the histogram and the event log, and what you end up
+ with is a hash table and set of events just covering the specified
+ duration.
+
+ Displaying the 'hist' file should show something similar to what you
+ saw in the last run, but this time you should also see the
+ individual events in the trace file:
+
+ # cat /sys/kernel/debug/tracing/trace
+ # tracer: nop
+ #
+ # entries-in-buffer/entries-written: 183/1426 #P:4
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / delay
+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION
+ # | | | |||| | |
+ wget-15108 [000] ..s1 31769.606929: netif_receive_skb: dev=lo skbaddr=ffff88009c353100 len=60
+ wget-15108 [000] ..s1 31769.606999: netif_receive_skb: dev=lo skbaddr=ffff88009c353200 len=60
+ dnsmasq-1382 [000] ..s1 31769.677652: netif_receive_skb: dev=lo skbaddr=ffff88009c352b00 len=130
+ dnsmasq-1382 [000] ..s1 31769.685917: netif_receive_skb: dev=lo skbaddr=ffff88009c352200 len=138
+ ##### CPU 2 buffer started ####
+ irq/29-iwlwifi-559 [002] ..s. 31772.031529: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433d00 len=2948
+ irq/29-iwlwifi-559 [002] ..s. 31772.031572: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432200 len=1500
+ irq/29-iwlwifi-559 [002] ..s. 31772.032196: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433100 len=2948
+ irq/29-iwlwifi-559 [002] ..s. 31772.032761: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d433000 len=2948
+ irq/29-iwlwifi-559 [002] ..s. 31772.033220: netif_receive_skb: dev=wlan0 skbaddr=ffff88009d432e00 len=1500
+ .
+ .
+ .
--
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/