Re: latency histogram with BPF

From: Daniel Wagner
Date: Fri Jun 12 2015 - 02:12:27 EST


On 06/12/2015 12:08 AM, Alexei Starovoitov wrote:
> On 6/11/15 12:25 AM, Daniel Wagner wrote:
>> In both cases BPF or based on Tom's 'hist' triggers' patches, there is
>> some trickery necessary to get it working. While the first approach
>> has more flexibility what you want to measure or how you want to
>> present it, I suspect it will be harder to get it working/accepted.
>
> why? Out of your patch only 2 lines are for kernel, which I wanted
> to add anyway. The sample code also looks good.

I had the impression that attaching kprobes to trace_preempt_[on|off] was
not going to work. But you seem confident that it can be made to work :)
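
The attach mechanics are simple enough, at least: the samples/bpf loader
attaches a program as a kprobe purely based on its SEC("kprobe/...")
section name, so the user-space side is little more than loading the
object. A minimal sketch, assuming the usual bpf_load.h/libbpf.h plumbing
from samples/bpf (file name and the map dumping are just illustration):

/* hwlathist_user.c (sketch): load the object and let bpf_load.c attach
 * the kprobes to trace_preempt_off/trace_preempt_on by section name
 */
#include <stdio.h>
#include <unistd.h>
#include <linux/bpf.h>
#include "libbpf.h"
#include "bpf_load.h"

int main(void)
{
	if (load_bpf_file("hwlathist_kern.o")) {
		printf("%s", bpf_log_buf);
		return 1;
	}

	/* give the probes some time, then walk and print the maps */
	sleep(5);

	return 0;
}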

>> Obviously, adding kprobes to trace_preempt_[on|off] is
>> asking for problems:
>
> thanks for reporting.
> I reproduced it with hackbench, but my stack trace looks completely
> different. Looks like some memory corruption is happening.
> Not clear where. I'm still debugging.

I have noticed that network load also triggers a panic. Most of the time
softirq is involved somehow. So far I haven't spent time looking into it
because I thought it couldn't work anyway. I'll stare a bit more at it now.

>> CPU 3
>>       latency       : count     distribution
>>         1 -> 1      : 0         |                                                  |
>>         2 -> 3      : 0         |                                                  |
>>         4 -> 7      : 0         |                                                  |
>>         8 -> 15     : 25182     |************************************************* |
>>        16 -> 31     : 1675      |**                                                |
>>        32 -> 63     : 0         |                                                  |
>>        64 -> 127    : 0         |                                                  |
>>       128 -> 255    : 0         |                                                  |
>>       256 -> 511    : 0         |                                                  |
>>       512 -> 1023   : 0         |                                                  |
>>
>> The numbers look a bit too nice and low. I suspect something is going
>> wrong.
>
> I see similar numbers. 25k over 5 sec = 5k preempt on/off per second
> which sounds about right for idle.

I expected to see some higher latencies, but it stays pretty low the whole
time. With Tom's patches I got a histogram which looked like this:

https://lkml.org/lkml/2015/4/30/196

# trigger info: hist:keys=latency.bucket:vals=:sort=latency.bucket:size=2048 [active]

latency: 0 hitcount: 558648
latency: 256 hitcount: 70163
latency: 512 hitcount: 24601
latency: 768 hitcount: 9649
latency: 1024 hitcount: 4578
latency: 1280 hitcount: 2623
latency: 1536 hitcount: 1642
latency: 1792 hitcount: 1281
latency: 2048 hitcount: 1203
latency: 2304 hitcount: 942
latency: 2560 hitcount: 713
latency: 2816 hitcount: 513
latency: 3072 hitcount: 436
latency: 3328 hitcount: 347
latency: 3584 hitcount: 312
latency: 3840 hitcount: 279
latency: 4096 hitcount: 269
latency: 4352 hitcount: 257
latency: 4608 hitcount: 236
latency: 4864 hitcount: 214
latency: 5120 hitcount: 190
latency: 5376 hitcount: 129
latency: 5632 hitcount: 159
latency: 5888 hitcount: 141
latency: 6144 hitcount: 118
latency: 6400 hitcount: 107
latency: 6656 hitcount: 99
latency: 6912 hitcount: 73
latency: 7168 hitcount: 70
latency: 7424 hitcount: 80
latency: 7680 hitcount: 58
latency: 7936 hitcount: 65
latency: 8192 hitcount: 57
latency: 8448 hitcount: 59
latency: 8704 hitcount: 67
latency: 8960 hitcount: 64
latency: 9216 hitcount: 82
latency: 9472 hitcount: 89
latency: 9728 hitcount: 74
latency: 9984 hitcount: 89
latency: 10240 hitcount: 65

This distribution seems to be consistent with the latency-hist.patch
from -rt. That patch has been used for a while and it looks like it
reports the right values.

https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt&id=0eba3c128347cc82fff093af1c61b2211985b4c9

>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 2d56ce5..e4d3e76 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -172,6 +172,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
>>  		return &bpf_probe_read_proto;
>>  	case BPF_FUNC_ktime_get_ns:
>>  		return &bpf_ktime_get_ns_proto;
>> +	case BPF_FUNC_get_smp_processor_id:
>> +		return &bpf_get_smp_processor_id_proto;
>
> This part I'll take into my set of patches.

Great.
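
Just to spell out why a kprobe program wants that helper: the preempt-off
timestamp is naturally keyed by the CPU number. A sketch of that half
(map and program names are made up here, not necessarily what
hwlathist_kern.c ends up doing):

#include <linux/version.h>
#include <linux/ptrace.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

/* last preempt-off timestamp, keyed by CPU number */
struct bpf_map_def SEC("maps") start_ts = {
	.type = BPF_MAP_TYPE_HASH,
	.key_size = sizeof(u32),
	.value_size = sizeof(u64),
	.max_entries = 64,
};

SEC("kprobe/trace_preempt_off")
int bpf_prog1(struct pt_regs *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start_ts, &cpu, &ts, BPF_ANY);
	return 0;
}

/* kprobe programs must be GPL and carry the kernel version */
char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;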

>> --- a/samples/bpf/bpf_helpers.h
>> +++ b/samples/bpf/bpf_helpers.h
>> @@ -21,6 +21,8 @@ static unsigned long long (*bpf_ktime_get_ns)(void) =
>>  	(void *) BPF_FUNC_ktime_get_ns;
>>  static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
>>  	(void *) BPF_FUNC_trace_printk;
>> +static unsigned int (*bpf_get_smp_processor_id)(void) =
>> +	(void *) BPF_FUNC_get_smp_processor_id;
>
> this part is already there in net-next.

Ah, good to know. FWIW, I had to pick up some snippets from other trees
to get it working. All those patches should hit mainline in the near
future.

>> diff --git a/samples/bpf/hwlathist_kern.c b/samples/bpf/hwlathist_kern.c
>> new file mode 100644
>> index 0000000..7a97e7e
>> --- /dev/null
>> +++ b/samples/bpf/hwlathist_kern.c
>
> looks good and useful, but I would like to find and fix the bug first
> before exposing this bit as a 'sample', since right now it's
> 'crash me if hackbench is running' test :)

Agreed. :)
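
For completeness, the preempt-on half of the sketch from above would
compute the delta and bump a log2 bucket, roughly like this (same
includes and start_ts map as before; lat_map and the helper names are
again made up, and the log2 is the usual branchless version since the
verifier rejects loops):

/* per-bucket counters for the log2 latency histogram */
struct bpf_map_def SEC("maps") lat_map = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(u32),
	.value_size = sizeof(u64),
	.max_entries = 64,
};

static inline unsigned int log2(unsigned int v)
{
	unsigned int r, shift;

	r = (v > 0xffff) << 4; v >>= r;
	shift = (v > 0xff) << 3; v >>= shift; r |= shift;
	shift = (v > 0xf) << 2; v >>= shift; r |= shift;
	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
	r |= (v >> 1);
	return r;
}

static inline unsigned int log2l(unsigned long long v)
{
	unsigned int hi = v >> 32;

	return hi ? log2(hi) + 32 : log2(v);
}

SEC("kprobe/trace_preempt_on")
int bpf_prog2(struct pt_regs *ctx)
{
	u32 cpu = bpf_get_smp_processor_id();
	u64 *tsp, delta, *cnt;
	u32 bucket;

	tsp = bpf_map_lookup_elem(&start_ts, &cpu);
	if (!tsp)
		return 0;

	delta = bpf_ktime_get_ns() - *tsp;
	bucket = log2l(delta);

	cnt = bpf_map_lookup_elem(&lat_map, &bucket);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);
	return 0;
}

The user-space side would then just walk lat_map and print the
"latency : count distribution" table from above; the sketch collapses
the histogram into one map, while the per-CPU output would need one
such histogram per CPU.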

> If you have any suggestions on where to look, I'm all ears.
> My stack traces look like:
> Running with 10*40 (== 400) tasks.
> [ 12.032571] kernel BUG at ../mm/slub.c:3413!
> [ 12.036004] [<ffffffff810c1913>] rcu_process_callbacks+0x283/0x680
> [ 12.036004] [<ffffffff810632d0>] __do_softirq+0xf0/0x2b0
> [ 12.036004] [<ffffffff8107dff9>] ? kthread+0xc9/0xe0
> [ 12.036004] [<ffffffff810634b1>] run_ksoftirqd+0x21/0x50
> or
> [ 25.788074] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> [ 25.788801] BUG: unable to handle kernel paging request at
> ffff88000b1ea2a0
> [ 25.800085] [<ffffffff810c18ba>] ? rcu_process_callbacks+0x22a/0x680
> [ 25.800085] [<ffffffff810632d0>] __do_softirq+0xf0/0x2b0
> [ 25.800085] [<ffffffff816d6d7c>] do_softirq_own_stack+0x1c/0x30
> [ 25.800085] <EOI>
> [ 25.800085] [<ffffffff8106351d>] do_softirq+0x3d/0x40
> [ 25.800085] [<ffffffff810635ba>] __local_bh_enable_ip+0x9a/0xb0
> [ 25.800085] [<ffffffff816d4c1b>] _raw_spin_unlock_bh+0x1b/0x20
> [ 25.800085] [<ffffffff811f3e42>] bdi_writeback_workfn+0x1b2/0x470

I see crashes with do_softirq() and RCU in the backtrace as well.

cheers,
daniel