Re: [PATCH bpf-next v2] samples/bpf: Add program for CPU state statistics

From: Daniel Borkmann
Date: Mon Feb 26 2018 - 05:27:03 EST


On 02/26/2018 02:19 AM, Leo Yan wrote:
> CPU is active when have running tasks on it and CPUFreq governor can
> select different operating points (OPP) according to different workload;
> we use 'pstate' to present CPU state which have running tasks with one
> specific OPP. On the other hand, CPU is idle which only idle task on
> it, CPUIdle governor can select one specific idle state to power off
> hardware logics; we use 'cstate' to present CPU idle state.
>
> Based on trace events 'cpu_idle' and 'cpu_frequency' we can accomplish
> the duration statistics for every state. Every time when CPU enters
> into or exits from idle states, the trace event 'cpu_idle' is recorded;
> trace event 'cpu_frequency' records the event for CPU OPP changing, so
> it's easily to know how long time the CPU stays in the specified OPP,
> and the CPU must be not in any idle state.
>
> This patch is to utilize the mentioned trace events for pstate and
> cstate statistics. To achieve more accurate profiling data, the program
> uses below sequence to insure CPU running/idle time aren't missed:
>
> - Before profiling the user space program wakes up all CPUs for once, so
> can avoid to missing account time for CPU staying in idle state for
> long time; the program forces to set 'scaling_max_freq' to lowest
> frequency and then restore 'scaling_max_freq' to highest frequency,
> this can ensure the frequency to be set to lowest frequency and later
> after start to run workload the frequency can be easily to be changed
> to higher frequency;
>
> - User space program reads map data and update statistics for every 5s,
> so this is same with other sample bpf programs for avoiding big
> overload introduced by bpf program self;
>
> - When send signal to terminate program, the signal handler wakes up
> all CPUs, set lowest frequency and restore highest frequency to
> 'scaling_max_freq'; this is exactly same with the first step so
> avoid to missing account CPU pstate and cstate time during last
> stage. Finally it reports the latest statistics.
>
> The program has been tested on Hikey board with octa CA53 CPUs, below
> is one example for statistics result, the format mainly follows up
> Jesper Dangaard Brouer suggestion.
>
> Jesper reminds to 'get printf to pretty print with thousands separators
> use %' and setlocale(LC_NUMERIC, "en_US")', tried three different arm64
> GCC toolchains (5.4.0 20160609, 6.2.1 20161016, 6.3.0 20170516) but all
> of them cannot support printf flag character %' on arm64 platform, so go
> back print number without grouping mode.
>
> CPU states statistics:
> state(ms) cstate-0 cstate-1 cstate-2 pstate-0 pstate-1 pstate-2 pstate-3 pstate-4
> CPU-0 767 6111 111863 561 31 756 853 190
> CPU-1 241 10606 107956 484 125 646 990 85
> CPU-2 413 19721 98735 636 84 696 757 89
> CPU-3 84 11711 79989 17516 909 4811 5773 341
> CPU-4 152 19610 98229 444 53 649 708 1283
> CPU-5 185 8781 108697 666 91 671 677 1365
> CPU-6 157 21964 95825 581 67 566 684 1284
> CPU-7 125 15238 102704 398 20 665 786 1197
>
> Cc: Daniel Lezcano <daniel.lezcano@xxxxxxxxxx>
> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>

Applied to bpf-next, thanks Leo!