Re: perf tool: About tests debug level

From: Ian Rogers
Date: Tue Jun 22 2021 - 01:05:06 EST


On Mon, Jun 21, 2021 at 4:26 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> Hi guys,
>
> I noticed that when it was discovered recently that the new icelake JSON
> did not pass "PMU events" test, running the test with -v makes pr_err()
> and pr_debug() come out at the same level, so it's hard to distinguish
> the important logs.
>
> Here is a link:
> https://lore.kernel.org/lkml/YLdq%2FH8CXYgHWzCL@xxxxxxxxxx/
>
> And here is an extract:
>
> parsing '(cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc) * msr@tsc@
> / 1000000000 / duration_time'
> parsing '( ( 1 * ( fp_arith_inst_retired.scalar_single +
> fp_arith_inst_retired.scalar_double ) + 2 *
> fp_arith_inst_retired.128b_packed_double + 4 * (
> fp_arith_inst_retired.128b_packed_single +
> fp_arith_inst_retired.256b_packed_double ) + 8 * (
> fp_arith_inst_retired.256b_packed_single +
> fp_arith_inst_retired.512b_packed_double ) + 16 *
> fp_arith_inst_retired.512b_packed_single ) / 1000000000 ) / duration_time'
> parsing 'cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc'
> parsing '1 - cpu_clk_unhalted.one_thread_active /
> cpu_clk_unhalted.ref_distributed'
> parsing 'cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread'
> parsing '( 64 * ( uncore_imc@cas_count_read@ +
> uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time'
> parsing '1000000000 * ( cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
> cha@event\=0x35\,umask\=0x21\,config\=0x40433@ ) / ( cha_0@event\=0x0@ /
> duration_time )'
> parsing 'cha@event\=0x36\,umask\=0x21\,config\=0x40433@ /
> cha@event\=0x36\,umask\=0x21\,config\=0x40433\,thresh\=1@'
> parsing '( 1000000000 * (
> cha@event\=0x36\,umask\=0x21\,config\=0x40433@_pmm /
> cha@event\=0x35\,umask\=0x21\,config\=0x40433@_pmm ) / cha_0@event\=0x0@
> )' [pr_debug]
> check_parse_fake failed [pr_err]
> test child finished with -1
> ---- end ----
>
> I annotated in [], above
>
> As for another issue, if you consider "Parse and process metrics", debug
> from core code comes out at same level as test code, e.g. with -v, we
> see pr_debug() from test code and core code. Again, this makes it hard
> to distinguish various levels. As an illustration, here I hack the code
> to fail a test:
>
> sudo ./perf test -v 68
> 68: Parse and process metrics :
> --- start ---
> test child forked, pid 9747
> metric expr inst_retired.any / cpu_clk_unhalted.thread for IPC
> found event inst_retired.any verbose=1 pr_debug
> found event inst_retired.any verbose=1 pr_err
> found event cpu_clk_unhalted.thread verbose=1 pr_debug
> found event cpu_clk_unhalted.thread verbose=1 pr_err
> adding {inst_retired.any,cpu_clk_unhalted.thread}:W
> FAILED tests/parse-metric.c:223 IPC failed, wrong ratio
> FAILED tests/parse-metric.c:374 IPC failed
> test child finished with -1
> ---- end ----
> Parse and process metrics: FAILED!
>
> Note that the "FAILED" messages from the test code come from pr_debug().
>
> In a way, I feel that pr_debug()/err from the test is more important
> than pr_debug() from the core code (when running a test).
>
> Any opinion on this or how to improve (if anyone agrees with me)? Or am
> I missing something? Or is it not so important?

Hi John,

I think the issue is that in the parsing you don't know it's broken
until something goes wrong. Putting everything on pr_err would cause
spam in the not broken case. Improving the parsing error handling is a
big task with lex and yacc to some extent getting in the way. Perhaps
a middle way is to have a parameter to the parser that logs more, and
recursively call this in the parser when parsing fails. I guess there
is also a danger of a performance hit.

Thanks,
Ian

> Thanks,
> John