Re: [PATCH 2/8] perf metrics: fix parse errors in cascade lake metrics

From: Jin, Yao
Date: Thu Apr 23 2020 - 03:51:31 EST


Hi Ian,

On 4/23/2020 2:09 PM, Ian Rogers wrote:
On Wed, Apr 22, 2020 at 10:54 PM Jin, Yao <yao.jin@xxxxxxxxxxxxxxx> wrote:

Hi Jiri,

Bisected to this commit which introduced the regression.

26226a97724d ("perf expr: Move expr lexer to flex")

Would you like to look at that?

Hi Jin,

that commit breaks parsing of things like ','. See fixes in this patch
set such as:
https://lore.kernel.org/lkml/20200422220430.254014-5-irogers@xxxxxxxxxx/
Fixing the lex issues then exposes other bugs that need to be
corrected in the json. I've added Fixes to the commit message of:
https://lore.kernel.org/lkml/20200422220430.254014-3-irogers@xxxxxxxxxx/
https://lore.kernel.org/lkml/20200422220430.254014-4-irogers@xxxxxxxxxx/
and would be glad of a review. If we can land:
https://lore.kernel.org/lkml/20200422220430.254014-12-irogers@xxxxxxxxxx/
then expr as the source of parse errors can go away :-) The next
problem is the parse events code, but some of that logic is dependent
on the machine it is running on. It'd be good to add a test that
parsed events code can handle the events in metrics too, filtering out
things like duration_time that are special to metrics.

Thanks,
Ian


Only with the fix "https://lore.kernel.org/lkml/20200422220430.254014-5-irogers@xxxxxxxxxx/"; (without other json modifications), the issue was still there.

localhost:~ # perf stat -M DRAM_Read_Latency
event syntax error: '../event=0x36,,umask=0x21/,cha/event=0x35,cha_0/event=0x0/}:W,duration_time'
\___ parser error

Usage: perf stat [<options>] [<command>]

-M, --metrics <metric/metric group list>
monitor specified metrics or metric groups (separated by ,)

So you added other commits which changed the json to let the parse work. But I don't know if we have to do with this way because it should be a regression issue.

In my opinion, we'd better fix the issue in 26226a97724d ("perf expr: Move expr lexer to flex") and try not to change the json if possible.

Thanks
Jin Yao

Thanks
Jin Yao

On 4/23/2020 9:08 AM, Jin, Yao wrote:


On 4/23/2020 12:18 AM, Ian Rogers wrote:
On Wed, Apr 22, 2020 at 8:34 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:

On Wed, Apr 22, 2020 at 7:38 AM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:

On Wed, Apr 22, 2020 at 12:48:03AM -0700, Ian Rogers wrote:
Remove over escaping with \\.
Remove extraneous if 1 if 0 == 1 else 0 else 0.

So where do these parse errors happen exactly? Some earlier
patches introduced them as regressions?

I'll work to track down a Fixes tag. I can repro the Skylakex errors
without the test in this series, by doing:

$ perf stat -M DRAM_Read_Latency sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (cha/event=0x36\,uma
sk=0x21/).
/bin/dmesg | grep -i perf may provide additional information.


I also think some patches introduced this regression. When we rollback
to commit 61ec07f5917e (perf vendor events intel: Update all the Intel
JSON metrics from TMAM 3.6.), there is no this error on CLX.

Thanks
Jin Yao

This was just the escaping issue. I'm less clear on the other cascade
lake issue, and it is a bit more work for me to test on cascade lake.
What is "if 1 if 0 == 1 else 0 else 0" trying to do? Perhaps hunting
for the Fixes will let me know, but it looks like a copy-paste error.

The original metrics worked without parse errors as far as I know.

The skylake issue above repros on 5.2.17 and so it seems like it is
broken for a while. The test in this series will prevent this in the
future, but without this patch that test fails.

The parse errors were introduced with the metrics, so they've never
worked:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=fd5500989c8f3c3944ac0a144be04bae2506f7ba


I will send out a v2 with Fixes in the commit message but wanted to
wait in case there was any more feedback. In particular the fixes to
the new test and expr parser lex code. The lex code wasn't broken at
the time the metrics were added and should be working again after this
patch set.

Thanks,
Ian

If it fixes something earlier it would need Fixes: tags.

Working on it. Thanks for the input!

Ian

-Andi