Re: [RFC/PATCHSET 0/9] perf record: Implement BPF sample filter (v4)

From: Namhyung Kim
Date: Fri Mar 10 2023 - 16:56:42 EST


Hi Ravi,

On Fri, Mar 10, 2023 at 12:10:28PM +0530, Ravi Bangoria wrote:
> Hi Namhyung,
>
> Sorry, I should have tried earlier prototypes but missed it.

No worries and thanks for your review!

>
> > Maybe more useful example is when it deals with precise memory events.
> > On AMD processors with IBS, you can filter only memory load with L1
> > dTLB is missed like below.
> >
> > $ sudo ./perf record -ad -e ibs_op//p \
> > > --filter 'mem_op == load, mem_dtlb > l1_hit' sleep 1
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 1.338 MB perf.data (15 samples) ]
>
> On my zen4 machine:
>
> $ sudo ./perf record -d -e ibs_op//p --filter 'mem_op == load' -c 100000 ~/test
> [ perf record: Woken up 6 times to write data ]
> [ perf record: Captured and wrote 1.436 MB perf.data (30966 samples) ]
>
> $ sudo ./perf mem report -F sample,mem --stdio
> # Samples Memory access
> # ............ ........................
> 30325 L1 hit
> 477 Local RAM hit
> 89 L2 hit
> 75 L3 hit
>
> This looks good because IBS hw can't filter specific type of instruction
> and thus unfiltered data will contain "NA" types of memory accesses, which
> is absent here. So mem_op == load filter seems to be working.

Good!

>
> However, if I add "mem_lvl == l1" (or l2 / ram) in the filter, I see mostly
> all samples are getting lost:
>
> $ sudo ./perf record -d -e ibs_op//p --filter 'mem_op == load, mem_lvl == l1' -c 100000 ~/test
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.019 MB perf.data ]
>
> $ sudo ./perf report --stat | grep SAMPLE
> LOST_SAMPLES events: 1 ( 0.8%)
> LOST_SAMPLES events: 136332
>
> What am I missing?

It seems IBS PMU doesn't set the mem_lvlnum field in the data source.
As I said in the patch 7, 'mem_lvl' actually uses mem_lvlnum fields
instead of mem_lvl because it's preferred according to the comment in
the UAPI header.

/*
* PERF_MEM_LVL_* namespace being depricated to some extent in the
* favour of newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields.
* Supporting this namespace inorder to not break defined ABIs.
*
* memory hierarchy (memory level, hit or miss)
*/

I'll post a patch to set it separately.

>
> 2nd observation, invalid expressions like 'mem_op == load, mem_dtlb == l1'
> are not failing, instead recording misleading data:
>
> $ sudo ./perf record -d -e ibs_op//p --filter 'mem_op == load, mem_dtlb == l1' -c 100000 ~/test
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.047 MB perf.data (614 samples) ]
>
> $ sudo ./perf script -F data_src | grep "TLB N/A" | wc -l
> 614

Good point, that's the limitation in the current implementation.
I think it needs to keep the target sample field along with the
constant so that it can detect unintended uses. Let's me think
about it more.

Thanks,
Namhyung