Re: [BUG] perf: No samples found when using kcore + coresight

From: Yang Shi
Date: Wed Mar 08 2023 - 14:56:57 EST


On Tue, Mar 7, 2023 at 11:45 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> Hi James,
>
> Thanks for the prompt response. Please see the inline replies.
>
> On Tue, Mar 7, 2023 at 2:32 AM James Clark <james.clark@xxxxxxx> wrote:
> >
> >
> >
> > On 06/03/2023 21:13, Yang Shi wrote:
> > > Hi,
> > >
> > > I'm seeking some help regarding perf record --kcore + coresight. When
> > > I tracing with perf record --kcore -e cs_etm ... , perf report shows
> > > "The perf.data/data data has no samples!".
> > >
> > > I tried other combinations:
> > > 1. perf record --kcore (w/o using coresight): works
> > > 2. perf record -e cs_etm ... : works
> > > 3. Manually copy kcore with perf buildid-cache, then run perf record
> > > -e cs_etm... : works
> > >
> > > So just "perf record --kcore -e cs_etm ..." doesn't work.
> > >
> > > I'm using v6.2 kernel and the perf is built from the same kernel
> > > source with OpenCSD 1.4. Also attached the sample file generated by
> > > perf.
> > >
> > > Any suggestion is appreciated.
> > >
> > > Thanks,
> > > Yang
> >
> > Hi Yang,
> >
> > I don't see any issue with this command and I still get samples:
> >
> > perf record -e cs_etm// --kcore -- true
>
> This command works for me. But the below command doesn't work:
>
> perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread -- taskset
> --cpu-list 1 uname
>
> >
> > You might want to try running both record and report in verbose and
> > stdio mode (-vvv --stdio) to see if you see any warnings.
> >
> > I can't think of any way --kcore would cause an issue because all it
> > does is save kcore into the .debug cache rather than affecting the
> > actual perf.data file.
>
> Yes, this is what I thought. But digging into the code seems it may
> add a new dummy event. Please see the below data header dump.
>
> >
> > Are you running perf report in the same place the recording was made? Or
> > on another system?
>
> Yes.
>
> I did some further investigation. The below is the dump for both good
> and bad data files.
>
> Good (copy kcore manually):
> # captured on : Tue Mar 7 11:19:14 2023
> # header version : 1
> # data offset : 1576
> # data size : 26736
> # feat offset : 28312
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 6.2.g3424af22c4fc
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /root/bin/perf record -e cs_etm/@tmc_etf63/k --kcore
> --per-thread -- taskset --cpu-list 1 true
> # event : name = cs_etm/@tmc_etf63/k, , id = { 2248 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|LO
> ST, disabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec =
> 1, sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 2249 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|
> LOST, disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm
> = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest =
> 1, mmap2 = 1, comm_exec
> = 1, context_switch = 1, bpf_event = 1
> # event : name = dummy:u, , id = { 2250, 2251, 2252, 2253, 2254, 2255,
> 2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266,
> 2267, 2268, 2269, 2270, 2271
> , 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282,
> 2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293,
> 2294, 2295, 2296, 2297, 2298, 2
> 299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310,
> 2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321,
> 2322, 2323, 2324, 2325, 2326
> , 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
> 2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
> 2349, 2350, 2351, 2352, 2353, 2
> 354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365,
> 2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377
> }, type = 1, size = 128
> , config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|TIME|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1,
> exclude_hv = 1, sample_id_all =
> 1, exclude_guest = 1, ksymbol = 1, text_poke = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # armv8_pmuv3_0 pmu capabilities: slots=0x00000004,
> bus_slots=0x00000000, bus_width=0x00000000
> # missing features: (null) CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
>
> Bad:
> # captured on : Tue Mar 7 11:19:14 2023
> # header version : 1
> # data offset : 1576
> # data size : 26736
> # feat offset : 28312
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 6.2.g3424af22c4fc
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /root/bin/perf record -e cs_etm/@tmc_etf63/k --kcore
> --per-thread -- taskset --cpu-list 1 true
> # event : name = cs_etm/@tmc_etf63/k, , id = { 2248 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|LO
> ST, disabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec =
> 1, sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 2249 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID|
> LOST, disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm
> = 1, enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest =
> 1, mmap2 = 1, comm_exec
> = 1, context_switch = 1, bpf_event = 1
> # event : name = dummy:u, , id = { 2250, 2251, 2252, 2253, 2254, 2255,
> 2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266,
> 2267, 2268, 2269, 2270, 2271
> , 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282,
> 2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293,
> 2294, 2295, 2296, 2297, 2298, 2
> 299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310,
> 2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321,
> 2322, 2323, 2324, 2325, 2326
> , 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2336, 2337,
> 2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348,
> 2349, 2350, 2351, 2352, 2353, 2
> 354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365,
> 2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377
> }, type = 1, size = 128
> , config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|TIME|IDENTIFIER, read_format = ID|LOST, exclude_kernel = 1,
> exclude_hv = 1, sample_id_all =
> 1, exclude_guest = 1, ksymbol = 1, text_poke = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # armv8_pmuv3_0 pmu capabilities: slots=0x00000004,
> bus_slots=0x00000000, bus_width=0x00000000
> # missing features: (null) CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
>
>
> Dumping raw events could show the events from the bad data file. But
> it has zero samples after event collapse.
>
> The only difference is --kcore inserted a new text_poke dummy event.
> It seems coresight also inserted a dummy event with my command but
> your command didn't. So it seems like the two dummy events confused
> event collapse.
>
> The text_poke dummy event is added by commit
> f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
> text_poke events with --kcore option"). If I reverted this commit,
> then it works. But I'm not sure whether this is the right fix or real
> root cause or not. Or coresight shouldn't insert its own dummy event?

It seems like coresight needs to insert the dummy event if
full_auxtrace is on IIUC. So it sounds like event collapse can't
handle such a case? I also tried v5.19 (before "perf record: Always
get text_poke events with --kcore option", which was merged in v6.0),
it works. So it seems like a regression.

>
> >
> > James