Re: [PATCH v2 4/5] perf kvm: Support sampling guest callchains

From: Tianyi Liu
Date: Wed Oct 11 2023 - 10:46:24 EST


Hi Maxim,

At 2023-10-10 16:12 +0000, Maxim Levitsky wrote:
> > +static inline void
> > +perf_callchain_guest32(struct perf_callchain_entry_ctx *entry)
> > +{
> > + struct stack_frame_ia32 frame;
> > + const struct stack_frame_ia32 *fp;
> > +
> > + fp = (void *)perf_guest_get_frame_pointer();
> > + while (fp && entry->nr < entry->max_stack) {
> > + if (!perf_guest_read_virt(&fp->next_frame, &frame.next_frame,
> This should be fp->next_frame.
> > + sizeof(frame.next_frame)))
> > + break;
> > + if (!perf_guest_read_virt(&fp->return_address, &frame.return_address,
> Same here.
> > + sizeof(frame.return_address)))
> > + break;
> > + perf_callchain_store(entry, frame.return_address);
> > + fp = (void *)frame.next_frame;
> > + }
> > +}
> > +

Here `fp` holds a guest virtual address, not an address in the directly
accessible kernel address space, so it is never dereferenced.
`&fp->next_frame` and `&fp->return_address` only compute member addresses
in a more readable way, much like `fp + 0` and `fp + 4` as byte offsets.
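
To illustrate the point, here is a minimal userspace sketch, not the patch
itself. `guest_mem`, `guest_read` and `walk` are hypothetical stand-ins
(`guest_read` plays the role of `perf_guest_read_virt`), and `struct frame32`
is assumed to have the same two 32-bit fields as `struct stack_frame_ia32`.
The walker only ever computes offsets from `fp` and hands them to the reader:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: two 32-bit fields, like struct stack_frame_ia32. */
struct frame32 {
	uint32_t next_frame;
	uint32_t return_address;
};

/* Hypothetical guest memory: a guest address is an offset into this buffer. */
static unsigned char guest_mem[64];

/* Hypothetical stand-in for perf_guest_read_virt(): 1 on success, 0 on failure. */
static int guest_read(uint32_t gva, void *dst, size_t len)
{
	if (gva > sizeof(guest_mem) || len > sizeof(guest_mem) - gva)
		return 0;
	memcpy(dst, guest_mem + gva, len);
	return 1;
}

/* Walk the frame chain starting at guest address fp; fp is never dereferenced. */
static size_t walk(uint32_t fp, uint32_t *out, size_t max)
{
	struct frame32 frame;
	size_t n = 0;

	assert(out != NULL);
	while (fp && n < max) {
		/* Same address that &fp->next_frame would compute: fp + 0. */
		if (!guest_read(fp + (uint32_t)offsetof(struct frame32, next_frame),
				&frame.next_frame, sizeof(frame.next_frame)))
			break;
		/* Same address that &fp->return_address would compute: fp + 4. */
		if (!guest_read(fp + (uint32_t)offsetof(struct frame32, return_address),
				&frame.return_address, sizeof(frame.return_address)))
			break;
		out[n++] = frame.return_address;
		fp = frame.next_frame;
	}
	return n;
}
```

The walk stops when `next_frame` is zero, when a read fails, or when `max`
entries have been stored, which mirrors the loop conditions in the patch.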

The existing implementations of `perf_callchain_user` and
`perf_callchain_user32` also use this approach [1].

>
> For symmetry, maybe it makes sense to have perf_callchain_guest32 and perf_callchain_guest64
> and then make perf_callchain_guest call each? No strong opinion on this of course.
>

The `perf_callchain_guest` and `perf_callchain_guest32` here are simply
designed to mirror `perf_callchain_user` and `perf_callchain_user32` [2].
I'm also open to separating the logic fully if this doesn't seem
elegant enough.

[1] https://github.com/torvalds/linux/blob/master/arch/x86/events/core.c#L2890
[2] https://github.com/torvalds/linux/blob/master/arch/x86/events/core.c#L2820


Best regards,
Tianyi Liu