Re: [PATCH v2 5/7] perf/amd/ibs: Enable RIP bit63 hardware filtering
From: Ian Rogers
Date: Mon Mar 09 2026 - 12:04:20 EST
On Sun, Mar 8, 2026 at 7:58 PM Ravi Bangoria <ravi.bangoria@xxxxxxx> wrote:
>
> Hi Ian,
>
> >> Does the bit 63 assumption hold for guest operating systems?
> >
> > Yes, this seems to be an issue, even with current swfilt approach. Let
> > me inspect the code and get back.
>
> All mainstream 64-bit OSes set bit 63 for kernel addresses and clear it for
> userspace addresses. This convention does not apply to 32-bit guests, but
> those are rare, and profiling them with IBS would be even rarer. So, I'll
> document this limitation in the perf-amd-ibs man page.
>
> While looking at this, I found some issues in IBS. The patch below fixes them:
>
> ---
>
> From deb6cdcbc60778b57a6eef60b2b7bd1b8e3cea74 Mon Sep 17 00:00:00 2001
> From: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> Date: Fri, 6 Mar 2026 04:52:00 +0000
> Subject: [PATCH] perf/amd/ibs: Improve guest profiling
>
> IBS captures the RIP but not its privilege level. Since the NMI is
> delivered with delay, CPL can change between the IBS tag and NMI
> delivery. Add a check to catch and discard invalid guest samples
> using CPL stored in vCPU save area. This will work when there is
> user/kernel CPL change in between IBS tag and NMI delivery within
> the guest boundary. But it won't work when there is a guest entry
> or exit in between IBS tag and NMI delivery.
>
> When profiling a guest and the IBS RIP is valid, assign the sample
> IP from the IBS-captured RIP and set PERF_SAMPLE_IP in sample_flags
> so that perf_prepare_sample() does not overwrite the RIP with
> perf_guest_get_ip() from the vCPU save area. This keeps the perf
> sample IP consistent with IBS raw data, data_src, weight, phy_addr
> etc. The privilege level in the perf "misc" field can now go out
> of sync, as it is taken from the vCPU save area.
>
> Reported-by: Ian Rogers <irogers@xxxxxxxxxx>
> Closes: https://lore.kernel.org/r/CAP-5=fV_cJskvLRZhQQXMGAcPUb_Rg_b30PDJNXzxL49JK4B5g@xxxxxxxxxxxxxx
> Signed-off-by: Ravi Bangoria <ravi.bangoria@xxxxxxx>
Thanks Ravi!
Reviewed-by: Ian Rogers <irogers@xxxxxxxxxx>
Thanks,
Ian
> ---
> arch/x86/events/amd/ibs.c | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index eeb607b84dda..70408b0b1597 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -1415,6 +1415,7 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  	unsigned int msr;
>  	u64 *buf, *config, period, new_config = 0;
>  	int br_target_idx = -1;
> +	unsigned int guest_state;
>
>  	if (!test_bit(IBS_STARTED, pcpu->state)) {
>  fail:
> @@ -1526,6 +1527,42 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>  		regs.flags |= PERF_EFLAGS_EXACT;
>  	}
>
> +	guest_state = perf_guest_state();
> +	if (!event->attr.exclude_guest && guest_state & PERF_GUEST_ACTIVE) {
> +		/*
> +		 * IBS captures the RIP but not its privilege level. Since
> +		 * NMI arrives delayed, CPL might change in between IBS tag
> +		 * and the NMI delivery. Below checks can identify and filter
> +		 * out invalid samples when the CPL changes are within the
> +		 * guest boundary. However, these checks fail to handle cases
> +		 * where the CPU performs a guest entry or exit in between
> +		 * the IBS tag and the NMI delivery.
> +		 */
> +		if (event->attr.exclude_kernel && !(guest_state & PERF_GUEST_USER)) {
> +			throttle = perf_event_account_interrupt(event);
> +			goto out;
> +		}
> +		if (event->attr.exclude_user && guest_state & PERF_GUEST_USER) {
> +			throttle = perf_event_account_interrupt(event);
> +			goto out;
> +		}
> +
> +		/*
> +		 * Assign the IBS RIP value directly in the perf sample here
> +		 * to prevent perf_prepare_sample() from retrieving it from
> +		 * the vCPU save-area. With this, rest of the perf sample
> +		 * fields (raw data, data_src, weight, phy_addr, etc.) will
> +		 * remain in sync with sample IP. However, privilege level
> +		 * captured as part of perf sample "misc" field could now
> +		 * go out of sync since privilege level is fetched from the
> +		 * vCPU save area.
> +		 */
> +		if (regs.flags & PERF_EFLAGS_EXACT) {
> +			data.ip = regs.ip;
> +			data.sample_flags |= PERF_SAMPLE_IP;
> +		}
> +	}
> +
>  	if (((ibs_caps & IBS_CAPS_BIT63_FILTER) ||
>  	     (event->attr.config2 & IBS_SW_FILTER_MASK)) &&
>  	    perf_ibs_discard_sample(perf_ibs, event, &regs, &ibs_data, br_target_idx)) {
> --
> 2.43.0
>
>
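
For readers following the thread, the guest filtering decision in the quoted
hunk can be restated as a small standalone C sketch. This is not the kernel
code: every sketch_/SKETCH_ name and the flag values are hypothetical
stand-ins for perf_guest_state(), PERF_GUEST_ACTIVE, PERF_GUEST_USER and the
event attr bits, and the bit-63 helper only mirrors the 64-bit address
convention discussed earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's guest-state flags. */
#define SKETCH_GUEST_ACTIVE 0x01u
#define SKETCH_GUEST_USER   0x02u

/* Hypothetical subset of the perf event attributes used by the checks. */
struct sketch_attr {
	bool exclude_guest;
	bool exclude_kernel;
	bool exclude_user;
};

/* Under the 64-bit convention, bit 63 set means a kernel address. */
static bool sketch_rip_is_kernel(uint64_t rip)
{
	return (rip >> 63) != 0;
}

/*
 * Return true when a guest sample should be discarded because the guest
 * privilege level at NMI time does not match what the event asked for.
 * As in the patch, this cannot detect a guest entry or exit that happens
 * between the IBS tag and the NMI.
 */
static bool sketch_discard_guest_sample(const struct sketch_attr *attr,
					unsigned int guest_state)
{
	bool guest_user;

	/* The guest checks only apply while a guest is being profiled. */
	if (attr->exclude_guest || !(guest_state & SKETCH_GUEST_ACTIVE))
		return false;

	guest_user = (guest_state & SKETCH_GUEST_USER) != 0;

	if (attr->exclude_kernel && !guest_user)
		return true;	/* guest was in kernel mode */
	if (attr->exclude_user && guest_user)
		return true;	/* guest was in user mode */

	return false;
}
```

With exclude_kernel set, a state of SKETCH_GUEST_ACTIVE alone (guest in
kernel mode) is discarded, while SKETCH_GUEST_ACTIVE | SKETCH_GUEST_USER is
kept. A state without SKETCH_GUEST_ACTIVE is left to the existing host-side
filtering.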