+static bool skid_kernel_samples(struct perf_event *event, struct pt_regs *regs)

The name is a bit opaque, especially where it is used in
__perf_event_overflow().

How about we invert the polarity and call this sample_is_allowed()?
+{
+	/*
+	 * We may get kernel samples even though exclude_kernel
+	 * is specified due to potential skid in sampling.
+	 * The skid kernel samples could be dropped or just do
+	 * nothing by testing the flag PERF_PMU_CAP_NO_SKID.
+	 */
+	if (event->pmu->capabilities & PERF_PMU_CAP_NO_SKID)
+		return false;

Do we need this new cap?

I guess the reason Peter recommended a new cap is to have a way to keep the original behavior.

I'd expect user_mode(regs) to be about as cheap as testing the cap, and
the common case is going to be that we have to test both.
For those PMUs without skid, when not sampling the kernel,
user_mode(regs) should always be true.
IMO, it would make more sense to just check user_mode(regs), which also
avoids any surprises with unexpected skid...

+
+	if (event->attr.exclude_kernel &&
+	    !user_mode(regs) &&
+	    (event->attr.sample_type & PERF_SAMPLE_IP)) {
+		return true;
+	}
+
+	return false;
+}

I think we can only get a correct ip when PERF_SAMPLE_IP is set, so I check PERF_SAMPLE_IP here.

How about:

static bool sample_is_allowed(struct perf_event *event, struct pt_regs *regs)
{
	/*
	 * Due to interrupt latency (AKA "skid"), we may enter the
	 * kernel before taking an overflow, even if the PMU is only
	 * counting user events.
	 *
	 * To avoid leaking information to userspace, we must always
	 * reject kernel samples when exclude_kernel is set.
	 */
	if (!user_mode(regs) && event->attr.exclude_kernel &&
	    (event->attr.sample_type & PERF_SAMPLE_IP))
		return false;

	return true;
}

... do we need to reject any other sample types, or do we definitely
avoid leaks by other means?
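
To make the question concrete, here is a minimal userspace sketch (not kernel code) comparing the predicate as proposed above with a stricter variant that ignores sample_type. The mock_attr and mock_regs types, the MOCK_SAMPLE_* bits and both function names are hypothetical stand-ins for illustration only. Whether a kernel-mode sample without PERF_SAMPLE_IP can actually leak anything is exactly the open question; the sketch only shows which inputs each variant lets through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions; bit values are illustrative. */
#define MOCK_SAMPLE_IP        (1u << 0)
#define MOCK_SAMPLE_CALLCHAIN (1u << 5)

struct mock_attr {
	uint64_t sample_type;
	bool exclude_kernel;
};

struct mock_regs {
	bool user;	/* stands in for user_mode(regs) */
};

/* As proposed above: reject kernel samples only when the IP is requested. */
static bool allowed_ip_only(const struct mock_attr *attr,
			    const struct mock_regs *regs)
{
	if (!regs->user && attr->exclude_kernel &&
	    (attr->sample_type & MOCK_SAMPLE_IP))
		return false;

	return true;
}

/* Stricter variant: reject every kernel sample when exclude_kernel is set. */
static bool allowed_strict(const struct mock_attr *attr,
			   const struct mock_regs *regs)
{
	return regs->user || !attr->exclude_kernel;
}
```

With exclude_kernel set, a kernel-mode overflow and a sample_type that contains only the callchain bit, allowed_ip_only() accepts the sample while allowed_strict() rejects it.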

OK, thanks! I will change the patch according to your comments.

+
 /*
  * Generic event overflow handling, sampling.
  */
@@ -7337,6 +7357,12 @@ static int __perf_event_overflow(struct perf_event *event,
 	ret = __perf_event_account_interrupt(event, throttle);
 
 	/*
+	 * For security, drop the skid kernel samples if necessary.
+	 */
+	if (skid_kernel_samples(event, regs))
+		return ret;
+

... with the above changes, this can be:

	if (!sample_is_allowed(event, regs))
		return ret;
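
For reference, a rough userspace sketch (not kernel code) of the resulting control flow: the interrupt is still accounted, and only the delivery of the sample is skipped. All demo_* names and fields are hypothetical, and the counters merely stand in for the real accounting and output paths.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for struct perf_event and struct pt_regs. */
struct demo_event {
	bool exclude_kernel;
	bool sample_ip;		/* stands in for sample_type & PERF_SAMPLE_IP */
	int interrupts;		/* overflows accounted */
	int samples_delivered;	/* samples that reached the output path */
};

struct demo_regs {
	bool user;		/* stands in for user_mode(regs) */
};

static bool demo_sample_is_allowed(const struct demo_event *event,
				   const struct demo_regs *regs)
{
	if (!regs->user && event->exclude_kernel && event->sample_ip)
		return false;

	return true;
}

/* Stands in for __perf_event_account_interrupt(); never throttles here. */
static int demo_account_interrupt(struct demo_event *event)
{
	event->interrupts++;
	return 0;
}

static int demo_overflow(struct demo_event *event, const struct demo_regs *regs)
{
	int ret = demo_account_interrupt(event);

	/* Drop the sample, but keep the accounting done above. */
	if (!demo_sample_is_allowed(event, regs))
		return ret;

	event->samples_delivered++;
	return ret;
}
```

Because the check sits after the accounting call, a skidded kernel sample still counts towards the interrupt total in this sketch even though it is never delivered.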
Thanks,
Mark.