Re: [RFC PATCH] perf/core: don't sample kernel regs upon skid

From: Boris Ostrovsky
Date: Thu Jul 12 2018 - 12:32:42 EST


On 07/12/2018 07:56 AM, Mark Rutland wrote:
> On Mon, Jul 09, 2018 at 06:42:29PM -0400, Boris Ostrovsky wrote:
>> On 07/02/2018 12:02 PM, Mark Rutland wrote:
>>> On Mon, Jul 02, 2018 at 05:46:55PM +0200, Peter Zijlstra wrote:
>>>> On Mon, Jul 02, 2018 at 04:12:50PM +0100, Mark Rutland wrote:
>>>>> +static struct pt_regs *perf_get_sample_regs(struct perf_event *event,
>>>>> + struct pt_regs *regs)
>>>>> +{
>>>>> + /*
>>>>> + * Due to interrupt latency (AKA "skid"), we may enter the kernel
>>>>> + * before taking an overflow, even if the PMU is only counting user
>>>>> + * events.
>>>>> + *
>>>>> + * If we're not counting kernel events, always use the user regs when
>>>>> + * sampling.
>>>>> + *
>>>>> + * TODO: what do we do about sampling a guest's registers? The IP is
>>>>> + * special-cased, but for the rest of the regs they'll get the
>>>>> + * user/kernel regs depending on whether exclude_kernel is set, which
>>>>> + * is nonsensical.
>>>>> + *
>>>>> + * We can't get at the full set of regs in all cases (e.g. Xen's PV PMU
>>>>> + * can't provide the GPRs), so should we just zero the GPRs when in a
>>>>> + * guest? Or skip outputting the regs in perf_output_sample?
>>>> Seems daft Xen cannot provide registers; why is that? Boris?
>>> The xen_pmu_regs structure simply doesn't have them, so I assume there's
>>> no API to get them.
>>>
>>> Given we don't currently sample the guest regs, I'd be tempted to just
>>> zero them for now, or skip the sample at output time (if that doesn't
>>> break some other case).
>> (Was out on vacation, couldn't respond earlier)
>>
>> Yes, PV guests only get a limited set of registers passed to the handler
>> by the hypervisor. GPRs are not part of this set.
>>
>> I think we need do
>>
>> diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
>> index 7d00d4a..95997e6 100644
>> --- a/arch/x86/xen/pmu.c
>> +++ b/arch/x86/xen/pmu.c
>> @@ -478,7 +478,7 @@ static void xen_convert_regs(const struct
>> xen_pmu_regs *xen_regs,
>> Âirqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
>> Â{
>> ÂÂÂÂÂÂÂ int err, ret = IRQ_NONE;
>> -ÂÂÂÂÂÂ struct pt_regs regs;
>> +ÂÂÂÂÂÂ struct pt_regs regs = {0};
>> ÂÂÂÂÂÂÂ const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
>> ÂÂÂÂÂÂÂ uint8_t xenpmu_flags = get_xenpmu_flags();
>>
>>
>> Do you want me to submit a separate patch or can you make this part of
>> yours?
> I've only just realised that this is an issue today, without my
> synthezied pt_regs changes. For any PERF_SAMPLE_REGS_* event under Xen,
> we'll leak uninitialised kernel stack to userspace in the GPR fields.
>
> Boris, I think it's worth spinning a patch to address that now, with Cc
> stable, if you're still happy to do so?


Sure, I can do this.

(As a side note, I also noticed this issue but wasn't especially
concerned about it since VPMU is not supported in Xen from security
perspective: http://xenbits.xenproject.org/xsa/advisory-163.html).

-boris