Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers

From: Stephane Eranian
Date: Wed May 02 2012 - 08:00:27 EST


Sorry for the delay, I had higher-priority tasks to do.
[+asharma]

On Thu, Apr 26, 2012 at 5:28 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> On Mon, Apr 23, 2012 at 12:33:50PM +0200, Jiri Olsa wrote:
>> On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
>> > On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> SNIP
>
>> > How are you going to deal with 32-bit binaries sampled on a 64-bit system?
>>
>> I don't have a solution right now... but it seems compat tasks need more
>> thinking before we go ahead with this patchset, since it's going to affect
>> perf_event_attr and could bite us in the future.
> hi,
> got more info on the compat task unwind
>
> - for 32 bit task running under 64 bit env. the 64 bits user
>   registers values are stored on kernel stack when entering
>   the kernel via exception or interrupt, like for native
>   64 bit task
>
Do you mean the 32-bit registers are stored on the kernel stack?
Or do you mean 64-bit registers, with the upper 32 bits guaranteed to be 0?


>   So I think we can keep the current interface as far as
>   compat tasks are concerned, since we will get 64 bits
>   registers all the time anyway.
>
>   The place that will take care of compat task unwind
>   is the post processing unwind.
>
>   For each processed sample we:
>     - get the sample and translate IP into MAP and DSO
>     - read DSO ELF class and figure out whether we deal with
>       64 or 32 bit task
>     - run libunwind interface with proper task class info,
>       which gets us to next bullet:
>
> - 64 bit libunwind does not support unwind of 32 bit tasks ;)
>   so unless that changes, I can see just one hacky way of doing
>   this via 32 bit libunwind being loaded in separate 32 bit
>   process and doing remote unwind for us..

Okay, I was not aware of that restriction in libunwind. I copied Arun
on this response, so maybe he can comment on it.
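
For the "read DSO ELF class" step quoted above, here is a minimal
sketch of what the tool-side check could look like. It only relies on
the standard <elf.h> constants; the function names elf_class_bits() and
dso_class_bits() are made up for illustration and are not part of perf.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Returns 32 or 64 for a valid ELF identification block, -1 otherwise. */
static int elf_class_bits(const unsigned char *ident, size_t len)
{
	if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
		return -1;

	switch (ident[EI_CLASS]) {
	case ELFCLASS32:
		return 32;
	case ELFCLASS64:
		return 64;
	default:
		return -1;
	}
}

/* Hypothetical helper: read e_ident from the DSO at 'path'. */
static int dso_class_bits(const char *path)
{
	unsigned char ident[EI_NIDENT];
	FILE *f = fopen(path, "rb");
	size_t n;

	if (!f)
		return -1;
	n = fread(ident, 1, sizeof(ident), f);
	fclose(f);
	return elf_class_bits(ident, n);
}
```

The result would only select which unwinder to use; it does not by
itself work around the 64-bit libunwind limitation mentioned above.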

>
>   I'll try to follow on this to see if there'd be some better
>   libunwind interface solution.. but that's quite longterm ;)
>
>
> As for the sample registers interface.
>
> Currently we have:
>
>   u64 user_sample_regs
>   - if != 0 we provide the user registers with mask specified
>     by its value
>
>   - it will stay for compat tasks as well

What if I ask for EAX|EBX|R15, but the sample was captured
on a 32-bit task? Are you going to just store 0 for R15?
Unless you also store a bitmask of what was actually saved,
you have to fill in non-existent registers with zeroes; otherwise
the tool cannot parse the sample.
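
To make the zero-fill option concrete, here is a rough sketch, not
taken from the patchset. It assumes a hypothetical layout with one u64
per bit set in the requested mask, in ascending bit order; the bit
numbers and the pack_regs() name are purely illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical packing: emit one u64 per bit set in 'requested'.
 * Registers that are not in 'available' (e.g. R15 for a 32-bit task)
 * are written as 0, so the record length depends only on 'requested'
 * and the tool can parse it without extra information.
 *
 * 'regs' is indexed by bit number and must hold 64 entries.
 * Returns the number of u64 words written to 'out'.
 */
static size_t pack_regs(uint64_t requested, uint64_t available,
			const uint64_t *regs, uint64_t *out)
{
	size_t n = 0;

	for (unsigned int bit = 0; bit < 64; bit++) {
		uint64_t m = UINT64_C(1) << bit;

		if (!(requested & m))
			continue;
		out[n++] = (available & m) ? regs[bit] : 0;
	}
	return n;
}
```

With this scheme the tool cannot tell a register that really held 0
from one that did not exist, which is the price of the fixed layout.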


>   - we could use PERF_SAMPLE_USER_REGS sample type instead of the != 0
>     check to be more consistent, but that would eat up one sample bit
>     unnecessarily

But then that would be aligned with how branch_stack has been implemented
for instance (PERF_SAMPLE_BRANCH_STACK).

>
> In some previous email you suggested some generic interface like
>
>     attr->sample_type |= PERF_SAMPLE_REGS
>     attr->sample_regs = EAX | EBX | EDI | ESI |.....
>     attr->sample_reg_mode = { INTR, PRECISE, USER }
>
> I think we can have something like:
>
>     attr->sample_type |= PERF_SAMPLE_REGS
>     attr->sample_reg_mode = { INTR, PRECISE, USER }
>
> but in case we want eg both USER and INTR modes together then we still
> need to have:
>
>   u64 user_sample_regs
>   u64 intr_sample_regs
>   ...
>
Yes, but if we allow any combination, then you'd need:
u64 user_sample_regs
u64 intr_sample_regs
u64 precise_sample_regs

Note that in the case of Intel PEBS used for precise mode, only a
subset of the INTR registers is available.

> for the register modes mask definition. Some mode combinations might be
> useless, but I think this could work.. we could always customize our
> needs with new mode ;)
>
The INTR vs. PRECISE is useful to get an idea of the skid.
The USER vs. INTR is useful to determine how we entered
the kernel in case the IP @ INTR is in the kernel.

> I'll start to work on this unless I hear some screaming ;)
>

In any case, the important issue is how the kernel
satisfies a request for registers when they may not
be available in the interrupted task AND it is impossible
to know this in advance.

Note that in the case of precise on Intel, we know in advance
which registers will be available. So you can fail early, when
the event is created.
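
A sketch of what failing early could look like, assuming the three
per-mode masks discussed above are grouped together. The struct, the
validate_reg_masks() name and the 'precise_supported' parameter are
hypothetical; the real set of registers would come from the PMU.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical container for the three per-mode register masks. */
struct sample_reg_masks {
	uint64_t user;
	uint64_t intr;
	uint64_t precise;
};

/*
 * Reject the event at creation time if the precise mask asks for a
 * register the precise mechanism cannot provide. 'precise_supported'
 * is assumed to be known in advance for the PMU.
 */
static int validate_reg_masks(const struct sample_reg_masks *m,
			      uint64_t precise_supported)
{
	if (m->precise & ~precise_supported)
		return -EINVAL;
	return 0;
}
```

This only covers the precise case; the USER and INTR masks cannot be
checked this way when availability depends on the interrupted task.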

The alternative is to include the bitmask of which registers
were actually saved at the beginning of the section, after the
ABI type flag.
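
For illustration, a tool could parse such a section roughly as follows.
The layout assumed here ([abi][saved mask][one u64 per saved register])
and all names are assumptions made for this sketch, not an existing
perf format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed view of one register section. */
struct regs_section {
	uint64_t abi;		/* ABI type flag */
	uint64_t saved;		/* mask of registers actually saved */
	const uint64_t *vals;	/* one value per bit set in 'saved' */
	size_t nvals;
};

/* Count the bits set in 'v'. */
static unsigned int count_bits(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Parse a section of 'len' u64 words starting at 'buf'.
 * Returns the number of words consumed, or 0 if the buffer is too
 * short for the header or for the values the mask announces.
 */
static size_t parse_regs_section(const uint64_t *buf, size_t len,
				 struct regs_section *out)
{
	size_t n;

	if (len < 2)
		return 0;
	n = count_bits(buf[1]);
	if (len - 2 < n)
		return 0;

	out->abi = buf[0];
	out->saved = buf[1];
	out->vals = buf + 2;
	out->nvals = n;
	return 2 + n;
}
```

Here the section length varies per sample, but the tool can always
tell which registers are present without any zero filling.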


> thoughts? ;)
>
>
> thanks and sorry for long email,
> jirka