Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS

From: Adrian Hunter
Date: Fri Jul 03 2015 - 05:11:27 EST


On 02/07/15 22:00, Ingo Molnar wrote:
>
> * Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>
>>> So I really think we need an extended error reporting feature on the perf
>>> kernel side, so that a 'natural' error (plus a string) is reported back to
>>> tooling, instead of the current -EINVAL.
>>>
>>> No need to do it for everything, doing it for BTS and related functionality
>>> would be a good first step to start this.
>>>
>>> If you are interested you could try this, or I can try to write something
>>> (after the merge window).
>>
>> Don't have time sorry.
>
> So who on the Intel side has time to finish Intel/PT support properly?
>
> As a first step I think we need to disable the Intel/PT kernel side - it was
> merged with the express promise that proper tooling would come along promptly,
> but if that's not happening due to lack of resources, then the kernel side is
> obviously in limbo as well.

I am afraid I took your "If you are interested you could try this" literally.

So you are saying you will apply the patches if I develop the extended error
string feature for the perf_event_open syscall?

>
>>> Does this make sense to you?
>>
>> Normally the warts of syscalls are hidden by libraries. I am not sure a library
>> couldn't do just as well with the advantage that it would work for old kernels
>> too. A library could also have information about multiple architectures, not
>> just the one that is actually running e.g. your BTS example won't work on ARM.
>
> The method I suggested works _both_ with old and new kernels, with the new kernel
> simply giving better error feedback.
>
> And these are not warts really, it's simply a threshold issue: above a certain
> complexity of features it makes sense to introduce a qualitatively better error
> reporting interface. When I tested Intel PT support I happened to go past that
> threshold and it became obvious that we need it.
>
>> A library could provide a similar API to the one you described. Either it just
>> does the syscall and returns, or if extended error information is requested, it
>> probes the API with various options, checks perf_event_paranoid, and generally
>> uses whatever information it can to best figure out what went wrong.
>
> So I think eventually a proper libperf will crystalize out of perf's syscall
> wrappers (I suggested it for a long time), but this has very little impact on the
> end user who doesn't care whether it's done in perf, or in some intermediate
> library.
>
> So it can be done in the current perf code just as much - when libperf gets
> introduced it will be factored out into tools/lib/perf/ without much problem and
> then other tools can use it too.
>
>>> I.e. if the BTS driver was not created, we'd still have
>>> /sys/bus/event_source/devices/intel_bts/error (and no other file), which gives
>>> tooling a good way to discover why a particular PMU is not available. This
>>> adds a tiny bit more RAM overhead, but it's for the better I think, because
>>> tooling could be a lot more certain about what the capabilities of the kernel
>>> are.
>>
>> That presumes the user understands what is or is not available on their
>> architecture, because on a non-x86 architecture they still get nothing
>> x86-specific.
>>
>> It also runs a bit against the way config works e.g. even if the PMU driver is
>> config'ed out it still has to provide a sysfs interface.
>
> I partially agree - but it's really a problem that we currently only get two
> states:
>
> - the PMU driver is there in sysfs
> - the PMU driver is not there in sysfs
>
> while we really want to a lot more about why it's not there - and it's the kernel
> that knows this best, not tooling.
>
> So maybe instead of having a separate directory, we could have a sysfs file that
> listed all the PMU drivers that the kernel knows about, with a status line that
> tells us whether they are:
>
> - not configured
> - configured but runtime disabled due to reason X or Y or Z
> - configured and enabled
>
> OTOH having the directories with a single file in them is a close substitute, and
> better meets the sysfs 'one file, one value' principle.

One file, one value can be done.

What about a directory:

/sys/bus/event_source/known_pmus/

that contains a subdirectory for each pmu e.g.

/sys/bus/event_source/known_pmus/intel_pt/

which contains at least 1 file:

/sys/bus/event_source/known_pmus/intel_pt/status

That makes it easy to add more information by adding new files.

So the core would create and manage the sysfs directories and files. The
default status would be "Not supported on this architecture". Arch and PMU
drivers would just need to tell the core what their status is.

>
>> So, isn't the patch I proposed sufficient for now?
>
> Well, no, because it really does the check in the wrong place and introduces
> possible friction between what tooling thinks is the supported environment and
> what the kernel thinks - and thus postpones the problem.
>
> I could live with this current solution as the initial version if a subsequent
> effort fixes it up properly by adding much better error reporting infrastructure,
> but the 'no time' comment makes me doubt the whole Intel/PT feature set.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/