Re: [PATCH] perf: Fix probing for PERF_FLAG_FD_CLOEXEC flag

From: Adrian Hunter
Date: Tue Feb 24 2015 - 06:33:24 EST


On 19/02/15 19:28, Adrian Hunter wrote:
> On 19/02/2015 6:22 p.m., David Ahern wrote:
>> On 2/19/15 9:17 AM, Adrian Hunter wrote:
>>> Yes, I am sorry it is a pain. I don't know why I didn't add a comment
>>> to the code :-(. Using -1 for the pid is a workaround to avoid gratuitous
>>> jump label changes. If pid=0 is used and then a system-wide trace is done
>>> with Intel PT, there will be a jump label change shortly after the tracing
>>> starts. That means the running code gets changed, but Intel PT decoding
>>> has to walk the code to reconstruct the trace - so errors result. There
>>> will always be occasional jump label changes, but this avoids one that
>>> would otherwise always happen.
>>
>> I don't understand the response. Why can't pid == getpid() (ie., pid > 0)
>
> IIRC pid == getpid() is the same as pid = 0
>
>> be used for this test? pid = -1 and pid = 0 are not needed. With pid > 0
>> cpu value does not matter so cpu = -1 can be used. Again this is just to
>> determine if the kernel supports PERF_FLAG_FD_CLOEXEC. Existence of PT
>> should not be involved here.
>
> This is about the side-effects of opening perf events. One of the side-effects
> is that some jump labels get switched. For optimization reasons, there is then
> a delay before they switch back. That means that a side-effect of probing the
> API is that jump label changes, that otherwise would not have happened, appear
> during the trace.
>
> This is not only about Intel PT. From an abstract point of view, it is
> about minimizing the disturbance to the system under test.
>
>
>

How about this:

From: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Date: Tue, 24 Feb 2015 13:20:59 +0200
Subject: [PATCH] perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag

Commit f6edb53c4993ffe92ce521fb449d1c146cea6ec2 converted the probe to
a CPU wide event first (pid == -1). For kernels that do not support
the PERF_FLAG_FD_CLOEXEC flag the probe fails with EINVAL. Since this
errno is not handled pid is not reset to 0 and the subsequent use of
pid = -1 as an argument brings in an additional failure path if
perf_event_paranoid > 0:

$ perf record -- sleep 1
perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB /tmp/perf.data (11 samples) ]

Since this function only needs to get past this check in kernel/events/core.c:

/* for future expandability... */
if (flags & ~PERF_FLAG_ALL)
return -EINVAL;

Also, ensure the fd of the confirmation check is closed and comment
why pid = -1 is used.

Needs to go to 3.18 stable tree as well.

Based-on-patch-by: David Ahern <david.ahern@xxxxxxxxxx>
Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>
---
tools/perf/util/cloexec.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cloexec.c b/tools/perf/util/cloexec.c
index 47b78b3..6da965b 100644
--- a/tools/perf/util/cloexec.c
+++ b/tools/perf/util/cloexec.c
@@ -25,6 +25,10 @@ static int perf_flag_probe(void)
if (cpu < 0)
cpu = 0;

+ /*
+ * Using -1 for the pid is a workaround to avoid gratuitous jump label
+ * changes.
+ */
while (1) {
/* check cloexec flag */
fd = sys_perf_event_open(&attr, pid, cpu, -1,
@@ -47,16 +51,24 @@ static int perf_flag_probe(void)
err, strerror_r(err, sbuf, sizeof(sbuf)));

/* not supported, confirm error related to PERF_FLAG_FD_CLOEXEC */
- fd = sys_perf_event_open(&attr, pid, cpu, -1, 0);
+ while (1) {
+ fd = sys_perf_event_open(&attr, pid, cpu, -1, 0);
+ if (fd < 0 && pid == -1 && errno == EACCES) {
+ pid = 0;
+ continue;
+ }
+ break;
+ }
err = errno;

+ if (fd >= 0)
+ close(fd);
+
if (WARN_ONCE(fd < 0 && err != EBUSY,
"perf_event_open(..., 0) failed unexpectedly with error %d (%s)\n",
err, strerror_r(err, sbuf, sizeof(sbuf))))
return -1;

- close(fd);
-
return 0;
}

--
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/