Re: [PATCH 0/8] perf: add ability to sample physical data addresses
From: Peter Zijlstra
Date: Mon Jul 08 2013 - 04:19:52 EST
On Sat, Jul 06, 2013 at 12:48:48AM +0200, Stephane Eranian wrote:
> So, I tried on an example using shmat(). I map the same shared segment twice
> in the same process. Then I fork(): I see this in /proc/PID/maps:
>
> 7f80fce28000-7f80fce29000 rw-s 00000000 00:04 1376262 /SYSV00000000 (deleted)
> 7f80fce29000-7f80fce2a000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)
> 7f80fce2a000-7f80fce2b000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)
>
> The segment at 1343491 is the one mapped twice. So that number (shmid)
> can be used to identify identical mappings. It appears the same way in both
> processes. The other 1376262 mapping is just to verify that each segment
> gets a different number.
Right, that's the inode number; I think you also need to add the
blockdev id (00:04) in this case as inode numbers are per device, not
global.
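
As an illustration of keying on device plus inode, here is a minimal
sketch in Python that parses the maps lines quoted above. The helper
name mapping_key and the regular expression are hypothetical and not
part of perf; the sketch only assumes the usual /proc/PID/maps column
order (range, perms, offset, dev, inode, path).

```python
import re

# Hypothetical helper, not part of perf: split one /proc/PID/maps line.
# Columns: start-end perms offset major:minor inode [path]
MAPS_RE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+([0-9a-f]+)\s+"
    r"([0-9a-f]+):([0-9a-f]+)\s+(\d+)\s*(.*)$"
)

def mapping_key(line):
    """Return (major, minor, inode), or None for an anonymous mapping."""
    m = MAPS_RE.match(line.strip())
    if m is None:
        raise ValueError("not a maps line: %r" % line)
    major = int(m.group(5), 16)   # device numbers are printed in hex
    minor = int(m.group(6), 16)
    inode = int(m.group(7))       # inode is printed in decimal
    if inode == 0:
        return None               # no backing object to compare against
    return (major, minor, inode)

# The three lines from the quoted example.
lines = [
    "7f80fce28000-7f80fce29000 rw-s 00000000 00:04 1376262 /SYSV00000000 (deleted)",
    "7f80fce29000-7f80fce2a000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)",
    "7f80fce2a000-7f80fce2b000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)",
]
keys = [mapping_key(l) for l in lines]
```

The last two lines yield the same key and the first one differs, so
the tuple rather than the bare inode number is what gets compared.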
> So it looks possible to use this approach across processes to identify identical
> physical mappings. However, this is not very practical.
>
> The first reason is that perf_event does not capture shmat() mappings in MMAP
> records.
oops, that would be something we'd definitely need to fix.
ipc/shm.c:SYSCALL_DEFINE3(shmat)
  do_shmat()
    do_mmap_pgoff()
      mmap_region()
        perf_event_mmap()
So why isn't it logging them? If it's a non-exec map we need
attr::mmap_data but I suppose you have that enabled?
> The second is that if you rely on /proc/PID/maps, you will have to
> have the tool constantly poll that file for new shared mappings. This
> is not how perf works today, not even in system-wide mode.
> /proc/PID/maps is swept only once when perf record -a is started.
Ahh. We don't put the useful bits in the mmap event; we'll need to fix
that too then ;-)
Doing so is going to be a bit of a bother since we use the tail of
PERF_RECORD_MMAP for filenames and thus aren't particularly extensible.
This would mean doing something like PERF_RECORD_MMAP2 and some means
for userspace to request the new events instead of the old one.
Didn't you already have patches to change the event layout? Can this
piggy back on that?
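
To illustrate the extensibility problem, a minimal sketch in Python
follows. The fixed fields of the existing record (pid, tid, addr, len,
pgoff, then the filename) follow struct layout in perf_event.h with the
event header omitted. The "new" layout with maj, min and ino placed
before the filename is purely a hypothetical assumption for this
sketch, not a proposed ABI, and all function names are made up.

```python
import struct

# Existing PERF_RECORD_MMAP body (header omitted): u32 pid, tid;
# u64 addr, len, pgoff; then a NUL-terminated, u64-padded filename.
OLD_FIXED = struct.Struct("<IIQQQ")
# Hypothetical extended layout: adds u32 maj, u32 min, u64 ino
# ahead of the filename. Field names and order are assumptions.
NEW_FIXED = struct.Struct("<IIQQQIIQ")

def pad_name(filename):
    """NUL-terminate the name and pad it to a multiple of 8 bytes."""
    raw = filename.encode() + b"\0"
    return raw + b"\0" * (-len(raw) % 8)

def parse_old(body):
    """Old-style parser: everything past the fixed part is the filename."""
    pid, tid, addr, length, pgoff = OLD_FIXED.unpack_from(body)
    name = body[OLD_FIXED.size:].split(b"\0", 1)[0].decode()
    return {"pid": pid, "addr": addr, "len": length, "filename": name}

def parse_new(body):
    """Parser for the hypothetical extended layout."""
    pid, tid, addr, length, pgoff, maj, mnr, ino = NEW_FIXED.unpack_from(body)
    name = body[NEW_FIXED.size:].split(b"\0", 1)[0].decode()
    return {"pid": pid, "addr": addr, "len": length,
            "dev": (maj, mnr), "ino": ino, "filename": name}

old_body = OLD_FIXED.pack(42, 42, 0x7f80fce29000, 0x1000, 0) \
    + pad_name("/SYSV00000000")
new_body = NEW_FIXED.pack(42, 42, 0x7f80fce29000, 0x1000, 0, 0, 4, 1343491) \
    + pad_name("/SYSV00000000")

old_view = parse_old(old_body)
new_view = parse_new(new_body)
# An unmodified old parser fed the new body reads the device bytes
# as the start of the filename and gets garbage.
confused = parse_old(new_body)
```

Because an old parser treats the whole tail as the filename, changing
the existing record in place would break current tools, which is why a
separate record type that userspace has to ask for looks necessary.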
> Ingo is proposing an ioctl() to flush the mappings. But then, when is a
> good time to do this from the tool?
Yeah, that's not going to help with this; that's only to get rid of the
initial /proc/$pid/maps reading. Not to keep up with changes.
> So my approach with PERF_SAMPLE_PHYS_ADDR looks easier on the tools which
> if I recall is the philosophy behind perf_events.
Physical addresses are always going to cause problems, don't ever use
them.