Re: [PATCH 0/8] perf: add ability to sample physical data addresses

From: Stephane Eranian
Date: Tue Jul 30 2013 - 04:02:12 EST


On Mon, Jul 8, 2013 at 10:19 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Sat, Jul 06, 2013 at 12:48:48AM +0200, Stephane Eranian wrote:
>> So, I tried on an example using shmat(). I map the same shared segment twice
>> in the same process. Then I fork(): I see this in /proc/PID/maps:
>>
>> 7f80fce28000-7f80fce29000 rw-s 00000000 00:04 1376262 /SYSV00000000 (deleted)
>> 7f80fce29000-7f80fce2a000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)
>> 7f80fce2a000-7f80fce2b000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)
>>
>> The segment at 1343491 is the one mapped twice. So that number (shmid)
>> can be used to identify identical mappings. It appears the same way in both
>> processes. The other 1376262 mapping is just to verify that each segment
>> gets a different number.
>
> Right, that's the inode number; I think you also need to add the
> blockdev id (00:04) in this case as inode numbers are per device, not
> global.
>
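To make the (device, inode) keying concrete, here is a minimal sketch that
groups shared mappings from maps-style lines. The helper names
(parse_maps_line, group_shared) are made up for illustration and are not part
of perf; the sample lines are the ones quoted above.

```python
from collections import defaultdict

def parse_maps_line(line):
    """Split one /proc/PID/maps line into its fields."""
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return {
        "start": start,
        "end": end,
        "perms": perms,
        "offset": int(offset, 16),
        "dev": dev,            # "major:minor" of the backing device
        "inode": int(inode),   # only unique together with dev
        "path": path,
    }

def group_shared(lines):
    """Group shared ('s') mappings by (dev, inode)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        m = parse_maps_line(line)
        if m["perms"][3] == "s":
            groups[(m["dev"], m["inode"])].append((m["start"], m["end"]))
    return dict(groups)

SAMPLE = [
    "7f80fce28000-7f80fce29000 rw-s 00000000 00:04 1376262 /SYSV00000000 (deleted)",
    "7f80fce29000-7f80fce2a000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)",
    "7f80fce2a000-7f80fce2b000 rw-s 00000000 00:04 1343491 /SYSV00000000 (deleted)",
]

for key, ranges in group_shared(SAMPLE).items():
    print(key, [f"{s:x}-{e:x}" for s, e in ranges])
```

Two ranges under the same key refer to the same segment. A real tool would
presumably also have to compare file offsets, since this sketch ignores them.
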
>> So it looks possible to use this approach across processes to identify
>> identical physical mappings. However, this is not very practical.
>>
>> The first reason is that perf_event does not capture shmat() mappings in MMAP
>> records.
>
> oops, that would be something we'd definitely need to fix.
>
> ipc/shm.c:SYSCALL_DEFINE3(shmat)
>   do_shmat()
>     do_mmap_pgoff()
>       mmap_region()
>         perf_event_mmap()
>
> So why isn't it logging them? If it's a non-exec map we need
> attr::mmap_data but I suppose you have that enabled?
>
>> The second is that if you rely on /proc/PID/maps, the tool will have to
>> constantly poll that file for new shared mappings. This is not how perf
>> works today, not even in system-wide mode. /proc/PID/maps is swept only
>> once, when perf record -a is started.
>
> Ahh. We don't put the useful bits in the mmap event; we'll need to fix
> that too then ;-)
>
> Doing so is going to be a bit of a bother since we use the tail of
> PERF_RECORD_MMAP for filenames and thus aren't particularly extensible.
>
> This would mean doing something like PERF_RECORD_MMAP2 and some means
> for userspace to request the new events instead of the old ones.
>
Tracking mmaps, even for shmat(), won't cover the paging case. When a page is
paged back in, it most likely gets a different physical page. How would we
track that case too using the same approach?
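
As a rough illustration of the paging problem, the sketch below decodes an
entry from /proc/PID/pagemap, which reports the physical frame currently
backing a virtual page. The function names are made up for this example.
Note that newer kernels report the PFN as zero to unprivileged readers, so
read_pagemap_entry() is only useful with sufficient privileges.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1  # bits 0-54 hold the PFN when the page is present

def decode_pagemap_entry(entry):
    """Return (present, swapped, pfn) for one 64-bit pagemap entry."""
    present = bool((entry >> 63) & 1)  # bit 63: page present in RAM
    swapped = bool((entry >> 62) & 1)  # bit 62: page swapped out
    pfn = (entry & PFN_MASK) if present else None
    return present, swapped, pfn

def read_pagemap_entry(pid, vaddr):
    """Read the raw pagemap entry covering virtual address vaddr."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per page
        data = f.read(8)
    return struct.unpack("=Q", data)[0]
```

Sampling the same virtual address before and after the page has been swapped
out and faulted back in would typically show a different PFN, while the
mapping seen in /proc/PID/maps stays unchanged.
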