Re: [PATCH 0/8] perf: add ability to sample physical data addresses
From: Stephane Eranian
Date: Tue Jul 09 2013 - 02:02:50 EST
On Mon, Jul 8, 2013 at 10:19 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Sat, Jul 06, 2013 at 12:48:48AM +0200, Stephane Eranian wrote:
>> So, I tried on an example using shmat(). I map the same shared segment twice
>> in the same process. Then I fork(): I see this in /proc/PID/maps:
>>
>> 7f80fce28000-7f80fce29000 rw-s 00000000 00:04 1376262
>> /SYSV00000000 (deleted)
>> 7f80fce29000-7f80fce2a000 rw-s 00000000 00:04 1343491
>> /SYSV00000000 (deleted)
>> 7f80fce2a000-7f80fce2b000 rw-s 00000000 00:04 1343491
>> /SYSV00000000 (deleted)
>>
>> The segment at 1343491 is the one mapped twice. So that number (shmid)
>> can be used to identify identical mappings. It appears the same way in both
>> processes. The other 1376262 mapping is just to verify that each segment
>> gets a different number.
>
> Right, that's the inode number; I think you also need to add the
> blockdev id (00:04) in this case as inode numbers are per device, not
> global.
>
Well, in the case of SHM sewgment, this is the shmid:
7fb92d4c8000-7fb92d4c9000 rw-s 00000000 00:04 294913
/SYSV00000000 (deleted)
7fb92d4c9000-7fb92d4ca000 rw-s 00000000 00:04 262144
/SYSV00000000 (deleted)
7fb92d4ca000-7fb92d4cb000 rw-s 00000000 00:04 262144
/SYSV00000000 (deleted)
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 262144 eranian 600 256 6 dest
0x00000000 294913 eranian 600 256 3 dest
>> So it looks possible to use this approach across process to identify identical
>> physical mappings. However, this is not very practical.
>>
>> The first reason is that perf_event does not capture shmat() mappings in MMAP
>> records.
>
> oops, that would be something we'd definitely need to fix.
>
> ipc/shm.c:SYSCALL_DEFINE3(shmat)
> do_shmat()
> do_mmap_pgoff()
> mmap_region()
> perf_event_mmap()
>
> So why isn't it logging them? If its a non-exec map we need
> attr::mmap_data but I suppose you have that enabled?
>
My bad, I was not use load/store sampling mode, just regular sampling
thus mmap_data was not set, and thus data mappings were not captured.
I see the SHM segment with perf mem:
# Overhead Samples Local Weight Memory access
Symbol Data Symbol
Data Object
# ........ ............ ............ ........................
.......................... ...................................
...................................
#
45.70% 4572 6 L1 hit [.]
spin [.] 0x00007f8d6252b080
SYSV00000000 (deleted)
45.49% 4551 6 L1 hit [.]
spin [.] 0x00007f8d6252c000
SYSV00000000 (deleted)
So I think what we need in the MMAP record in this case is the shmid to uniquely
identify segments.
>> The second is is that if you rely on /proc/PID/maps, you will have to
>> have the tool
>> constantly poll that file for new shared mappings. This is not how
>> perf works today,
>> not even in system-wide mode. /proc/PID/maps is swept only once when perf
>> record -a is started.
>
> Ahh. We don't put the useful bits in the mmap event; we'll need to fix
> that too then ;-)
>
Yes, I think with the shmid (or inode number) we could have this fixed.
> Doing so is going to be a bit of a bother since we use the tail of
> PERF_RECORD_MMAP for filenames and thus aren't particularly extensible.
>
> This would mean doing something like PERF_RECORD_MMAP2 and some means
> for userspace to requrest the new events instead of the old one.
>
Unfortunately, yes. Because the length of the filename string is calculated
by:
f_len = ehdr->size - sizeof(struct mmap_event) - sizeof (*ehdr) -
sizeof(sample_id_all);
Given that sample_id_all is already optional, we are screwed. I think
it would have been
much more flexible to have the string length preceeding the string
itself like how it's done
in the perf.data file. But it's too late now.
> Didn't you already have patches to change the event layout? Can this
> piggy back on that?
>
I think the proposal from Hunter with PERF_SAMPLE_IDENTIFIER is simpler
than my solution which was to commandeer a bit in the ehdr->misc bitmask to
indicate that there is a record extension at the end of the sample and that it
contains the eventID.
>> Ingo is proposing a ioctl() to flush the mappings. But then, when is a
>> good time to do this
>> from the tool?
>
> Yeah, that's not going to help with this; that's only to get rid of the
> initial /proc/$pid/maps reading. Not to keep up with changes.
>
>> So my approach with PERF_SAMPLE_PHYS_ADDR looks easier on the tools which
>> if I recall is the philosophy behind perf_events.
>
> Physical addresses are always going to cause problems, don't ever use
> them.
>
Okay, all we need is a way to detect that virtual addresses correspond to the
same actual data. Knowing that the virtual to physical mapping has changed
or is different is good enough. Don't need the actual physical address. So yes,
any kind of flag or unique value would do.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/