Re: [PATCHv2 0/2] perf tools: Share struct map after clone

From: Arnaldo Carvalho de Melo
Date: Mon Nov 18 2019 - 16:48:58 EST


Em Mon, Nov 18, 2019 at 01:14:00PM +0100, Jiri Olsa escreveu:
> On Tue, Oct 29, 2019 at 09:58:55PM +0100, Jiri Olsa wrote:
> > > >
> > > > Also available in here:
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > > perf/map_shared

> > > I rebased to latest perf/core and pushed the branch out

> > rebased and pushed out

> heya,
> I lost track of this.. what's the status, are you going with your
> version, or is this one still in? I don't see any of them in latest
> code..

So, I'm still working on and off on this, current status is at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/map_share

Its just one patch more than perf/core, the one that does the sharing.

The thing is, as I'm going over all the fields in 'struct map', it seems
that we'll end up with just one cacheline per instance, as there are
things there that are not strictly related to a map, but to a map_group
(unmap_ip/map_ip), or to a dso (maj, min, ino, ino_generation), and some
need less than what is allocated to them.

Current status is:

[root@quaco ~]# pahole -C map ~acme/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
u64 start; /* 24 8 */
u64 end; /* 32 8 */
_Bool erange_warned:1; /* 40: 0 1 */
_Bool priv:1; /* 40: 1 1 */

/* XXX 6 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */

u32 prot; /* 44 4 */
u64 pgoff; /* 48 8 */
u64 reloc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 (*map_ip)(struct map *, u64); /* 64 8 */
u64 (*unmap_ip)(struct map *, u64); /* 72 8 */
struct dso * dso; /* 80 8 */
refcount_t refcnt; /* 88 4 */
u32 flags; /* 92 4 */

/* size: 96, cachelines: 2, members: 13 */
/* sum members: 92, holes: 1, sum holes: 3 */
/* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
/* forced alignments: 1 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
[root@quaco ~]#

This is with the tentative move of maj/min/ino/ino_generation to 'struct
dso', but that needs more work to match the sort order that touches it
"dcacheline", i.e. a map that comes with the same backing DSO but
different values for those fields is not the same DSO, right?

Right now with moving the maj/min/etc to dso, in the map_share patch we
get the structure used to keep shared entries in the rb_tree at 40
bytes, under one cacheline, while the full 'struct map' is 32 bytes more
than one cacheline, so still good for sharing:

[acme@quaco perf]$ pahole -C map_node ~/bin/perf
struct map_node {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
refcount_t refcnt; /* 24 4 */
_Bool is_node:1; /* 28: 0 1 */

/* XXX 7 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */

struct map * map; /* 32 8 */

/* size: 40, cachelines: 1, members: 4 */
/* sum members: 36, holes: 1, sum holes: 3 */
/* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 7 bits */
/* forced alignments: 1 */
/* last cacheline: 40 bytes */
} __attribute__((__aligned__(8)));
[acme@quaco perf]$ pahole -C map ~/bin/perf
struct map {
union {
struct rb_node rb_node __attribute__((__aligned__(8))); /* 0 24 */
struct list_head node; /* 0 16 */
} __attribute__((__aligned__(8))); /* 0 24 */
refcount_t refcnt; /* 24 4 */
_Bool is_node:1; /* 28: 0 1 */
_Bool erange_warned:1; /* 28: 1 1 */
_Bool priv:1; /* 28: 2 1 */

/* XXX 5 bits hole, try to pack */
/* XXX 3 bytes hole, try to pack */

u64 start; /* 32 8 */
u64 end; /* 40 8 */
u64 pgoff; /* 48 8 */
u64 reloc; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 (*map_ip)(struct map *, u64); /* 64 8 */
u64 (*unmap_ip)(struct map *, u64); /* 72 8 */
struct dso * dso; /* 80 8 */
u32 flags; /* 88 4 */
u32 prot; /* 92 4 */

/* size: 96, cachelines: 2, members: 14 */
/* sum members: 92, holes: 1, sum holes: 3 */
/* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
/* forced alignments: 1 */
/* last cacheline: 32 bytes */
} __attribute__((__aligned__(8)));
[acme@quaco perf]$

So give me some more time, please :-)

The sharing map is this one, without the maj/map/ move to 'struct dso':

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/commit/?h=perf/map_share&id=451df1d4ad3c636f6be57b8e69b4f94c1bbf4a65

- Arnaldo