Re: [PATCHv2 0/2] perf tools: Share struct map after clone

From: Jiri Olsa
Date: Tue Nov 19 2019 - 06:06:54 EST


On Mon, Nov 18, 2019 at 06:48:51PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Nov 18, 2019 at 01:14:00PM +0100, Jiri Olsa escreveu:
> > On Tue, Oct 29, 2019 at 09:58:55PM +0100, Jiri Olsa wrote:
> > > > >
> > > > > Also available in here:
> > > > > git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> > > > > perf/map_shared
>
> > > > I rebased to latest perf/core and pushed the branch out
>
> > > rebased and pushed out
>
> > heya,
> > I lost track of this.. what's the status, are you going with your
> > version, or is this one still in? I don't see any of them in latest
> > code..
>
> So, I'm still working on and off on this, current status is at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/log/?h=perf/map_share
>
> It's just one patch more than perf/core, the one that does the sharing.
>
> The thing is, as I'm going over all the fields in 'struct map', it seems
> that we'll end up with just one cacheline per instance, as there are
> things there that are not strictly related to a map, but to a map_group
> (unmap_ip/map_ip), or to a dso (maj, min, ino, ino_generation), and some
> need less than what is allocated to them.
>
> Current status is:
>
> [root@quaco ~]# pahole -C map ~acme/bin/perf
> struct map {
>         union {
>                 struct rb_node rb_node __attribute__((__aligned__(8))); /*     0    24 */
>                 struct list_head node;                                  /*     0    16 */
>         } __attribute__((__aligned__(8)));                              /*     0    24 */
>         u64                        start;                /*    24     8 */
>         u64                        end;                  /*    32     8 */
>         _Bool                      erange_warned:1;      /*    40: 0  1 */
>         _Bool                      priv:1;               /*    40: 1  1 */
>
>         /* XXX 6 bits hole, try to pack */
>         /* XXX 3 bytes hole, try to pack */
>
>         u32                        prot;                 /*    44     4 */
>         u64                        pgoff;                /*    48     8 */
>         u64                        reloc;                /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         u64                        (*map_ip)(struct map *, u64);   /*    64     8 */
>         u64                        (*unmap_ip)(struct map *, u64); /*    72     8 */
>         struct dso *               dso;                  /*    80     8 */
>         refcount_t                 refcnt;               /*    88     4 */
>         u32                        flags;                /*    92     4 */
>
>         /* size: 96, cachelines: 2, members: 13 */
>         /* sum members: 92, holes: 1, sum holes: 3 */
>         /* sum bitfield members: 2 bits, bit holes: 1, sum bit holes: 6 bits */
>         /* forced alignments: 1 */
>         /* last cacheline: 32 bytes */
> } __attribute__((__aligned__(8)));
> [root@quaco ~]#
>
> This is with the tentative move of maj/min/ino/ino_generation to 'struct
> dso', but that needs more work to match the sort order that touches it
> "dcacheline", i.e. a map that comes with the same backing DSO but
> different values for those fields is not the same DSO, right?
>
> Right now with moving the maj/min/etc to dso, in the map_share patch we
> get the structure used to keep shared entries in the rb_tree at 40
> bytes, under one cacheline, while the full 'struct map' is 32 bytes more
> than one cacheline, so still good for sharing:
>
> [acme@quaco perf]$ pahole -C map_node ~/bin/perf
> struct map_node {
>         union {
>                 struct rb_node rb_node __attribute__((__aligned__(8))); /*     0    24 */
>                 struct list_head node;                                  /*     0    16 */
>         } __attribute__((__aligned__(8)));                              /*     0    24 */
>         refcount_t                 refcnt;               /*    24     4 */
>         _Bool                      is_node:1;            /*    28: 0  1 */
>
>         /* XXX 7 bits hole, try to pack */
>         /* XXX 3 bytes hole, try to pack */
>
>         struct map *               map;                  /*    32     8 */
>
>         /* size: 40, cachelines: 1, members: 4 */
>         /* sum members: 36, holes: 1, sum holes: 3 */
>         /* sum bitfield members: 1 bits, bit holes: 1, sum bit holes: 7 bits */
>         /* forced alignments: 1 */
>         /* last cacheline: 40 bytes */
> } __attribute__((__aligned__(8)));
> [acme@quaco perf]$ pahole -C map ~/bin/perf
> struct map {
>         union {
>                 struct rb_node rb_node __attribute__((__aligned__(8))); /*     0    24 */
>                 struct list_head node;                                  /*     0    16 */
>         } __attribute__((__aligned__(8)));                              /*     0    24 */
>         refcount_t                 refcnt;               /*    24     4 */
>         _Bool                      is_node:1;            /*    28: 0  1 */
>         _Bool                      erange_warned:1;      /*    28: 1  1 */
>         _Bool                      priv:1;               /*    28: 2  1 */
>
>         /* XXX 5 bits hole, try to pack */
>         /* XXX 3 bytes hole, try to pack */
>
>         u64                        start;                /*    32     8 */
>         u64                        end;                  /*    40     8 */
>         u64                        pgoff;                /*    48     8 */
>         u64                        reloc;                /*    56     8 */
>         /* --- cacheline 1 boundary (64 bytes) --- */
>         u64                        (*map_ip)(struct map *, u64);   /*    64     8 */
>         u64                        (*unmap_ip)(struct map *, u64); /*    72     8 */
>         struct dso *               dso;                  /*    80     8 */
>         u32                        flags;                /*    88     4 */
>         u32                        prot;                 /*    92     4 */
>
>         /* size: 96, cachelines: 2, members: 14 */
>         /* sum members: 92, holes: 1, sum holes: 3 */
>         /* sum bitfield members: 3 bits, bit holes: 1, sum bit holes: 5 bits */
>         /* forced alignments: 1 */
>         /* last cacheline: 32 bytes */
> } __attribute__((__aligned__(8)));
> [acme@quaco perf]$
>
> So give me some more time, please :-)

sure ;-) I just did not want to lose track of this

thanks,
jirka
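
The 40-byte figure quoted for 'struct map_node' can be cross-checked at
compile time without pahole. This is only a minimal sketch, assuming an LP64
target such as x86_64 with 64-byte cachelines and GCC or Clang; the rb_node,
list_head and refcount_t definitions are simplified stand-ins, and
cachelines_spanned() is a hypothetical helper, not something from the perf
tree:

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE_BYTES 64	/* assumption: x86_64-style cacheline */

/* Simplified stand-ins for the real types (assumed sizes: 24, 16, 4 bytes). */
struct rb_node {
	unsigned long parent_color;
	struct rb_node *right;
	struct rb_node *left;
} __attribute__((__aligned__(8)));

struct list_head {
	struct list_head *next, *prev;
};

typedef struct { int refs; } refcount_t;

struct map;	/* opaque here, only a pointer to it is stored */

/* Mirrors the member order shown in the pahole output for map_node. */
struct map_node {
	union {
		struct rb_node rb_node;
		struct list_head node;
	};
	refcount_t refcnt;
	bool is_node:1;
	struct map *map;
};

/* Hypothetical helper: how many cachelines an object of this size touches. */
static inline size_t cachelines_spanned(size_t size)
{
	return (size + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}

/* Fail the build if the layout drifts from what pahole reported. */
_Static_assert(offsetof(struct map_node, refcnt) == 24, "refcnt offset");
_Static_assert(offsetof(struct map_node, map) == 32, "map offset");
_Static_assert(sizeof(struct map_node) == 40, "map_node size");
_Static_assert(sizeof(struct map_node) <= CACHELINE_BYTES,
	       "map_node must fit in one cacheline");
```

With these stand-ins, cachelines_spanned(sizeof(struct map_node)) gives 1 and
cachelines_spanned(96) gives 2, matching the "cachelines:" lines in the
quoted output.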