Re: [PATCH] perf intel-pt: don't zero the whole perf_sample
From: Ian Rogers
Date: Mon Jan 13 2025 - 12:30:32 EST
On Mon, Jan 13, 2025 at 12:15 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>
> On 11/01/25 19:56, Tavian Barnes wrote:
> > C designated initializers like
> >
> > struct perf_sample sample = { .ip = 0, };
> >
> > set every unmentioned field of the struct to zero. But since
> > sizeof(struct perf_sample) == 1384, this takes a long time.
> >
> > struct perf_sample does not need to be fully initialized, and even
>
> Yes it does need to be fully initialized. Leaving members
> uninitialized in the hope that they never get used adds to
> code complexity e.g. how do you know they never are used,
> or future members never will be used.
First thought, perhaps we can use sanitizers to catch mistakes but
leave the default optimized code to not have any overhead.
Second thought, the perf_sample looks like:
```
$ pahole perf
...
struct perf_sample {
u64 ip; /* 0 8 */
u32 pid; /* 8 4 */
u32 tid; /* 12 4 */
u64 time; /* 16 8 */
u64 addr; /* 24 8 */
u64 id; /* 32 8 */
u64 stream_id; /* 40 8 */
u64 period; /* 48 8 */
u64 weight; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
u64 transaction; /* 64 8 */
u64 insn_cnt; /* 72 8 */
u64 cyc_cnt; /* 80 8 */
u32 cpu; /* 88 4 */
u32 raw_size; /* 92 4 */
u64 data_src; /* 96 8 */
u64 phys_addr; /* 104 8 */
u64 data_page_size; /* 112 8 */
u64 code_page_size; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
u64 cgroup; /* 128 8 */
u32 flags; /* 136 4 */
u32 machine_pid; /* 140 4 */
u32 vcpu; /* 144 4 */
u16 insn_len; /* 148 2 */
u8 cpumode; /* 150 1 */
/* XXX 1 byte hole, try to pack */
u16 misc; /* 152 2 */
u16 ins_lat; /* 154 2 */
union {
u16 p_stage_cyc; /* 156 2 */
u16 retire_lat; /* 156 2 */
}; /* 156 2 */
union {
u16 p_stage_cyc; /*
0 2 */
u16 retire_lat; /*
0 2 */
};
_Bool no_hw_idx; /* 158 1 */
char insn[16]; /* 159 16 */
/* XXX 1 byte hole, try to pack */
void * raw_data; /* 176 8 */
struct ip_callchain * callchain; /* 184 8 */
/* --- cacheline 3 boundary (192 bytes) --- */
struct branch_stack * branch_stack; /* 192 8 */
u64 * branch_stack_cntr; /* 200 8 */
struct regs_dump user_regs; /* 208 544 */
/* --- cacheline 11 boundary (704 bytes) was 48 bytes ago --- */
struct regs_dump intr_regs; /* 752 544 */
/* --- cacheline 20 boundary (1280 bytes) was 16 bytes ago --- */
struct stack_dump user_stack; /* 1296 24 */
struct sample_read read; /* 1320 40 */
/* --- cacheline 21 boundary (1344 bytes) was 16 bytes ago --- */
struct aux_sample aux_sample; /* 1360 16 */
struct simd_flags simd_flags; /* 1376 8 */
/* size: 1384, cachelines: 22, members: 39 */
/* sum members: 1382, holes: 2, sum holes: 2 */
/* last cacheline: 40 bytes */
};
```
The two regs_dump structs are 79% of the size. They comprise of:
```
struct regs_dump {
u64 abi; /* 0 8 */
u64 mask; /* 8 8 */
u64 * regs; /* 16 8 */
u64 cache_regs[64]; /* 24 512 */
/* --- cacheline 8 boundary (512 bytes) was 24 bytes ago --- */
u64 cache_mask; /* 536 8 */
/* size: 544, cachelines: 9, members: 5 */
/* last cacheline: 32 bytes */
};
```
So the cache_regs are taking up the majority of the space. I have a
WIP patch set to make user_regs and intr_regs optional/lazily
allocated. I'll send it out shortly.
Thanks,
Ian