Re: [PATCH v5 bpf-next 7/7] bpf, x86_64: use bpf_prog_pack allocator
From: Song Liu
Date: Fri Jan 21 2022 - 12:53:57 EST
> On Jan 20, 2022, at 8:59 PM, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Thu, Jan 20, 2022 at 11:13 AM Song Liu <song@xxxxxxxxxx> wrote:
>>
>> From: Song Liu <songliubraving@xxxxxx>
>>
>> Use bpf_prog_pack allocator in x86_64 jit.
>>
>> The program header from bpf_prog_pack is read only during the jit process.
>> Therefore, the binary is first written to a temporary buffer, and later
>> copied to final location with text_poke_copy().
>>
>> Similarly, jit_fill_hole() is updated to fill the hole with 0xcc using
>> text_poke_copy().
>>
>> Signed-off-by: Song Liu <songliubraving@xxxxxx>
>> ---
>> arch/x86/net/bpf_jit_comp.c | 134 +++++++++++++++++++++++++++---------
>> 1 file changed, 103 insertions(+), 31 deletions(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index fe4f08e25a1d..6d97f7c24df2 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -216,11 +216,34 @@ static u8 simple_alu_opcodes[] = {
>> [BPF_ARSH] = 0xF8,
>> };
>>
>> +static char jit_hole_buffer[PAGE_SIZE] = {};
>
> Let's not waste a page filled with 0xcc.
> The pack allocator will reserve 128 bytes at the front
> and will round up the tail to 64 bytes.
> So this can be a 128 byte array?
>
>> +
>> static void jit_fill_hole(void *area, unsigned int size)
>> +{
>> + struct bpf_binary_header *hdr = area;
>> + int i;
>> +
>> + for (i = 0; i < roundup(size, PAGE_SIZE); i += PAGE_SIZE) {
>
> multi page 0xcc-ing?
> Is it because bpf_jit_binary_alloc_pack() allocates 2MB
> and then populates the whole thing with this?
> Seems overkill.
> 0xcc in the front of the prog and in the back is there
> to catch JIT bugs.
> No need to fill 2MB with it.
I got this logic because current code memset(0xcc) for the whole
buffer. We can change the logic to only 0xcc the first 128 bytes
and the last 64 bytes.
>
>
>> + int s;
>> +
>> + s = min_t(int, PAGE_SIZE, size - i);
>> + text_poke_copy(area + i, jit_hole_buffer, s);
>> + }
>> +
>> + /*
>> + * bpf_jit_binary_alloc_pack cannot write size directly to the ro
>> + * mapping. Write it here with text_poke_copy().
>> + */
>> + text_poke_copy(&hdr->size, &size, sizeof(size));
>> +}
[...]
>> @@ -2248,8 +2261,10 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs)
>>
>> struct x64_jit_data {
>> struct bpf_binary_header *header;
>> + struct bpf_binary_header *tmp_header;
>> int *addrs;
>> u8 *image;
>> + u8 *tmp_image;
>
> Why add these two fields here?
> With new logic header and image will be zero always?
header and image point to a section of the 2MB page; while tmp_header
and tmp_image point to a temporary buffer from kzalloc. We need them
in x86_jit_data, so that we can reuse the temporary buffer between
multiple calls of bpf_int_jit_compile(). It is used as:
bpf_int_jit_compile(...)
{
/* ... */
jit_data = prog->aux->jit_data;
if (!jit_data) {
/* kzalloc jit_data */
}
addrs = jit_data->addrs;
if (addrs) {
/* reuse previous jit_data */
}
> Maybe rename them instead?
> Or both used during JIT-ing?
>
>> int proglen;
>> struct jit_context ctx;
>> };
>> @@ -2259,6 +2274,7 @@ struct x64_jit_data {
>>
[...]
>> }
>> - prog->aux->extable = (void *) image + roundup(proglen, align);
>> + if (header->size > bpf_prog_pack_max_size()) {
>> + tmp_header = header;
>> + tmp_image = image;
>> + } else {
>> + tmp_header = kzalloc(header->size, GFP_KERNEL);
>
> the header->size could be just below 2MB.
> I don't think kzalloc() can handle that.
Technically, kzalloc can handle 2MB allocation via:
kzalloc() => kmalloc() => kmalloc_large() => kmalloc_order()
But this would fail when the memory is fragmented. I guess we should use
kvmalloc() instead?
>
>> + if (!tmp_header) {
>> + bpf_jit_binary_free_pack(header);
>> + header = NULL;
>> + prog = orig_prog;
>> + goto out_addrs;
>> + }
>> + tmp_header->size = header->size;
>> + tmp_image = (void *)tmp_header + ((void *)image - (void *)header);
>
> Why is 'tmp_image' needed at all?
> The above math can be done where necessary.
We pass both image and tmp_image to do_jit(), as it needs both of them.
I think maintaining a tmp_image variable makes the logic cleaner. We can
remove it from x64_jit_data, I guess.
>
>> + }
>> + prog->aux->extable = (void *)image + roundup(proglen, align);
>
> I suspect if you didn't remove the space between (void *) and image
> the diff would be less confusing.
> This line didn't change, right?
Yeah, I forgot why I changed it in the first place. Let me undo it.
Thanks,
Song
[...]