Re: [PATCH bpf v2 0/3] bpf: invalidate unused part of bpf_prog_pack

From: Linus Torvalds
Date: Wed Apr 27 2022 - 21:45:51 EST

On Wed, Apr 27, 2022 at 3:24 PM Song Liu <songliubraving@xxxxxx> wrote:
> Could you please share your suggestions on this set? Shall we ship it
> with 5.18?

I'd personally prefer to just not do the prog_pack thing at all, since
I don't think it was actually in a "ready to ship" state for this
merge window, and the hugepage mapping protection games I'm still
leery of.

Yes, the hugepage protection things probably do work from what I saw
when I looked through them, but that x86 vmalloc hugepage code was
really designed for another use (non-refcounted device pages), so the
fact that it all actually seems surprisingly ok certainly wasn't
because the code was designed to do that new case.

Does the prog_pack thing work with small pages?

Yes. But that wasn't what it was designed for or its selling point, so
it all is a bit suspect to me.

And I'm looking at those set_memory_xyz() calls, and I'm going "yeah,
I think it works on x86, but on ppc I look at it and I see

static inline int set_memory_ro(unsigned long addr, int numpages)
return change_memory_attr(addr, numpages, SET_MEMORY_RO);

and then in change_memory_attr() it does

if (WARN_ON_ONCE(is_vmalloc_or_module_addr((void *)addr) &&
is_vm_area_hugepages((void *)addr)))
return -EINVAL;

and I'm "this is all supposedly generic code, but I'm not seeing how
it works cross-architecture".

I *think* it's actually because this is all basically x86-specific due
to x86 being the only architecture that implements
bpf_arch_text_copy(), and everybody else then ends up erroring out and
not using the prog_pack thing after all.

And then one of the two places that use bpf_arch_text_copy() doesn't
even check the returned error code.

So this all ends up just depending on "only x86 will actually succeed
in bpf_jit_binary_pack_finalize(), everybody else will fail after
having done all the common setup".

End result: it all seems a bit broken right now. The "generic" code
only works on x86, and on other architectures it goes through the
motions and then fails at the end. And even on x86 I worry about
actually enabling it fully.

I'm not having the warm and fuzzies about this all, in other words.

Maybe people can convince me otherwise, but I think you need to work at it.

And even for 5.19+ kind of timeframes, I'd actually like the x86
people who maintain arch/x86/mm/pat/set_memory.c also sign off on
using that code for hugepage vmalloc mappings too.

I *think* it does. But that code has various subtle things going on.

I see PeterZ is cc'd (presumably because of the text_poke() stuff, not
because of the whole "call set_memory_ro() on virtually mapped kernel
largepage memory".

Did people even talk to x86 people about this, or did the whole "it
works, except it turns out set_vm_flush_reset_perms() doesn't work"
mean that the authors of that code never got involved?