On 22 October 2018 at 20:06, Edgecombe, Rick P
<rick.p.edgecombe@xxxxxxxxx> wrote:
On Sat, 2018-10-20 at 19:20 +0200, Ard Biesheuvel wrote:
Hi Rick,AFAIK, it's like you said about relative instruction limits, but also because
On 19 October 2018 at 22:47, Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
wrote:
> If BPF JIT is on, there is no effective limit to prevent filling the entire
> module space with JITed e/BPF filters.
Why do BPF filters use the module space, and does this reason apply to
all architectures?
On arm64, we already support loading plain modules far away from the
core kernel (i.e. out of range for ordinary relative jump/calll
instructions), and so I'd expect BPF to be able to deal with this
already as well. So for arm64, I wonder why an ordinary vmalloc_exec()
wouldn't be more appropriate.
some predictors don't predict past a certain distance. So performance as well.
Not sure the reasons for each arch, or if they all apply for BPF JIT. There seem
to be 8 by my count, that have a dedicated module space for some reason.
So before refactoring the module alloc/free routines to accommodateI am not a BPF JIT expert unfortunately, hopefully someone more authoritative
BPF, I'd like to take one step back and assess whether it wouldn't be
more appropriate to have a separate bpf_alloc/free API, which could be
totally separate from module alloc/free if the arch permits it.
will chime in. I only ran into this because I was trying to increase
randomization for the module space and wanted to find out how many allocations
needed to be supported.
I'd guess though, that BPF JIT is just assuming that there will be some arch
specific constraints about where text can be placed optimally and they would
already be taken into account in the module space allocator.
If there are no constraints for some arch, I'd wonder why not just update its
module_alloc to use the whole space available. What exactly are the constraints
for arm64 for normal modules?
Relative branches and the interactions with KAsan.
We just fixed something similar in the kprobes code: it was using
RWX-mapped module memory to store kprobed instructions, and we
replaced that with a simple vmalloc_exec() [and code to remap it
read-only], which was possible given that relative branches are always
emulated by arm64 kprobes.
So it depends on whether BPF code needs to be in relative branching
range from the calling code, and whether the BPF code itself is
emitted using relative branches into other parts of the code.
It seems fine to me for architectures to have the option of solving this a
different way. If potentially the rlimit ends up being the best solution for
some architectures though, do you think this refactor (pretty close to just a
name change) is that intrusive?
I guess it could be just a BPF JIT per user limit and not touch module space,
but I thought the module space limit was a more general solution as that is the
actual limited resource.
I think it is wrong to conflate the two things. Limiting the number of
BPF allocations and the limiting number of module allocations are two
separate things, and the fact that BPF reuses module_alloc() out of
convenience does not mean a single rlimit for both is appropriate.