set_memory_* (was: Re: BUG: unable to handle kernel paging request in bpf_int_jit_compile)
From: Daniel Borkmann
Date: Tue Jun 26 2018 - 18:54:39 EST
On 06/24/2018 12:02 PM, Ingo Molnar wrote:
> * David Miller <davem@xxxxxxxxxxxxx> wrote:
>> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Date: Sun, 24 Jun 2018 09:09:09 +0200 (CEST)
>>
>>> I'm really tempted to make the BPF config switch depend on BROKEN.
>>
>> This really isn't necessary Thomas.
>>
>> Whoever wrote the code didn't understand that set ro can legitimately
>> fail.
>
> No, that's *NOT* the only thing that happened, according to the Git history.
>
> The first use of set_memory_ro() in include/linux/filter.h was added by
> this commit almost four years ago:
>
> # 2014/09
> 60a3b2253c41 ("net: bpf: make eBPF interpreter images read-only")
>
> ... and yes, that commit didn't anticipate the (in hindsight) obvious property of
> a function that changes global kernel mappings that if it is used after bootup
> without locking it 'may fail'. So that commit slipping through is 'shit happens'
> and I don't think we ever complained about such things slipping through.
Hmm, back then I adapted the code similar from 314beb9bcabf ("x86: bpf_jit_comp:
secure bpf jit against spraying attacks") for interpreter images as well, and
from grepping through the kernel code none of the callers of set_memory_{ro,rw}()
at that time (& now except bpf) did check for the return code (e.g. module_enable_ro()
and module_disable_ro() as one example which could happen late after bootup has
finished when pulling in modules on the fly).
I did made the mistake in 9facc336876f ("bpf: reject any prog that failed read-only
lock") assuming that after the set_memory_ro() call it would either succeed or
it would not, but not leaving us in a state in the middle. That was silly assumption
and I'll fix this up in bpf, very sorry about that! I've been debugging the syzkaller
BUG at [1] and noticed that even though set_memory_ro() failed with an error, doing
a probe_kernel_write() on it afterwards failed with EFAULT, meaning the module_alloc()
memory was however set to read-only at that point triggering later the BUG when
attempting to change its memory (at least on the virtual mem). From debugging output,
it was a single 4k page and on x86_64 in the __change_page_attr_set_clr() we failed
in the cpa_process_alias() where the syzkaller fault injection happened. So latter
failure from cpa_process_alias() came from call to __change_page_attr_set_clr() with
primary to 0, where it tried to split a large page in __change_page_attr() but failed
in alloc_pages() thus returning the -ENOMEM from there. Testing subsequent undoing
via set_memory_rw() made it writable again, though.
In any case, for pairs like set_memory_ro() + set_memory_rw() that are also used
outside of bpf e.g. STRICT_MODULE_RWX and friends which are mostly default these
days for some archs, is the choice to not check errors from there by design or from
historical context that it originated from 'debugging code' in that sense (DEBUG_RODATA /
DEBUG_SET_MODULE_RONX) earlier? Also if no-one checks for errors (and if that would
infact be the recommendation it is agreed upon) should the API be changed to void,
or generally should actual error checking occur on these + potential rollback; but
then question is what about restoring part from prior set_memory_ro() via set_memory_rw()?
Kees/others, do you happen to have some more context on recommended use around this
by any chance? (Would probably also help if we add some doc around assumptions into
include/linux/set_memory.h for future users.)
Thanks a lot,
Daniel
[1] https://syzkaller.appspot.com/bug?extid=a4eb8c7766952a1ca872