Re: [PATCH] arm64: kprobe: Enable OPTPROBE for arm64

From: Masami Hiramatsu
Date: Fri Jul 30 2021 - 21:15:43 EST


On Fri, 30 Jul 2021 10:04:06 +0000
"Song Bao Hua (Barry Song)" <song.bao.hua@xxxxxxxxxxxxx> wrote:

> > > > >
> > > > > Hi Qi,
> > > > >
> > > > > Thanks for your effort!
> > > > >
> > > > > On Mon, 19 Jul 2021 20:24:17 +0800
> > > > > Qi Liu <liuqi115@xxxxxxxxxx> wrote:
> > > > >
> > > > > > This patch introduce optprobe for ARM64. In optprobe, probed
> > > > > > instruction is replaced by a branch instruction to detour
> > > > > > buffer. Detour buffer contains trampoline code and a call to
> > > > > > optimized_callback(). optimized_callback() calls opt_pre_handler()
> > > > > > to execute kprobe handler.
> > > > >
> > > > > OK so this will replace only one instruction.
> > > > >
> > > > > >
> > > > > > Limitations:
> > > > > > - We only support !CONFIG_RANDOMIZE_MODULE_REGION_FULL case to
> > > > > > guarantee the offset between probe point and kprobe pre_handler
> > > > > > is not larger than 128MiB.
> > > > >
> > > > > Hmm, shouldn't we depends on !CONFIG_ARM64_MODULE_PLTS? Or,
> > > > > allocate an intermediate trampoline area similar to arm optprobe
> > > > > does.
> > > >
> > > > Depending on !CONFIG_ARM64_MODULE_PLTS will totally disable
> > > > RANDOMIZE_BASE according to arch/arm64/Kconfig:
> > > > config RANDOMIZE_BASE
> > > > bool "Randomize the address of the kernel image"
> > > > select ARM64_MODULE_PLTS if MODULES
> > > > select RELOCATABLE
> > >
> > > Yes, but why it is required for "RANDOMIZE_BASE"?
> > > Does that imply the module call might need to use PLT in
> > > some cases?
> > >
> > > >
> > > > Depending on !RANDOMIZE_MODULE_REGION_FULL seems to be still
> > > > allowing RANDOMIZE_BASE via avoiding long jump according to:
> > > > arch/arm64/Kconfig:
> > > >
> > > > config RANDOMIZE_MODULE_REGION_FULL
> > > > bool "Randomize the module region over a 4 GB range"
> > > > depends on RANDOMIZE_BASE
> > > > default y
> > > > help
> > > > Randomizes the location of the module region inside a 4 GB window
> > > > covering the core kernel. This way, it is less likely for modules
> > > > to leak information about the location of core kernel data structures
> > > > but it does imply that function calls between modules and the core
> > > > kernel will need to be resolved via veneers in the module PLT.
> > > >
> > > > When this option is not set, the module region will be randomized over
> > > > a limited range that contains the [_stext, _etext] interval of the
> > > > core kernel, so branch relocations are always in range.
> > >
> > > Hmm, this dependency looks strange. If it always in range, don't we need
> > > PLT for modules?
> > >
> > > Cataline, would you know why?
> > > Maybe it's a KASLR's Kconfig issue?
> >
> > I actually didn't see any problem after making this change:
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index e07e7de9ac49..6440671b72e0 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -1781,7 +1781,6 @@ config RELOCATABLE
> >
> > config RANDOMIZE_BASE
> > bool "Randomize the address of the kernel image"
> > - select ARM64_MODULE_PLTS if MODULES
> > select RELOCATABLE
> > help
> > Randomizes the virtual address at which the kernel image is
> > @@ -1801,6 +1800,7 @@ config RANDOMIZE_BASE
> > config RANDOMIZE_MODULE_REGION_FULL
> > bool "Randomize the module region over a 4 GB range"
> > depends on RANDOMIZE_BASE
> > + select ARM64_MODULE_PLTS if MODULES
> > default y
> > help
> > Randomizes the location of the module region inside a 4 GB window
> >
> > and having this config:
> > # zcat /proc/config.gz | grep RANDOMIZE_BASE
> > CONFIG_RANDOMIZE_BASE=y
> >
> > # zcat /proc/config.gz | grep RANDOMIZE_MODULE_REGION_FULL
> > # CONFIG_RANDOMIZE_MODULE_REGION_FULL is not set
> >
> > # zcat /proc/config.gz | grep ARM64_MODULE_PLTS
> > # CONFIG_ARM64_MODULE_PLTS is not set
> >
> > Modules work all good:
> > # lsmod
> > Module Size Used by
> > btrfs 1355776 0
> > blake2b_generic 20480 0
> > libcrc32c 16384 1 btrfs
> > xor 20480 1 btrfs
> > xor_neon 16384 1 xor
> > zstd_compress 163840 1 btrfs
> > raid6_pq 110592 1 btrfs
> > ctr 16384 0
> > md5 16384 0
> > ip_tunnel 32768 0
> > ipv6 442368 28
> >
> >
> > I am not quite sure if there is a corner case. If no,
> > I would think the kconfig might be some improper.
>
> The corner case is that even CONFIG_RANDOMIZE_MODULE_REGION_FULL
> is not enabled, but if CONFIG_ARM64_MODULE_PLTS is enabled, when
> we can't get memory from the 128MB area in case the area is exhausted,
> we will fall back in module_alloc() to a 2GB area as long as either
> of the below two conditions is met:
>
> 1. KASAN is not enabled
> 2. KASAN is enabled and CONFIG_KASAN_VMALLOC is also enabled.
>
> void *module_alloc(unsigned long size)
> {
> u64 module_alloc_end = module_alloc_base + MODULES_VSIZE;
> gfp_t gfp_mask = GFP_KERNEL;
> void *p;
>
> /* Silence the initial allocation */
> if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS))
> gfp_mask |= __GFP_NOWARN;
>
> if (IS_ENABLED(CONFIG_KASAN_GENERIC) ||
> IS_ENABLED(CONFIG_KASAN_SW_TAGS))
> /* don't exceed the static module region - see below */
> module_alloc_end = MODULES_END;
>
> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> module_alloc_end, gfp_mask, PAGE_KERNEL, 0,
> NUMA_NO_NODE, __builtin_return_address(0));
>
> if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
> (IS_ENABLED(CONFIG_KASAN_VMALLOC) ||
> (!IS_ENABLED(CONFIG_KASAN_GENERIC) &&
> !IS_ENABLED(CONFIG_KASAN_SW_TAGS))))
> /*
> * KASAN without KASAN_VMALLOC can only deal with module
> * allocations being served from the reserved module region,
> * since the remainder of the vmalloc region is already
> * backed by zero shadow pages, and punching holes into it
> * is non-trivial. Since the module region is not randomized
> * when KASAN is enabled without KASAN_VMALLOC, it is even
> * less likely that the module region gets exhausted, so we
> * can simply omit this fallback in that case.
> */
> p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base,
> module_alloc_base + SZ_2G, GFP_KERNEL,
> PAGE_KERNEL, 0, NUMA_NO_NODE,
> __builtin_return_address(0));
>
> if (p && (kasan_module_alloc(p, size) < 0)) {
> vfree(p);
> return NULL;
> }
>
> return p;
> }
>
> This should be happening quite rarely. But maybe arm64's document
> needs some minor fixup, otherwise, it is quite confusing.

OK, so with CONFIG_KASAN_VMALLOC=y and CONFIG_ARM64_MODULE_PLTS=y, module_alloc()
basically returns memory in the 128MB region, but can also return memory in the
2GB region. (This is OK because optprobe can filter it out.)
But with CONFIG_RANDOMIZE_MODULE_REGION_FULL=y, there is almost no chance to get
memory in the 128MB region.
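
As a rough illustration of the filtering mentioned above, here is a user-space
sketch of the range check. The helper name branch_in_range() and the SZ_128M
definition here are hypothetical and only for illustration, not the code from
the patch. It assumes the usual arm64 B instruction encoding: a signed 26-bit
word offset, so the target must be 4-byte aligned and within
[-128MiB, +128MiB) of the branch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constant; the kernel has its own SZ_128M definition. */
#define SZ_128M (128LL * 1024 * 1024)

/*
 * Hypothetical helper: can a single arm64 'B' instruction placed at
 * 'from' reach 'to'? The immediate is a signed 26-bit word offset,
 * so the byte offset must be a multiple of 4 and lie in
 * [-128MiB, +128MiB).
 */
static bool branch_in_range(uint64_t from, uint64_t to)
{
	int64_t offset = (int64_t)(to - from);

	if (offset & 3)
		return false;	/* target is not instruction aligned */

	return offset >= -SZ_128M && offset < SZ_128M;
}
```

With a check like this, a detour buffer that module_alloc() happened to place
in the 2GB fallback region would simply be rejected, and the probe would stay
unoptimized.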

Hmm, for the optprobe in kernel text, maybe we can define 'optinsn_alloc_start'
as 'module_alloc_base - (SZ_2G - MODULES_VADDR)' and use __vmalloc_node_range()
to avoid this issue. But that only works for the kernel. For modules, we may
always be out of the 128MB region.

Thank you,

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>