Re: [PATCH v4 00/10] Function Granular KASLR

From: Kristen Carlson Accardi
Date: Fri Aug 07 2020 - 12:38:35 EST


On Tue, 2020-08-04 at 14:23 -0400, Joe Lawrence wrote:
> On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi
> wrote:
> > Function Granular Kernel Address Space Layout Randomization
> > (fgkaslr)
> > -----------------------------------------------------------------
> > ----
> >
> > This patch set is an implementation of finer grained kernel address
> > space
> > randomization. It rearranges your kernel code at load time
> > on a per-function level granularity, with only around a second
> > added to
> > boot time.
> >
> > Changes in v4:
> > -------------
> > * dropped the patch to split out change to STATIC definition in
> > x86/boot/compressed/misc.c and replaced with a patch authored
> > by Kees Cook to avoid the duplicate malloc definitions
> > * Added a section to Documentation/admin-guide/kernel-
> > parameters.txt
> > to document the fgkaslr boot option.
> > * redesigned the patch to hide the new layout when reading
> > /proc/kallsyms. The previous implementation utilized a
> > dynamically
> > allocated linked list to display the kernel and module symbols
> > in alphabetical order. The new implementation uses a randomly
> > shuffled index array to display the kernel and module symbols
> > in a random order.
> >
> > Changes in v3:
> > -------------
> > * Makefile changes to accommodate
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> > * removal of extraneous ALIGN_PAGE from _etext changes
> > * changed variable names in x86/tools/relocs to be less confusing
> > * split out change to STATIC definition in
> > x86/boot/compressed/misc.c
> > * Updates to Documentation to make it more clear what is preserved
> > in .text
> > * much more detailed commit message for function granular KASLR
> > patch
> > * minor tweaks and changes that make for more readable code
> > * this cover letter updated slightly to add additional details
> >
> > Changes in v2:
> > --------------
> > * Fix to address i386 build failure
> > * Allow module reordering patch to be configured separately so that
> > arm (or other non-x86_64 arches) can take advantage of module
> > function
> > reordering. This support has not be tested by me, but smoke
> > tested by
> > Ard Biesheuvel <ardb@xxxxxxxxxx> on arm.
> > * Fix build issue when building on arm as reported by
> > Ard Biesheuvel <ardb@xxxxxxxxxx>
> >
> > Patches to objtool are included because they are dependencies for
> > this
> > patchset, however they have been submitted by their maintainer
> > separately.
> >
> > Background
> > ----------
> > KASLR was merged into the kernel with the objective of increasing
> > the
> > difficulty of code reuse attacks. Code reuse attacks reused
> > existing code
> > snippets to get around existing memory protections. They exploit
> > software bugs
> > which expose addresses of useful code snippets to control the flow
> > of
> > execution for their own nefarious purposes. KASLR moves the entire
> > kernel
> > code text as a unit at boot time in order to make addresses less
> > predictable.
> > The order of the code within the segment is unchanged - only the
> > base address
> > is shifted. There are a few shortcomings to this algorithm.
> >
> > 1. Low Entropy - there are only so many locations the kernel can
> > fit in. This
> > means an attacker could guess without too much trouble.
> > 2. Knowledge of a single address can reveal the offset of the base
> > address,
> > exposing all other locations for a published/known kernel image.
> > 3. Info leaks abound.
> >
> > Finer grained ASLR has been proposed as a way to make ASLR more
> > resistant
> > to info leaks. It is not a new concept at all, and there are many
> > variations
> > possible. Function reordering is an implementation of finer grained
> > ASLR
> > which randomizes the layout of an address space on a function level
> > granularity. We use the term "fgkaslr" in this document to refer to
> > the
> > technique of function reordering when used with KASLR, as well as
> > finer grained
> > KASLR in general.
> >
> > Proposed Improvement
> > --------------------
> > This patch set proposes adding function reordering on top of the
> > existing
> > KASLR base address randomization. The over-arching objective is
> > incremental
> > improvement over what we already have. It is designed to work in
> > combination
> > with the existing solution. The implementation is really pretty
> > simple, and
> > there are 2 main area where changes occur:
> >
> > * Build time
> >
> > GCC has had an option to place functions into individual .text
> > sections for
> > many years now. This option can be used to implement function
> > reordering at
> > load time. The final compiled vmlinux retains all the section
> > headers, which
> > can be used to help find the address ranges of each function. Using
> > this
> > information and an expanded table of relocation addresses,
> > individual text
> > sections can be suffled immediately after decompression. Some data
> > tables
> > inside the kernel that have assumptions about order require re-
> > sorting
> > after being updated when applying relocations. In order to modify
> > these tables,
> > a few key symbols are excluded from the objcopy symbol stripping
> > process for
> > use after shuffling the text segments.
> >
> > Some highlights from the build time changes to look for:
> >
> > The top level kernel Makefile was modified to add the gcc flag if
> > it
> > is supported. Currently, I am applying this flag to everything it
> > is
> > possible to randomize. Anything that is written in C and not
> > present in a
> > special input section is randomized. The final binary segment 0
> > retains a
> > consolidated .text section, as well as all the individual .text.*
> > sections.
> > Future work could turn off this flags for selected files or even
> > entire
> > subsystems, although obviously at the cost of security.
> >
> > The relocs tool is updated to add relative relocations. This
> > information
> > previously wasn't included because it wasn't necessary when moving
> > the
> > entire .text segment as a unit.
> >
> > A new file was created to contain a list of symbols that objcopy
> > should
> > keep. We use those symbols at load time as described below.
> >
> > * Load time
> >
> > The boot kernel was modified to parse the vmlinux elf file after
> > decompression to check for our interesting symbols that we kept,
> > and to
> > look for any .text.* sections to randomize. The consolidated .text
> > section
> > is skipped and not moved. The sections are shuffled randomly, and
> > copied
> > into memory following the .text section in a new random order. The
> > existing
> > code which updated relocation addresses was modified to account for
> > not just a fixed delta from the load address, but the offset that
> > the function
> > section was moved to. This requires inspection of each address to
> > see if
> > it was impacted by a randomization. We use a bsearch to make this
> > less
> > horrible on performance. Any tables that need to be modified with
> > new
> > addresses or resorted are updated using the symbol addresses parsed
> > from the
> > elf symbol table.
> >
> > In order to hide our new layout, symbols reported through
> > /proc/kallsyms
> > will be displayed in a random order.
> >
> > Security Considerations
> > -----------------------
> > The objective of this patch set is to improve a technology that is
> > already
> > merged into the kernel (KASLR). This code will not prevent all
> > attacks,
> > but should instead be considered as one of several tools that can
> > be used.
> > In particular, this code is meant to make KASLR more effective in
> > the presence
> > of info leaks.
> >
> > How much entropy we are adding to the existing entropy of standard
> > KASLR will
> > depend on a few variables. Firstly and most obviously, the number
> > of functions
> > that are randomized matters. This implementation keeps the existing
> > .text
> > section for code that cannot be randomized - for example, because
> > it was
> > assembly code. The less sections to randomize, the less entropy. In
> > addition,
> > due to alignment (16 bytes for x86_64), the number of bits in a
> > address that
> > the attacker needs to guess is reduced, as the lower bits are
> > identical.
> >
> > Performance Impact
> > ------------------
> > There are two areas where function reordering can impact
> > performance: boot
> > time latency, and run time performance.
> >
> > * Boot time latency
> > This implementation of finer grained KASLR impacts the boot time of
> > the kernel
> > in several places. It requires additional parsing of the kernel ELF
> > file to
> > obtain the section headers of the sections to be randomized. It
> > calls the
> > random number generator for each section to be randomized to
> > determine that
> > section's new memory location. It copies the decompressed kernel
> > into a new
> > area of memory to avoid corruption when laying out the newly
> > randomized
> > sections. It increases the number of relocations the kernel has to
> > perform at
> > boot time vs. standard KASLR, and it also requires a lookup on each
> > address
> > that needs to be relocated to see if it was in a randomized section
> > and needs
> > to be adjusted by a new offset. Finally, it re-sorts a few data
> > tables that
> > are required to be sorted by address.
> >
> > Booting a test VM on a modern, well appointed system showed an
> > increase in
> > latency of approximately 1 second.
> >
> > * Run time
> > The performance impact at run-time of function reordering varies by
> > workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a
> > kernel
> > build with finer grained KASLR was about 1% slower than a kernel
> > with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of
> > L1-icache-load-misses. Other workloads were examined as well, with
> > varied
> > results. Some workloads performed significantly worse under
> > FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it
> > will
> > depend on the code flow whether or not finer grained KASLR will
> > impact
> > your workload, and how the underlying code was designed. Because
> > the layout
> > changes per boot, each time a system is rebooted the performance of
> > a workload
> > may change.
> >
> > Future work could identify hot areas that may not be randomized and
> > either
> > leave them in the .text section or group them together into a
> > single section
> > that may be randomized. If grouping things together helps, one
> > other thing to
> > consider is that if we could identify text blobs that should be
> > grouped together
> > to benefit a particular code flow, it could be interesting to
> > explore
> > whether this security feature could be also be used as a
> > performance
> > feature if you are interested in optimizing your kernel layout for
> > a
> > particular workload at boot time. Optimizing function layout for a
> > particular
> > workload has been researched and proven effective - for more
> > information
> > read the Facebook paper "Optimizing Function Placement for Large-
> > Scale
> > Data-Center Applications" (see references section below).
> >
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased
> > by
> > about 3%. The compressed image is also increased due to the header
> > files,
> > as well as the extra relocations that must be added. You can expect
> > fgkaslr
> > to increase the size of the compressed image by about 15%.
> >
> > Memory Usage
> > ------------
> > fgkaslr increases the amount of heap that is required at boot time,
> > although this extra memory is released when the kernel has finished
> > decompression. As a result, it may not be appropriate to use this
> > feature on
> > systems without much memory.
> >
> > Building
> > --------
> > To enable fine grained KASLR, you need to have the following config
> > options
> > set (including all the ones you would use to build normal KASLR)
> >
> > CONFIG_FG_KASLR=y
> >
> > In addition, fgkaslr is only supported for the X86_64 architecture.
> >
> > Modules
> > -------
> > Modules are randomized similarly to the rest of the kernel by
> > shuffling
> > the sections at load time prior to moving them into memory. The
> > module must
> > also have been build with the -ffunction-sections compiler option.
> >
> > Although fgkaslr for the kernel is only supported for the X86_64
> > architecture,
> > it is possible to use fgkaslr with modules on other architectures.
> > To enable
> > this feature, select
> >
> > CONFIG_MODULE_FG_KASLR=y
> >
> > This option is selected automatically for X86_64 when
> > CONFIG_FG_KASLR is set.
> >
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also
> > disables
> > fgkaslr. It is also possible to disable fgkaslr separately by
> > booting with
> > fgkaslr=off on the commandline.
> >
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained
> > ASLR.
> > This paper in particular contributed the most to my implementation
> > design
> > as well as my overall understanding of the problem space:
> >
> > Selfrando: Securing the Tor Browser against De-anonymization
> > Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> >
> > For more information on how function layout impacts performance,
> > see:
> >
> > Optimizing Function Placement for Large-Scale Data-Center
> > Applications,
> > G. Ottoni, B. Maher
> >
> > Kees Cook (2):
> > x86/boot: Allow a "silent" kaslr random byte fetch
> > x86/boot/compressed: Avoid duplicate malloc() implementations
> >
> > Kristen Carlson Accardi (8):
> > objtool: Do not assume order of parent/child functions
> > x86: tools/relocs: Support >64K section headers
> > x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> > x86: Make sure _etext includes function sections
> > x86/tools: Add relative relocs for randomized functions
> > x86: Add support for function granular KASLR
> > kallsyms: Hide layout
> > module: Reorder functions
> >
> > .../admin-guide/kernel-parameters.txt | 7 +
> > Documentation/security/fgkaslr.rst | 172 ++++
> > Documentation/security/index.rst | 1 +
> > Makefile | 6 +-
> > arch/x86/Kconfig | 4 +
> > arch/x86/Makefile | 5 +
> > arch/x86/boot/compressed/Makefile | 9 +-
> > arch/x86/boot/compressed/fgkaslr.c | 811
> > ++++++++++++++++++
> > arch/x86/boot/compressed/kaslr.c | 4 -
> > arch/x86/boot/compressed/misc.c | 157 +++-
> > arch/x86/boot/compressed/misc.h | 30 +
> > arch/x86/boot/compressed/utils.c | 11 +
> > arch/x86/boot/compressed/vmlinux.symbols | 17 +
> > arch/x86/include/asm/boot.h | 15 +-
> > arch/x86/kernel/vmlinux.lds.S | 17 +-
> > arch/x86/lib/kaslr.c | 18 +-
> > arch/x86/tools/relocs.c | 143 ++-
> > arch/x86/tools/relocs.h | 4 +-
> > arch/x86/tools/relocs_common.c | 15 +-
> > include/asm-generic/vmlinux.lds.h | 18 +-
> > include/linux/decompress/mm.h | 12 +-
> > include/uapi/linux/elf.h | 1 +
> > init/Kconfig | 26 +
> > kernel/kallsyms.c | 163 +++-
> > kernel/module.c | 81 ++
> > tools/objtool/elf.c | 8 +-
> > 26 files changed, 1670 insertions(+), 85 deletions(-)
> > create mode 100644 Documentation/security/fgkaslr.rst
> > create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> > create mode 100644 arch/x86/boot/compressed/utils.c
> > create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> >
> >
> > base-commit: 11ba468877bb23f28956a35e896356252d63c983
> > --
> > 2.20.1
> >
>
> Apologies in advance if this has already been discussed elsewhere,
> but I
> did finally get around to testing the patchset against the
> livepatching
> kselftests.
>
> The livepatching kselftests fail as all livepatches stall their
> transitions. It appears that reliable (ORC) stack unwinding is
> broken
> when fgkaslr is enabled.
>
> Relevant config options:
>
> CONFIG_ARCH_HAS_FG_KASLR=y
> CONFIG_ARCH_STACKWALK=y
> CONFIG_FG_KASLR=y
> CONFIG_HAVE_LIVEPATCH=y
> CONFIG_HAVE_RELIABLE_STACKTRACE=y
> CONFIG_LIVEPATCH=y
> CONFIG_MODULE_FG_KASLR=y
> CONFIG_TEST_LIVEPATCH=m
> CONFIG_UNWINDER_ORC=y
>
> The livepatch transitions are stuck along this call path:
>
> klp_check_stack
> stack_trace_save_tsk_reliable
> arch_stack_walk_reliable
>
> /* Check for stack corruption */
> if (unwind_error(&state))
> return -EINVAL;
>
> where the unwinder error is set by unwind_next_frame():
>
> arch/x86/kernel/unwind_orc.c
> bool unwind_next_frame(struct unwind_state *state)
>
> sometimes here:
>
> /* End-of-stack check for kernel threads: */
> if (orc->sp_reg == ORC_REG_UNDEFINED) {
> if (!orc->end)
> goto err;
>
> goto the_end;
> }
>
> or here:
>
> /* Prevent a recursive loop due to bad ORC data:
> */
>
> if (state->stack_info.type == prev_type
> &&
>
> on_stack(&state->stack_info, (void *)state->sp,
> sizeof(long))
> &&
> state->sp <= prev_sp)
> {
>
> orc_warn_current("stack going in the wrong direction?
> at %pB\n",
> (void
> *)orig_ip);
>
> goto
> err;
>
> }
>
> (and probably other places the ORC unwinder gets confused.)
>
>
> It also manifests itself in other, more visible ways. For example, a
> kernel module that calls dump_stack() in its init function or even
> /proc/<pid>/stack:
>
> (fgkaslr on)
> ------------
>
> Call Trace:
> ? dump_stack+0x57/0x73
> ? 0xffffffffc0850000
> ? mymodule_init+0xa/0x1000 [dumpstack]
> ? do_one_initcall+0x46/0x1f0
> ? free_unref_page_commit+0x91/0x100
> ? _cond_resched+0x15/0x30
> ? kmem_cache_alloc_trace+0x14b/0x210
> ? do_init_module+0x5a/0x220
> ? load_module+0x1912/0x1b20
> ? __do_sys_finit_module+0xa8/0x110
> ? __do_sys_finit_module+0xa8/0x110
> ? do_syscall_64+0x47/0x80
> ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
>
>
> fgkaslr=off
> -----------
>
> Call Trace:
> dump_stack+0x57/0x73
> ? 0xffffffffc04f2000
> mymodule_init+0xa/0x1000 [readonly]
> do_one_initcall+0x46/0x1f0
> ? free_unref_page_commit+0x91/0x100
> ? _cond_resched+0x15/0x30
> ? kmem_cache_alloc_trace+0x14b/0x210
> do_init_module+0x5a/0x220
> load_module+0x1912/0x1b20
> ? __do_sys_finit_module+0xa8/0x110
> __do_sys_finit_module+0xa8/0x110
> do_syscall_64+0x47/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> [<0>] __do_sys_wait4+0x83/0x90
> [<0>] do_syscall_64+0x47/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
> I would think fixing and verifying these latter cases would be easier
> than
> chasing livepatch transitions (but would still probably fix klp case,
> too).
> Perhaps Josh or someone has other ORC unwinder tests that could be
> used?
>
> -- Joe
>

Hi Joe,
Thanks for testing. Yes, Josh and I have been discussing the orc_unwind
issues. I've root caused one issue already, in that objtool places an
orc_unwind_ip address just outside the section, so my algorithm fails
to relocate this address. There are other issues as well that I still
haven't root caused. I'll be addressing this in v5 and plan to have
something that passes livepatch testing with that version.

Kristen