Re: [PATCH v3 0/6] slab: Introduce dedicated bucket allocator

From: jvoisin
Date: Sun Apr 28 2024 - 07:03:04 EST


On 4/24/24 23:40, Kees Cook wrote:
> Hi,
>
> Series change history:
>
> v3:
> - clarify rationale and purpose in commit log
> - rebase to -next (CONFIG_CODE_TAGGING)
> - simplify calling styles and split out bucket plumbing more cleanly
> - consolidate kmem_buckets_*() family introduction patches
> v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@xxxxxxxxxx/
> v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@xxxxxxxxxx/
>
> For the cover letter, I'm repeating commit log for patch 4 here, which has
> additional clarifications and rationale since v2:
>
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
>
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in same cache.
>
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations that are directly user
> controlled. Addressing these cases is limited in scope, so isolating these
> kinds of interfaces will not become an unbounded game of whack-a-mole. For
> example, many pass through memdup_user(), making isolation there very
> effective.

What does "Addressing these cases is limited in scope, so isolating
these kinds of interfaces will not become an unbounded game of
whack-a-mole." mean exactly?

>
> In order to isolate user-controllable sized allocations from system
> allocations, introduce kmem_buckets_create(), which behaves like
> kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like
> kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for
> where caller tracking is needed. Introduce kmem_buckets_valloc() for
> cases where vmalloc callback is needed.
>
> Allows for confining allocations to a dedicated set of sized caches
> (which have the same layout as the kmalloc caches).
>
> This can also be used in the future to extend codetag allocation
> annotations to implement per-caller allocation cache isolation[1] even
> for dynamic allocations.

Having per-caller allocation cache isolation looks like something that
has already been done in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe
albeit in a randomized way. Why not piggy-back on the infra added by
that commit, instead of adding a new API?

> Memory allocation pinning[2] is still needed to plug the Use-After-Free
> cross-allocator weakness, but that is an existing and separate issue
> which is complementary to this improvement. Development continues for
> that feature via the SLAB_VIRTUAL[3] series (which could also provide
> guard pages -- another complementary improvement).
>
> Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@xxxxxxxxxx/ [3]

To be honest, I think this series is close to useless without allocation
pinning. And even with pinning, it's still routinely bypassed in the
KernelCTF
(https://github.com/google/security-research/tree/master/pocs/linux/kernelctf).

Do you have some particular exploits in mind that would be completely
mitigated by your series?

Moreover, I'm not aware of any ongoing development of the SLAB_VIRTUAL
series: the last sign of life on its thread is from 7 months ago.

>
> After the core implementation are 2 patches that cover the most heavily
> abused "repeat offenders" used in exploits. Repeating those details here:
>
> The msg subsystem is a common target for exploiting[1][2][3][4][5][6]
> use-after-free type confusion flaws in the kernel for both read and
> write primitives. Avoid having a user-controlled size cache share the
> global kmalloc allocator by using a separate set of kmalloc buckets.
>
> Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
> Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
> Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
> Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
> Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
> Link: https://zplin.me/papers/ELOISE.pdf [6]
> Link: https://syst3mfailure.io/wall-of-perdition/ [7]
>
> Both memdup_user() and vmemdup_user() handle allocations that are
> regularly used for exploiting use-after-free type confusion flaws in
> the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
> respectively).
>
> Since both are designed for contents coming from userspace, it allows
> for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
> buckets so these allocations do not share caches with the global kmalloc
> buckets.
>
> Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
> Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
> Link: https://etenal.me/archives/1336 [3]
> Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]

What's the performance impact of this series? Did you run some benchmarks?

>
> Thanks!
>
> -Kees
>
>
> Kees Cook (6):
> mm/slab: Introduce kmem_buckets typedef
> mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
> mm/slab: Introduce __kvmalloc_node() that can take kmem_buckets
> argument
> mm/slab: Introduce kmem_buckets_create() and family
> ipc, msg: Use dedicated slab buckets for alloc_msg()
> mm/util: Use dedicated slab buckets for memdup_user()
>
> include/linux/slab.h | 44 ++++++++++++++++--------
> ipc/msgutil.c | 13 +++++++-
> lib/fortify_kunit.c | 2 +-
> lib/rhashtable.c | 2 +-
> mm/slab.h | 6 ++--
> mm/slab_common.c | 79 +++++++++++++++++++++++++++++++++++++++++---
> mm/slub.c | 14 ++++----
> mm/util.c | 21 +++++++++---
> 8 files changed, 146 insertions(+), 35 deletions(-)
>