Re: [PATCH v2] mm: SLAB freelist randomization

From: Thomas Garnier
Date: Tue Apr 19 2016 - 12:45:01 EST


On Tue, Apr 19, 2016 at 12:15 AM, Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> wrote:
> On Mon, Apr 18, 2016 at 10:14:39AM -0700, Thomas Garnier wrote:
>> Provides an optional config (CONFIG_FREELIST_RANDOM) to randomize the
>> SLAB freelist. The list is randomized during initialization of a new set
>> of pages. The order on different freelist sizes is pre-computed at boot
>> for performance. This security feature reduces the predictability of the
>> kernel SLAB allocator against heap overflows rendering attacks much less
>> stable.
>
> I'm not familiar on security but it doesn't look much secure than
> before. Is there any other way to generate different sequence of freelist
> for each new set of pages? Current approach using pre-computed array will
> generate same sequence of freelist for all new set of pages having same size
> class. Is it sufficient?
>

I think it is sufficient. There is a tradeoff for performance. We could randomly
pick an object from the freelist every time (on slab_get_obj) but I
think it will
have significant impact (at least 3%).

>> For example this attack against SLUB (also applicable against SLAB)
>> would be affected:
>> https://jon.oberheide.org/blog/2010/09/10/linux-kernel-can-slub-overflow/
>>
>> Also, since v4.6 the freelist was moved at the end of the SLAB. It means
>> a controllable heap is opened to new attacks not yet publicly discussed.
>> A kernel heap overflow can be transformed to multiple use-after-free.
>> This feature makes this type of attack harder too.
>>
>> To generate entropy, we use get_random_bytes_arch because 0 bits of
>> entropy is available at that boot stage. In the worse case this function
>> will fallback to the get_random_bytes sub API.
>>
>> The config option name is not specific to the SLAB as this approach will
>> be extended to other allocators like SLUB.
>
> If this feature will be applied to the SLUB, it's better to put common
> code to mm/slab_common.c.
>

I think it might be moved there once we implement the SLUB counterpart
but it is too early to define which part will be common.

>>
>> Performance results highlighted no major changes:
>>
>> Netperf average on 10 runs:
>>
>> threads,base,change
>> 16,576943.10,585905.90 (101.55%)
>> 32,564082.00,569741.20 (101.00%)
>> 48,558334.30,561851.20 (100.63%)
>> 64,552025.20,556448.30 (100.80%)
>> 80,552294.40,551743.10 (99.90%)
>> 96,552435.30,547529.20 (99.11%)
>> 112,551320.60,550183.20 (99.79%)
>> 128,549138.30,550542.70 (100.26%)
>> 144,549344.50,544529.10 (99.12%)
>> 160,550360.80,539929.30 (98.10%)
>>
>> slab_test 1 run on boot. After is faster except for odd result on size
>> 2048.
>
> Hmm... It's odd result. It adds more logic and it should
> decrease performance. I guess it would be experimental error but
> do you have any analysis about this result?
>

I don't. I am glad to redo the test. I found that slab_test has very different
result based on the heap state at the time of the test. If I run the
test multiple
times, I have really various results on with or without the mitigation (on
dedicated hardware).

>>
>> Before:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 137 cycles kfree -> 126 cycles
>> 10000 times kmalloc(16) -> 118 cycles kfree -> 119 cycles
>> 10000 times kmalloc(32) -> 112 cycles kfree -> 119 cycles
>> 10000 times kmalloc(64) -> 126 cycles kfree -> 123 cycles
>> 10000 times kmalloc(128) -> 135 cycles kfree -> 131 cycles
>> 10000 times kmalloc(256) -> 165 cycles kfree -> 104 cycles
>> 10000 times kmalloc(512) -> 174 cycles kfree -> 126 cycles
>> 10000 times kmalloc(1024) -> 242 cycles kfree -> 160 cycles
>> 10000 times kmalloc(2048) -> 478 cycles kfree -> 239 cycles
>> 10000 times kmalloc(4096) -> 747 cycles kfree -> 364 cycles
>> 10000 times kmalloc(8192) -> 774 cycles kfree -> 404 cycles
>> 10000 times kmalloc(16384) -> 849 cycles kfree -> 430 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 118 cycles
>> 10000 times kmalloc(16)/kfree -> 118 cycles
>> 10000 times kmalloc(32)/kfree -> 118 cycles
>> 10000 times kmalloc(64)/kfree -> 121 cycles
>> 10000 times kmalloc(128)/kfree -> 118 cycles
>> 10000 times kmalloc(256)/kfree -> 115 cycles
>> 10000 times kmalloc(512)/kfree -> 115 cycles
>> 10000 times kmalloc(1024)/kfree -> 115 cycles
>> 10000 times kmalloc(2048)/kfree -> 115 cycles
>> 10000 times kmalloc(4096)/kfree -> 115 cycles
>> 10000 times kmalloc(8192)/kfree -> 115 cycles
>> 10000 times kmalloc(16384)/kfree -> 115 cycles
>>
>> After:
>>
>> Single thread testing
>> =====================
>> 1. Kmalloc: Repeatedly allocate then free test
>> 10000 times kmalloc(8) -> 99 cycles kfree -> 84 cycles
>> 10000 times kmalloc(16) -> 88 cycles kfree -> 83 cycles
>> 10000 times kmalloc(32) -> 90 cycles kfree -> 81 cycles
>> 10000 times kmalloc(64) -> 107 cycles kfree -> 97 cycles
>> 10000 times kmalloc(128) -> 134 cycles kfree -> 89 cycles
>> 10000 times kmalloc(256) -> 145 cycles kfree -> 97 cycles
>> 10000 times kmalloc(512) -> 177 cycles kfree -> 116 cycles
>> 10000 times kmalloc(1024) -> 223 cycles kfree -> 151 cycles
>> 10000 times kmalloc(2048) -> 1429 cycles kfree -> 221 cycles
>> 10000 times kmalloc(4096) -> 720 cycles kfree -> 348 cycles
>> 10000 times kmalloc(8192) -> 788 cycles kfree -> 393 cycles
>> 10000 times kmalloc(16384) -> 867 cycles kfree -> 433 cycles
>> 2. Kmalloc: alloc/free test
>> 10000 times kmalloc(8)/kfree -> 115 cycles
>> 10000 times kmalloc(16)/kfree -> 115 cycles
>> 10000 times kmalloc(32)/kfree -> 115 cycles
>> 10000 times kmalloc(64)/kfree -> 120 cycles
>> 10000 times kmalloc(128)/kfree -> 127 cycles
>> 10000 times kmalloc(256)/kfree -> 119 cycles
>> 10000 times kmalloc(512)/kfree -> 112 cycles
>> 10000 times kmalloc(1024)/kfree -> 112 cycles
>> 10000 times kmalloc(2048)/kfree -> 112 cycles
>> 10000 times kmalloc(4096)/kfree -> 112 cycles
>> 10000 times kmalloc(8192)/kfree -> 112 cycles
>> 10000 times kmalloc(16384)/kfree -> 112 cycles
>>
>> Signed-off-by: Thomas Garnier <thgarnie@xxxxxxxxxx>
>> ---
>> Based on next-20160418
>> ---
>> init/Kconfig | 9 ++++
>> mm/slab.c | 166 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 174 insertions(+), 1 deletion(-)
>>
>> diff --git a/init/Kconfig b/init/Kconfig
>> index 0dfd09d..ee35418 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -1742,6 +1742,15 @@ config SLOB
>>
>> endchoice
>>
>> +config FREELIST_RANDOM
>> + default n
>> + depends on SLAB
>> + bool "SLAB freelist randomization"
>> + help
>> + Randomizes the freelist order used on creating new SLABs. This
>> + security feature reduces the predictability of the kernel slab
>> + allocator against heap overflows.
>> +
>> config SLUB_CPU_PARTIAL
>> default y
>> depends on SLUB && SMP
>> diff --git a/mm/slab.c b/mm/slab.c
>> index b70aabf..8371d80 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -116,6 +116,7 @@
>> #include <linux/kmemcheck.h>
>> #include <linux/memory.h>
>> #include <linux/prefetch.h>
>> +#include <linux/log2.h>
>>
>> #include <net/sock.h>
>>
>> @@ -1229,6 +1230,62 @@ static void __init set_up_node(struct kmem_cache *cachep, int index)
>> }
>> }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/*
>> + * Master lists are pre-computed random lists
>> + * Lists of different sizes are used to optimize performance on SLABS with
>> + * different object counts.
>> + */
>
> If it is for optimization, it would be one option to have separate
> random list for each kmem_cache. It would consume more memory but it
> would be marginal. And, it provides more un-predictability and it can
> give better performance because we don't need state->type (more, less)
> and special handling related for it.
>

I am not sur because major caches are created early at boot time. We still have
the same entropy problem and we are wasting a bit more memory. It will be faster
on usage though but not sure it will be significant.

>> +static freelist_idx_t master_list_2[2];
>> +static freelist_idx_t master_list_4[4];
>> +static freelist_idx_t master_list_8[8];
>> +static freelist_idx_t master_list_16[16];
>> +static freelist_idx_t master_list_32[32];
>> +static freelist_idx_t master_list_64[64];
>> +static freelist_idx_t master_list_128[128];
>> +static freelist_idx_t master_list_256[256];
>> +const static struct m_list {
>> + size_t count;
>> + freelist_idx_t *list;
>> +} master_lists[] = {
>> + { ARRAY_SIZE(master_list_2), master_list_2 },
>> + { ARRAY_SIZE(master_list_4), master_list_4 },
>> + { ARRAY_SIZE(master_list_8), master_list_8 },
>> + { ARRAY_SIZE(master_list_16), master_list_16 },
>> + { ARRAY_SIZE(master_list_32), master_list_32 },
>> + { ARRAY_SIZE(master_list_64), master_list_64 },
>> + { ARRAY_SIZE(master_list_128), master_list_128 },
>> + { ARRAY_SIZE(master_list_256), master_list_256 },
>> +};
>> +
>> +/* Pre-compute the Freelist master lists at boot */
>> +static void __init freelist_random_init(void)
>> +{
>> + unsigned int seed;
>> + size_t z, i, rand;
>> + struct rnd_state slab_rand;
>> +
>> + get_random_bytes_arch(&seed, sizeof(seed));
>> + prandom_seed_state(&slab_rand, seed);
>> +
>> + for (z = 0; z < ARRAY_SIZE(master_lists); z++) {
>> + for (i = 0; i < master_lists[z].count; i++)
>> + master_lists[z].list[i] = i;
>> +
>> + /* Fisher-Yates shuffle */
>> + for (i = master_lists[z].count - 1; i > 0; i--) {
>> + rand = prandom_u32_state(&slab_rand);
>> + rand %= (i + 1);
>> + swap(master_lists[z].list[i],
>> + master_lists[z].list[rand]);
>> + }
>> + }
>> +}
>> +#else
>> +static inline void __init freelist_random_init(void) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> +
>> /*
>> * Initialisation. Called after the page allocator have been initialised and
>> * before smp_init().
>> @@ -1255,6 +1312,8 @@ void __init kmem_cache_init(void)
>> if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
>> slab_max_order = SLAB_MAX_ORDER_HI;
>>
>> + freelist_random_init();
>> +
>> /* Bootstrap is tricky, because several objects are allocated
>> * from caches that do not exist yet:
>> * 1) initialize the kmem_cache cache: it contains the struct
>> @@ -2442,6 +2501,107 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct page *page)
>> #endif
>> }
>>
>> +#ifdef CONFIG_FREELIST_RANDOM
>> +/* Identify if the target freelist matches the pre-computed list */
>> +enum master_type {
>> + match,
>> + less,
>> + more
>> +};
>> +
>> +/* Hold information during a freelist initialization */
>> +struct freelist_init_state {
>> + unsigned int padding;
>> + unsigned int pos;
>> + unsigned int count;
>> + struct m_list master_list;
>> + unsigned int master_count;
>> + enum master_type type;
>> +};
>> +
>> +/* Select the right pre-computed master list and initialize state */
>> +static void freelist_state_initialize(struct freelist_init_state *state,
>> + unsigned int count)
>> +{
>> + unsigned int idx;
>> + const unsigned int last_idx = ARRAY_SIZE(master_lists) - 1;
>> +
>> + memset(state, 0, sizeof(*state));
>> + state->count = count;
>> + state->pos = 0;
>
> Using pos = 0 here looks not good in terms of security. In this case,
> every new page having same size class have same sequence of freelist since boot.
>
> How about using random value to set pos? It provides some more randomness
> with minimal overhead.
>

I think it is a good idea. I will add that for the next iteration.

>> + /* count is always >= 2 */
>> + idx = ilog2(count) - 1;
>> + if (idx >= last_idx)
>> + idx = last_idx;
>> + else if (roundup_pow_of_two(idx + 1) != count)
>> + idx++;
>> + state->master_list = master_lists[idx];
>> + if (state->master_list.count == state->count)
>> + state->type = match;
>> + else if (state->master_list.count > state->count)
>> + state->type = more;
>> + else
>> + state->type = less;
>> +}
>> +
>> +/* Get the next entry on the master list depending on the target list size */
>> +static freelist_idx_t get_next_entry(struct freelist_init_state *state)
>> +{
>> + if (state->type == less && state->pos == state->master_list.count) {
>> + state->padding += state->pos;
>> + state->pos = 0;
>> + }
>> + BUG_ON(state->pos >= state->master_list.count);
>> + return state->master_list.list[state->pos++];
>> +}
>> +
>> +static freelist_idx_t next_random_slot(struct freelist_init_state *state)
>> +{
>> + freelist_idx_t cur, entry;
>> +
>> + entry = get_next_entry(state);
>> +
>> + if (state->type != match) {
>> + while ((entry + state->padding) >= state->count)
>> + entry = get_next_entry(state);
>> + cur = entry + state->padding;
>> + BUG_ON(cur >= state->count);
>> + } else {
>> + cur = entry;
>> + }
>> +
>> + return cur;
>> +}
>> +
>> +/* Shuffle the freelist initialization state based on pre-computed lists */
>> +static void shuffle_freelist(struct kmem_cache *cachep, struct page *page,
>> + unsigned int count)
>> +{
>> + unsigned int i;
>> + struct freelist_init_state state;
>> +
>> + if (count < 2) {
>> + for (i = 0; i < count; i++)
>> + set_free_obj(page, i, i);
>> + return;
>> + }
>> +
>> + /* Last chunk is used already in this case */
>> + if (OBJFREELIST_SLAB(cachep))
>> + count--;
>> +
>> + freelist_state_initialize(&state, count);
>> + for (i = 0; i < count; i++)
>> + set_free_obj(page, i, next_random_slot(&state));
>> +
>> + if (OBJFREELIST_SLAB(cachep))
>> + set_free_obj(page, i, i);
>
> Please consider last object of OBJFREELIST_SLAB cache, too.
>
> freelist_state_init()
> last_obj = next_randome_slot()
> page->freelist = XXX
> for (i = 0; i < count - 1; i++)
> set_free_obj()
> set_free_obj(last_obj);
>
> Thanks.
>

The current implementation take the last chunk by default before the
freelist is initialized. Do you want it to be randomized as well?

>> +}
>> +#else
>> +static inline void shuffle_freelist(struct kmem_cache *cachep,
>> + struct page *page, unsigned int count) { }
>> +#endif /* CONFIG_FREELIST_RANDOM */
>> +
>> static void cache_init_objs(struct kmem_cache *cachep,
>> struct page *page)
>> {
>> @@ -2464,8 +2624,12 @@ static void cache_init_objs(struct kmem_cache *cachep,
>> kasan_poison_object_data(cachep, objp);
>> }
>>
>> - set_free_obj(page, i, i);
>> + /* If enabled, initialization is done in shuffle_freelist */
>> + if (!config_enabled(CONFIG_FREELIST_RANDOM))
>> + set_free_obj(page, i, i);
>> }
>> +
>> + shuffle_freelist(cachep, page, cachep->num);
>> }
>>
>> static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>