Re: [PATCH v2 1/3] mm: Shuffle initial free memory

From: Dan Williams
Date: Thu Oct 04 2018 - 13:14:51 EST

Next message: Alexey Budankov: "Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)"
Previous message: Steve Sakoman: "Re: linux-next: build failure after merge of the tty tree"
In reply to: Michal Hocko: "Re: [PATCH v2 1/3] mm: Shuffle initial free memory"
Next in thread: Dan Williams: "[PATCH v2 2/3] mm: Move buddy list manipulations into helpers"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Thu, Oct 4, 2018 at 12:48 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Wed 03-10-18 19:15:24, Dan Williams wrote:
> > Some data exfiltration and return-oriented-programming attacks rely on
> > the ability to infer the location of sensitive data objects. The kernel
> > page allocator, especially early in system boot, has predictable
> > first-in-first out behavior for physical pages. Pages are freed in
> > physical address order when first onlined.
> >
> > Introduce shuffle_free_memory(), and its helper shuffle_zone(), to
> > perform a Fisher-Yates shuffle of the page allocator 'free_area' lists
> > when they are initially populated with free memory at boot and at
> > hotplug time.
> >
> > Quoting Kees:
> > "While we already have a base-address randomization
> > (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
> > memory layouts would certainly be using the predictability of
> > allocation ordering (i.e. for attacks where the base address isn't
> > important: only the relative positions between allocated memory).
> > This is common in lots of heap-style attacks. They try to gain
> > control over ordering by spraying allocations, etc.
> >
> > I'd really like to see this because it gives us something similar
> > to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
> >
> > Another motivation for this change is performance in the presence of a
> > memory-side cache. In the future, memory-side-cache technology will be
> > available on generally available server platforms. The proposed
> > randomization approach has been measured to improve the cache conflict
> > rate by a factor of 2.5X on a well-known Java benchmark. It avoids
> > performance peaks and valleys to provide more predictable performance.
> >
> > While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
> > caches it leaves vast bulk of memory to be predictably in order
> > allocated. That ordering can be detected by a memory side-cache.
> >
> > The shuffling is done in terms of 'shuffle_page_order' sized free pages
> > where the default shuffle_page_order is MAX_ORDER-1 i.e. 10, 4MB this
> > trades off randomization granularity for time spent shuffling.
> > MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
> > while still showing memory-side cache behavior improvements.
> >
> > The performance impact of the shuffling appears to be in the noise
> > compared to other memory initialization work. Also the bulk of the work
> > is done in the background as a part of deferred_init_memmap().
>
> This is the biggest portion of the series and I am wondering why do we
> need it at all. Why it isn't sufficient to rely on the patch 3 here?

In fact we started with only patch3 and it had no measurable impact on
the cache conflict rate.

> Pages freed from the bootmem allocator go via the same path so they
> might be shuffled at that time. Or is there any problem with that?
> Not enough entropy at the time when this is called or the final result
> is not randomized enough (some numbers would be helpful).

So the reason front-back randomization is not enough is due to the
in-order initial freeing of pages. At the start of that process
putting page1 in front or behind page0 still keeps them close
together, page2 is still near page1 and has a high chance of being
adjacent. As more pages are added ordering diversity improves, but
there is still high page locality for the low address pages and this
leads to no significant impact to the cache conflict rate. Patch3 is
enough to keep the entropy sustained over time, but it's not enough
initially.

Next message: Alexey Budankov: "Re: [RFC 0/5] perf: Per PMU access controls (paranoid setting)"
Previous message: Steve Sakoman: "Re: linux-next: build failure after merge of the tty tree"
In reply to: Michal Hocko: "Re: [PATCH v2 1/3] mm: Shuffle initial free memory"
Next in thread: Dan Williams: "[PATCH v2 2/3] mm: Move buddy list manipulations into helpers"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]