Re: [PATCH v3 01/10] mm: add Kernel Electric-Fence infrastructure

From: Mark Rutland
Date: Thu Oct 01 2020 - 14:11:32 EST


On Tue, Sep 29, 2020 at 05:51:58PM +0200, Alexander Potapenko wrote:
> On Tue, Sep 29, 2020 at 4:24 PM Mark Rutland <mark.rutland@xxxxxxx> wrote:
> >
> > On Mon, Sep 21, 2020 at 03:26:02PM +0200, Marco Elver wrote:
> > > From: Alexander Potapenko <glider@xxxxxxxxxx>
> > >
> > > This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> > > low-overhead sampling-based memory safety error detector of heap
> > > use-after-free, invalid-free, and out-of-bounds access errors.
> > >
> > > KFENCE is designed to be enabled in production kernels, and has near
> > > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > > for precision. The main motivation behind KFENCE's design is that with
> > > enough total uptime KFENCE will detect bugs in code paths not typically
> > > exercised by non-production test workloads. One way to quickly achieve
> > > a large enough total uptime is to deploy the tool across a large fleet
> > > of machines.
> > >
> > > KFENCE objects each reside on a dedicated page, at either the left or
> > > right page boundaries. The pages to the left and right of the object
> > > page are "guard pages", whose attributes are changed to a protected
> > > state, and cause page faults on any attempted access to them. Such page
> > > faults are then intercepted by KFENCE, which handles the fault
> > > gracefully by reporting a memory access error. To detect out-of-bounds
> > > writes to memory within the object's page itself, KFENCE also uses
> > > pattern-based redzones. The following figure illustrates the page
> > > layout:
> > >
> > > ---+-----------+-----------+-----------+-----------+-----------+---
> > >    | xxxxxxxxx | O :       | xxxxxxxxx |       : O | xxxxxxxxx |
> > >    | xxxxxxxxx | B :       | xxxxxxxxx |       : B | xxxxxxxxx |
> > >    | x GUARD x | J : RED-  | x GUARD x |  RED- : J | x GUARD x |
> > >    | xxxxxxxxx | E :  ZONE | xxxxxxxxx | ZONE  : E | xxxxxxxxx |
> > >    | xxxxxxxxx | C :       | xxxxxxxxx |       : C | xxxxxxxxx |
> > >    | xxxxxxxxx | T :       | xxxxxxxxx |       : T | xxxxxxxxx |
> > > ---+-----------+-----------+-----------+-----------+-----------+---
> > >
> > > Guarded allocations are set up based on a sample interval (can be set
> > > via kfence.sample_interval). After expiration of the sample interval, a
> > > guarded allocation from the KFENCE object pool is returned to the main
> > > allocator (SLAB or SLUB). At this point, the timer is reset, and the
> > > next allocation is set up after the expiration of the interval.
> >
> > From other sub-threads it sounds like these addresses are not part of
> > the linear/direct map.
> For x86 these addresses belong to .bss, i.e. "kernel text mapping"
> section, isn't that the linear map?

No; the "linear map" is the "direct mapping" on x86, and the "image" or
"kernel text mapping" is a distinct VA region. The image mapping aliases
(i.e. uses the same physical pages as) a portion of the linear map, and
every page in the linear map has a struct page.
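
To make the aliasing concrete, here's a rough sketch (not from the
patch; image_addr_to_page() is a made-up name) of how code holding a
kernel image address, e.g. a .bss object, can get at the corresponding
struct page by way of the linear map alias:

	/*
	 * Hypothetical helper: virt_to_page() only wants linear map
	 * addresses, and __pa_symbol() is the variant that accepts image
	 * addresses, so translate via the alias. This is what the generic
	 * lm_alias() macro in <linux/mm.h> expands to: __va(__pa_symbol(x)).
	 */
	static inline struct page *image_addr_to_page(const void *x)
	{
		return virt_to_page(__va(__pa_symbol(x)));
	}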

For the x86_64 virtual memory layout, see:

https://www.kernel.org/doc/html/latest/x86/x86_64/mm.html

Originally, the kernel image lived in the linear map, but it was split
out into a distinct VA range (among other things) to permit KASLR. When
that split was made, the x86 virt_to_*() helpers were updated to detect
when they were passed a kernel image address, and automatically fix that
up as if they'd been handed the linear map alias of that address.

For a one-way virt->{phys,page} conversion that works ok, but it
doesn't survive the round-trip, and it introduces redundant work into
every virt_to_*() call.
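
For reference, the fixup is roughly the below (a simplified sketch of
x86's __phys_addr_nodebug() in arch/x86/include/asm/page_64.h; the
real thing uses a carry-flag trick rather than an explicit branch):

	/*
	 * Simplified sketch of the x86 virt-to-phys fixup: image addresses
	 * (at or above __START_KERNEL_map) are translated via phys_base,
	 * everything else is assumed to be a linear map address.
	 */
	static unsigned long virt_to_phys_sketch(unsigned long x)
	{
		if (x >= __START_KERNEL_map)	/* kernel image address */
			return x - __START_KERNEL_map + phys_base;

		return x - PAGE_OFFSET;		/* linear map address */
	}

Going the other way, __va() only ever hands back linear map addresses,
which is why the round-trip gives you the alias rather than the image
address you started with.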

As it was largely arch code that was using image addresses, we didn't
bother with the fixup on arm64, as we preferred the stronger warning. At
the time I was also under the impression that on x86 they wanted to get
rid of the automatic fixup, but that doesn't seem to have happened.

Thanks,
Mark.