Re: [PATCH RFC 00/10] KFENCE: A low-overhead sampling-based memory safety error detector

From: Marco Elver
Date: Tue Sep 08 2020 - 15:20:37 EST

Next message: Jason Gunthorpe: "Re: [PATCH rdma-next 4/4] RDMA/umem: Move to allocate SG table from pages"
Previous message: Bjorn Andersson: "Re: [PATCH 6/7] cpufreq: qcom-hw: Add cpufreq support for SM8250 SoC"
In reply to: Marco Elver: "Re: [PATCH RFC 00/10] KFENCE: A low-overhead sampling-based memory safety error detector"
Next in thread: Vlastimil Babka: "Re: [PATCH RFC 00/10] KFENCE: A low-overhead sampling-based memory safety error detector"

On Tue, Sep 08, 2020 at 07:52AM -0700, Dave Hansen wrote:
> On 9/7/20 6:40 AM, Marco Elver wrote:
> > KFENCE is designed to be enabled in production kernels, and has near
> > zero performance overhead. Compared to KASAN, KFENCE trades performance
> > for precision.
>
> Could you talk a little bit about where you expect folks to continue to
> use KASAN? How would a developer or a tester choose which one to use?

We mention some of this in Documentation/dev-tools/kfence.rst:

    In the kernel, several tools exist to debug memory access errors, and in
    particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
    is more precise, relying on compiler instrumentation, this comes at a
    performance cost. We want to highlight that KASAN and KFENCE are complementary,
    with different target environments. For instance, KASAN is the better
    debugging-aid, where a simple reproducer exists: due to the lower chance to
    detect the error, it would require more effort using KFENCE to debug.
    Deployments at scale, however, would benefit from using KFENCE to discover bugs
    due to code paths not exercised by test cases or fuzzers.

If you can afford to use KASAN, continue using KASAN. Usually this only
applies to test environments. If you have kernels for production use,
and cannot enable KASAN for the obvious cost reasons, you could consider
KFENCE.

I'll try to make this clearer, maybe summarizing what I said here in
Documentation as well.

> > KFENCE objects each reside on a dedicated page, at either the left or
> > right page boundaries. The pages to the left and right of the object
> > page are "guard pages", whose attributes are changed to a protected
> > state, and cause page faults on any attempted access to them. Such page
> > faults are then intercepted by KFENCE, which handles the fault
> > gracefully by reporting a memory access error.
>
> How much memory overhead does this end up having? I know it depends on
> the object size and so forth. But, could you give some real-world
> examples of memory consumption? Also, what's the worst case? Say I
> have a ton of worst-case-sized (32b) slab objects. Will I notice?

The number of KFENCE objects is limited (255 by default), so the memory
overhead is fixed and does not depend on the object size. If we exhaust
KFENCE's memory pool, no more KFENCE allocations will occur.
Documentation/dev-tools/kfence.rst gives a formula to calculate the
KFENCE pool size:

    The total memory dedicated to the KFENCE memory pool can be computed as::

        ( #objects + 1 ) * 2 * PAGE_SIZE

    Using the default config, and assuming a page size of 4 KiB, results in
    dedicating 2 MiB to the KFENCE memory pool.
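
As a quick sanity check of that formula, here is a minimal Python sketch.
It is not kernel code: the function and constant names are made up for
illustration, and the worst-case figure assumes that every one of the
255 slots holds a 32-byte object.

```python
# Hypothetical helper names; only the formula comes from kfence.rst.
PAGE_SIZE = 4096           # assumption: 4 KiB pages
DEFAULT_NUM_OBJECTS = 255  # default number of KFENCE objects


def kfence_pool_bytes(num_objects: int = DEFAULT_NUM_OBJECTS,
                      page_size: int = PAGE_SIZE) -> int:
    """Pool size in bytes: (#objects + 1) * 2 * PAGE_SIZE."""
    if num_objects < 0 or page_size <= 0:
        raise ValueError("num_objects must be >= 0 and page_size > 0")
    return (num_objects + 1) * 2 * page_size


def overhead_factor(object_size: int,
                    num_objects: int = DEFAULT_NUM_OBJECTS,
                    page_size: int = PAGE_SIZE) -> float:
    """Pool bytes per payload byte when every slot holds one object
    of object_size bytes (assumption: the pool is completely full)."""
    if object_size <= 0 or num_objects <= 0:
        raise ValueError("object_size and num_objects must be > 0")
    return kfence_pool_bytes(num_objects, page_size) / (num_objects * object_size)


if __name__ == "__main__":
    pool = kfence_pool_bytes()
    print(f"pool size: {pool} bytes ({pool / 2**20:.0f} MiB)")
    print(f"overhead for 32-byte objects: about {overhead_factor(32):.0f}x")
```

With the defaults this gives 2097152 bytes, i.e. the 2 MiB quoted above.
For 32-byte objects the pool is roughly 257 times the payload it holds,
but the absolute cost stays capped at 2 MiB no matter how many such
objects the rest of the kernel allocates.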

Does that clarify this point? Or is there anything else I could explain
that would help?

Thanks,
-- Marco
