Re: [PATCH] mm, kasan: introduce a special shadow value for allocator metadata

From: Alexander Potapenko
Date: Wed Jun 01 2016 - 12:31:37 EST

On Wed, Jun 1, 2016 at 5:23 PM, Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx> wrote:
> On 05/31/2016 08:49 PM, Alexander Potapenko wrote:
>> On Tue, May 31, 2016 at 1:52 PM, Andrey Ryabinin
>> <aryabinin@xxxxxxxxxxxxx> wrote:
>>> On 05/31/2016 01:44 PM, Alexander Potapenko wrote:
>>>> Add a special shadow value to distinguish accesses to KASAN-specific
>>>> allocator metadata.
>>>> Unlike AddressSanitizer in the userspace, KASAN lets the kernel proceed
>>>> after a memory error. However a write to the kmalloc metadata may cause
>>>> memory corruptions that will make the tool itself unreliable and induce
>>>> crashes later on. Warning about such corruptions will ease the
>>>> debugging.
>>> It will not. Whether out-of-bounds hits metadata or not is absolutely irrelevant
>>> to the bug itself. This information doesn't help to understand, analyze or fix the bug.
>> Here's the example that made me think the opposite.
>> I've been reworking KASAN hooks for mempool and added a test that did
>> a write-after-free to an object allocated from a mempool.
>> This resulted in flaky kernel crashes somewhere in quarantine
>> shrinking after several attempts to `insmod test_kasan.ko`.
>> Because there already were numerous KASAN errors in the test, it
>> wasn't evident that the crashes were related to the new test, so I
>> thought the problem was in the buggy quarantine implementation.
>> However the problem was indeed in the new test, which corrupted the
>> quarantine pointer in the object and caused a crash while traversing
>> the quarantine list.
>> My previous experience with userspace ASan shows that crashes in the
>> tool code itself puzzle the developers.
>> As a result, the users think that the tool is broken and don't believe
>> its reports.
>> I first thought about hardening the quarantine list by checksumming
>> the pointers and validating them on each traversal.
>> This prevents the crashes, but doesn't give the users any idea about
>> what went wrong.
>> On the other hand, reporting the pointer corruption right when it happens does.
>> Distinguishing between a regular UAF and a quarantine corruption
>> (which is what the patch in question is about) helps to prioritize the
>> KASAN reports and give the developers better understanding of the
>> consequences.
> After the first report we have memory in a corrupted state, so we are done here.
This is true in theory, which is why userspace ASan crashes after the
first report.
But since the kernel proceeds after the first KASAN report, we may see
several different reports, and they are sometimes worth looking at.

> Anything that happens after the first report can't be trusted since it can be an after-effect,
> just like in your case. Such crashes are not worthy to look at.
> Out-of-bounds that doesn't hit metadata as any other memory corruption also can lead to after-effects crashes,
> thus distinguishing such bugs doesn't make a lot of sense.
Unlike the crashes in the kernel itself, crashes with KASAN functions
in the stack trace may make the developer think the tool is broken.
> test_kasan module is just a quick hack, made only to make sure that KASAN works.
> It does some crappy thing, and may lead to crash as well. So I would recommend an immediate
> reboot even after single attempt to load it.
Agreed. However, a plain write into the first byte of the freed object
will cause similar problems.
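
To make the distinction concrete, here is a minimal userspace sketch of
the idea, not the code from the patch. It assumes a toy heap with one
shadow byte per heap byte (real KASAN maps 8 bytes to one shadow byte)
and assumes the quarantine link occupies the first bytes of a freed
object. All names and shadow values below are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shadow values; arbitrary, not the kernel's constants. */
enum shadow_tag {
	SHADOW_ACCESSIBLE = 0x00,
	SHADOW_FREED      = 0xF1,	/* payload of a freed object */
	SHADOW_METADATA   = 0xF2,	/* allocator metadata, e.g. quarantine link */
};

enum access_verdict {
	ACCESS_OK,
	ACCESS_USE_AFTER_FREE,
	ACCESS_METADATA_CORRUPTION,
};

#define TOY_HEAP_SIZE 64

/* Simplification: one shadow byte per heap byte. */
static uint8_t toy_shadow[TOY_HEAP_SIZE];

/* Mark [start, start + len) in the shadow with the given tag. */
static void toy_poison(size_t start, size_t len, enum shadow_tag tag)
{
	assert(start <= TOY_HEAP_SIZE && len <= TOY_HEAP_SIZE - start);
	memset(toy_shadow + start, tag, len);
}

/*
 * Model freeing an object: the first meta_size bytes are assumed to be
 * reused for allocator metadata, the rest is plain freed memory.
 */
static void toy_free_object(size_t obj, size_t size, size_t meta_size)
{
	assert(meta_size <= size);
	toy_poison(obj, meta_size, SHADOW_METADATA);
	toy_poison(obj + meta_size, size - meta_size, SHADOW_FREED);
}

/*
 * Classify a write to [start, start + len). A write touching metadata
 * is reported as such even if it also touches freed payload, because
 * it is the one that can break the tool itself.
 */
static enum access_verdict toy_check_write(size_t start, size_t len)
{
	enum access_verdict verdict = ACCESS_OK;

	assert(start <= TOY_HEAP_SIZE && len <= TOY_HEAP_SIZE - start);
	for (size_t i = start; i < start + len; i++) {
		if (toy_shadow[i] == SHADOW_METADATA)
			return ACCESS_METADATA_CORRUPTION;
		if (toy_shadow[i] == SHADOW_FREED)
			verdict = ACCESS_USE_AFTER_FREE;
	}
	return verdict;
}
```

After toy_free_object(16, 32, 8), a one-byte write at offset 16 is
classified as metadata corruption, while a write at offset 24 is an
ordinary use-after-free. In this sketch the only difference between the
two reports is the shadow value the checker finds.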

Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg