Re: [PATCH] mm, kasan: introduce a special shadow value for allocator metadata

From: Alexander Potapenko
Date: Thu Jun 02 2016 - 08:18:46 EST

On Thu, Jun 2, 2016 at 2:17 PM, Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx> wrote:
> On 06/02/2016 03:02 PM, Alexander Potapenko wrote:
>> On Wed, Jun 1, 2016 at 6:31 PM, Alexander Potapenko <glider@xxxxxxxxxx> wrote:
>>> On Wed, Jun 1, 2016 at 5:23 PM, Andrey Ryabinin <aryabinin@xxxxxxxxxxxxx> wrote:
>>>> On 05/31/2016 08:49 PM, Alexander Potapenko wrote:
>>>>> On Tue, May 31, 2016 at 1:52 PM, Andrey Ryabinin
>>>>> <aryabinin@xxxxxxxxxxxxx> wrote:
>>>>>> On 05/31/2016 01:44 PM, Alexander Potapenko wrote:
>>>>>>> Add a special shadow value to distinguish accesses to KASAN-specific
>>>>>>> allocator metadata.
>>>>>>> Unlike userspace AddressSanitizer, KASAN lets the kernel proceed
>>>>>>> after a memory error. However, a write to the kmalloc metadata may cause
>>>>>>> memory corruption that will make the tool itself unreliable and induce
>>>>>>> crashes later on. Warning about such corruption will ease
>>>>>>> debugging.
>>>>>> It will not. Whether an out-of-bounds access hits metadata or not is absolutely irrelevant
>>>>>> to the bug itself. This information doesn't help to understand, analyze or fix the bug.
>>>>> Here's the example that made me think the opposite.
>>>>> I've been reworking KASAN hooks for mempool and added a test that did
>>>>> a write-after-free to an object allocated from a mempool.
>>>>> This resulted in flaky kernel crashes somewhere in quarantine
>>>>> shrinking after several attempts to `insmod test_kasan.ko`.
>>>>> Because there already were numerous KASAN errors in the test, it
>>>>> wasn't evident that the crashes were related to the new test, so I
>>>>> thought the problem was in the buggy quarantine implementation.
>>>>> However the problem was indeed in the new test, which corrupted the
>>>>> quarantine pointer in the object and caused a crash while traversing
>>>>> the quarantine list.
>>>>> My previous experience with userspace ASan shows that crashes in the
>>>>> tool code itself puzzle the developers.
>>>>> As a result, the users think that the tool is broken and don't believe
>>>>> its reports.
>>>>> I first thought about hardening the quarantine list by checksumming
>>>>> the pointers and validating them on each traversal.
>>>>> This prevents the crashes, but doesn't give the users any idea about
>>>>> what went wrong.
>>>>> On the other hand, reporting the pointer corruption right when it happens does.
>>>>> Distinguishing between a regular UAF and a quarantine corruption
>>>>> (which is what the patch in question is about) helps to prioritize the
>>>>> KASAN reports and give the developers a better understanding of the
>>>>> consequences.
>>>> After the first report we have memory in a corrupted state, so we are done here.
>>> This is theoretically true, which is why we crash after the first report
>>> in userspace ASan.
>>> But since the kernel proceeds after the first KASAN report, it's
>>> possible that we see several different reports, and they are sometimes
>>> worth looking at.
>>>> Anything that happens after the first report can't be trusted since it can be an after-effect,
>>>> just like in your case. Such crashes are not worth looking at.
>>>> An out-of-bounds access that doesn't hit metadata can, like any other memory corruption, also lead to after-effect crashes,
>>>> thus distinguishing such bugs doesn't make a lot of sense.
>>> Unlike the crashes in the kernel itself, crashes with KASAN functions
>>> in the stack trace may make the developer think the tool is broken.
>>>> The test_kasan module is just a quick hack, made only to make sure that KASAN works.
>>>> It does some crappy things, and may lead to a crash as well. So I would recommend an immediate
>>>> reboot even after a single attempt to load it.
>>> Agreed. However, a plain write into the first byte of the freed object
>>> will cause similar problems.
>> On second thought, we could do without the additional shadow byte
>> value, by just comparing the address to the metadata offset.
> We could. But still, there is no point in doing anything like that.
Ok, got it.
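
For the record, the address-comparison alternative mentioned above (no extra
shadow value, just checking whether the faulting access overlaps the metadata
region of the object) could look roughly like the userspace sketch below. This
is only an illustration under assumptions: struct obj_layout and
access_hits_metadata() are hypothetical names, not existing KASAN code, and
the layout (metadata at a fixed offset from the object start) is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical description of one allocated object: the payload starts at
 * object_start, and the tool's own metadata occupies meta_size bytes
 * starting meta_offset bytes after object_start.
 */
struct obj_layout {
	uintptr_t object_start;
	size_t meta_offset;
	size_t meta_size;
};

/*
 * Return true if the access [addr, addr + len) overlaps the metadata
 * region [meta_start, meta_end). A zero-length access never overlaps.
 */
static bool access_hits_metadata(const struct obj_layout *l,
				 uintptr_t addr, size_t len)
{
	uintptr_t meta_start = l->object_start + l->meta_offset;
	uintptr_t meta_end = meta_start + l->meta_size;

	if (len == 0)
		return false;
	return addr < meta_end && addr + len > meta_start;
}
```

A report path could call such a helper with the faulting address and access
size to decide whether to mention metadata corruption in the report.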

Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg