Re: [PATCH 0/5] mm: poison critical mm/ structs
From: Sasha Levin
Date: Thu Oct 02 2014 - 10:58:38 EST
On 10/02/2014 05:23 AM, Hugh Dickins wrote:
> On Wed, 1 Oct 2014, Sasha Levin wrote:
>> On 10/01/2014 05:07 PM, Andrew Morton wrote:
>>> On Mon, 29 Sep 2014 21:47:14 -0400 Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>>>
>>>> Currently we're seeing a few issues which are unexplainable by looking at the
>>>> data we see and are most likely caused by a memory corruption caused
>>>> elsewhere.
>>>>
>>>> This is wasting time for folks who are trying to figure out an issue provided
>>>> a stack trace that can't really point out the real issue.
>>>>
>>>> This patch introduces poisoning on struct page, vm_area_struct, and mm_struct,
>>>> and places checks in busy paths to catch corruption early.
>>>>
>>>> This series was tested, and it detects corruption in vm_area_struct. Right now
>>>> I'm working on figuring out the source of the corruption, (which is a long
>>>> standing bug) using KASan, but the current code is useful as it is.
>>>
>>> Is this still useful if/when kasan is in place?
>>
>> Yes, the corruption we're seeing happens inside the struct rather than around it.
>> kasan doesn't look there.
>>
>> When kasan is merged, we could complement this patchset by making kasan trap on
>> when the poison is getting written, rather than triggering a BUG in some place
>> else after we saw the corruption.
>>
>>> It looks fairly cheap - I wonder if it should simply fall under
>>> CONFIG_DEBUG_VM rather than the new CONFIG_DEBUG_VM_POISON.
>>
>> Config options are cheap as well :)
>>
>> I'd rather expand it further and add poison/kasan trapping into other places such
>> as the vma interval tree rather than having to keep it "cheap".
>
> I like to run with CONFIG_DEBUG_VM, and would not want this stuff
> turned on in my builds (especially not the struct page enlargement);
> so I'm certainly with you in preferring a separate option.
>
> But it all seems very ad hoc to me. Are people going to be adding
> more and more mm structures into it, ad infinitum? And adding
> CONFIG_DEBUG_SCHED_POISON one day when someone notices corruption
> of a scheduler structure? etc etc.
That was my plan, yes.
> What does this add on top of slab poisoning? Some checks in some
> mm places while the object is active, I guess: why not base those
> on slab poisoning? And add them in as appropriate to the problem
> at hand, when a problem is seen.
The extra you're getting is detecting corruption that happened
inside the object rather than around it. In the case of poisoning
working along with kasan you don't have to limit it to slab either,
so you can detect issues in static objects as well.
fwiw, there's currently a long standing issue with corruption inside
spinlocks in sched code. This sort of issues always exist, so (at least)
my kernel would always have poisoning in some form from now on.
> I think these patches are fine for investigating whatever is the
> problem currently afflicting you and mm under trinity; but we all
> have our temporary debugging patches, I don't think all deserve
> preservation in everyone else's kernel, that amounts to far more
> clutter than any are worth.
If the issue is lines of code we can look into making it cleaner.
> I'm glad to hear they've confirmed some vm_area_struct corruption:
> any ideas on where that's coming from?
Nope, I've added kasan poisoning to vm_area_struct but it has not
reproduced since then, I've just hit bunch of different issues.
Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/