Re: [PATCH v1 00/11] mm/kasan: support per-page shadow memory to reduce memory consumption

From: Vladimir Murzin
Date: Tue May 30 2017 - 05:39:28 EST


On 30/05/17 10:26, Dmitry Vyukov wrote:
> On Tue, May 30, 2017 at 11:08 AM, Vladimir Murzin
> <vladimir.murzin@xxxxxxx> wrote:
>>> <vladimir.murzin@xxxxxxx> wrote:
>>>> On 30/05/17 09:31, Vladimir Murzin wrote:
>>>>> On 30/05/17 09:15, Dmitry Vyukov wrote:
>>>>>> On Tue, May 30, 2017 at 9:58 AM, Vladimir Murzin
>>>>>> <vladimir.murzin@xxxxxxx> wrote:
>>>>>>> On 29/05/17 16:29, Dmitry Vyukov wrote:
>>>>>>>> I have an alternative proposal. It should be conceptually simpler and
>>>>>>>> also less arch-dependent. But I don't know if I'm missing something
>>>>>>>> important that would render it unworkable.
>>>>>>>> Namely, we add a pointer to shadow to the page struct. Then, create a
>>>>>>>> slab allocator for 512B shadow blocks. Then, attach/detach these
>>>>>>>> shadow blocks to page structs as necessary. It should lead to even
>>>>>>>> smaller memory consumption because we won't need a whole shadow page
>>>>>>>> when only 1 out of 8 corresponding kernel pages are used (we will need
>>>>>>>> just a single 512B block). I guess with some fragmentation we need
>>>>>>>> lots of excessive shadow with the current proposed patch.
>>>>>>>> This does not depend on TLB in any way and does not require hooking
>>>>>>>> into buddy allocator.
>>>>>>>> The main downside is that we will need to be careful not to assume
>>>>>>>> that shadow is contiguous. In particular this means that this mode
>>>>>>>> will work only with outline instrumentation and will need some ifdefs.
>>>>>>>> Also it will be slower due to the additional indirection when
>>>>>>>> accessing shadow, but that's meant as "small but slow" mode as far as
>>>>>>>> I understand.
>>>>>>>>
>>>>>>>> But the main win as I see it is that that's basically complete support
>>>>>>>> for 32-bit arches. People do ask about arm32 support:
>>>>>>>> https://groups.google.com/d/msg/kasan-dev/Sk6BsSPMRRc/Gqh4oD_wAAAJ
>>>>>>>> https://groups.google.com/d/msg/kasan-dev/B22vOFp-QWg/EVJPbrsgAgAJ
>>>>>>>> and probably mips32 is relevant as well.
>>>>>>>> Such a mode does not require a huge contiguous address space range, has
>>>>>>>> minimal memory consumption and requires minimal arch-dependent code.
>>>>>>>> Works only with outline instrumentation, but I think that's a
>>>>>>>> reasonable compromise.
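
A minimal userspace sketch of the per-page shadow idea quoted above, assuming 4 KiB pages and the usual 8:1 KASAN shadow scale, so one page needs a 512-byte shadow block. Every name here (struct page_model, shadow_attach, shadow_detach, mem_to_shadow, the MODEL_* macros) is hypothetical and not part of the kernel or of the patch set; calloc() merely stands in for the proposed slab cache of 512B blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT   12
#define MODEL_PAGE_SIZE    (1UL << MODEL_PAGE_SHIFT)
#define SHADOW_SCALE_SHIFT 3    /* one shadow byte covers 8 bytes */
#define SHADOW_BLOCK_SIZE  (MODEL_PAGE_SIZE >> SHADOW_SCALE_SHIFT)
#define NR_PAGES           8

static_assert(SHADOW_BLOCK_SIZE == 512, "one 512B block per 4 KiB page");

/* Hypothetical stand-in for struct page with an added shadow pointer. */
struct page_model {
	uint8_t *shadow;        /* NULL until a shadow block is attached */
};

static struct page_model pages[NR_PAGES];

/* Attach a zeroed shadow block to a page; 0 on success, -1 on failure. */
static int shadow_attach(size_t pfn)
{
	if (pfn >= NR_PAGES)
		return -1;
	if (pages[pfn].shadow)
		return 0;       /* already attached */
	pages[pfn].shadow = calloc(1, SHADOW_BLOCK_SIZE);
	return pages[pfn].shadow ? 0 : -1;
}

/* Detach and free the shadow block once the page is no longer tracked. */
static void shadow_detach(size_t pfn)
{
	if (pfn >= NR_PAGES)
		return;
	free(pages[pfn].shadow);
	pages[pfn].shadow = NULL;
}

/*
 * Outline-style lookup: find the page, then follow its shadow pointer.
 * The shadow is not contiguous across pages, so the result is only valid
 * within one page. Returns NULL when no shadow block is attached.
 */
static uint8_t *mem_to_shadow(uintptr_t offset)
{
	size_t pfn = offset >> MODEL_PAGE_SHIFT;

	if (pfn >= NR_PAGES || !pages[pfn].shadow)
		return NULL;
	return &pages[pfn].shadow[(offset & (MODEL_PAGE_SIZE - 1))
				  >> SHADOW_SCALE_SHIFT];
}
```

The extra pointer dereference in mem_to_shadow() is the additional indirection mentioned in the proposal, and the NULL case is why inline instrumentation, which assumes a fixed linear shadow mapping, would not fit this mode.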
>>>>>>>
>>>>>>> .. or you can just keep shadow in page extension. It was suggested back in
>>>>>>> 2015 [1], but seems that lack of stack instrumentation was "no-way"...
>>>>>>>
>>>>>>> [1] https://lkml.org/lkml/2015/8/24/573
>>>>>>
>>>>>> Right. It describes basically the same idea.
>>>>>>
>>>>>> How is page_ext better than adding data to page struct?
>>>>>
>>>>> page_ext is already here along with some other debug options ;)
>>>
>>>
>>> But page struct is also here. What am I missing?
>>>
>>
>> Probably, free room in page struct? I guess most of the page_ext stuff would
>> love to live in page struct, but... for instance, look at page idle tracking
>> which has to live in page_ext only for 32-bit.
>
>
> Sorry for my ignorance. What's the fundamental problem with just
> pushing everything into page struct?

I think [1] has an answer for your question ;)

>
> I don't see anything relevant in the page struct comment. Nor do I see
> "idle" or "tracking" in page struct. I see only 2 mentions of CONFIG_64BIT,
> but both declare the same fields just with different types (int vs short).

Right, it is because the implementation is based on page flags [2]:

Note, since there is no room for extra page flags on 32 bit, this feature
uses extended page flags when compiled on 32 bit.
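
A rough userspace model of that arrangement, for illustration only: on a 64-bit build the idle bit is kept in the page's own flags word, and on a 32-bit build it is kept in a separate per-page extension record. All names (struct ipage, struct ipage_ext, lookup_ext, idle_word, set_idle, clear_idle, is_idle) are hypothetical, and the pointer-width check is only a stand-in for CONFIG_64BIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PAGES 16
#define IDLE_BIT 0

/* Stand-in for CONFIG_64BIT: true when pointers are wider than 32 bits. */
#define IDLE_IN_PAGE_FLAGS (UINTPTR_MAX > 0xffffffffUL)

struct ipage     { unsigned long flags; };  /* models struct page */
struct ipage_ext { unsigned long flags; };  /* models struct page_ext */

static struct ipage pages[NR_PAGES];
static struct ipage_ext page_exts[NR_PAGES];

/* Models the extension lookup: a second table indexed by page number. */
static struct ipage_ext *lookup_ext(size_t pfn)
{
	return &page_exts[pfn];
}

/* Pick the word that holds the idle bit for this build. */
static unsigned long *idle_word(size_t pfn)
{
	assert(pfn < NR_PAGES);
	if (IDLE_IN_PAGE_FLAGS)
		return &pages[pfn].flags;       /* room in page flags */
	return &lookup_ext(pfn)->flags;         /* no room: use extension */
}

static void set_idle(size_t pfn)
{
	*idle_word(pfn) |= 1UL << IDLE_BIT;
}

static void clear_idle(size_t pfn)
{
	*idle_word(pfn) &= ~(1UL << IDLE_BIT);
}

static bool is_idle(size_t pfn)
{
	return (*idle_word(pfn) >> IDLE_BIT) & 1UL;
}
```

Callers only see set_idle(), clear_idle() and is_idle(); where the bit is stored is a build-time detail, which is the kind of split the quoted note describes.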


[1] https://lwn.net/Articles/565097/
[2] 33c3fc7 ("mm: introduce idle page tracking")

Cheers
Vladimir

>
>
>
>>>>>> It seems that memory for all page_ext is preallocated along with page
>>>>>> structs; but just the lookup is slower.
>>>>>>
>>>>>
>>>>> Yup. Lookup would look like (based on v4.0):
>>>>>
>>>>> ...
>>>>> page_ext = lookup_page_ext_begin(virt_to_page(start));
>>>>>
>>>>> do {
>>>>>         page_ext->shadow[idx++] = value;
>>>>> } while (idx < bound);
>>>>>
>>>>> lookup_page_ext_end((void *)page_ext);
>>>>>
>>>>> ...
>>>>
>>>> Correction: please ignore that *_{begin,end} stuff - in mainline only
>>>> lookup_page_ext() is used.
>>>
>>>
>>> Note that this added code will be executed during handling of each and
>>> every memory access in kernel. Every instruction matters on that path.
>>
>> I know, I know... still better than nothing.
>>
>>> The additional indirection via page struct will also slow it down, but
>>> that's the cost for lower memory consumption and potentially 32-bit
>>> support. For page_ext it looks like even more overhead for no gain.
>>>
>>
>> eefa864 ("mm/page_ext: resurrect struct page extending code for debugging")
>> describes some cases where keeping data in page_ext has a benefit.
>>
>> Cheers
>> Vladimir
>