Re: [RFC][PATCH 0/7] Sanitization of slabs based on grsecurity/PaX

From: Laura Abbott
Date: Wed Jan 20 2016 - 22:35:19 EST

On 1/13/16 7:49 PM, Laura Abbott wrote:
On 1/8/16 6:07 AM, Christoph Lameter wrote:
On Thu, 7 Jan 2016, Laura Abbott wrote:

slub_debug=P not only poisons, it also enables other consistency checks on
the slab, assuming my understanding of what check_object does is correct.
My hope was to have only the poisoning and none of the consistency checks, in
an attempt to mitigate performance issues. I misunderstood when the checks
actually run and how SLUB_DEBUG was used.

Ok, I see that the pointer check is done without checking the
corresponding debug flag. The attached patch fixes it.

Another option would be to have a flag like SLAB_NO_SANITY_CHECK.
Enabling sanitization would then just be that flag plus SLAB_POISON
in the debug options. The disadvantage of this approach would be losing
the sanitization for ->ctor caches (the grsecurity version works around this
by re-initializing with ->ctor; I haven't heard any feedback on whether this
is actually acceptable) and not having some of the fast paths enabled
(assuming I'm understanding the code path correctly), which would also
be a performance penalty.
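
The ->ctor workaround mentioned above can be illustrated with a small
user-space sketch. Everything here is hypothetical (struct toy_cache,
toy_sanitize_on_free, the zero fill value); it is not the kernel or
grsecurity code, only a model of "clear the object on free, then re-run the
constructor so the next allocation still sees a constructed object":

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a slab cache; not the kernel's struct kmem_cache. */
struct toy_cache {
	size_t object_size;
	void (*ctor)(void *obj);	/* may be NULL, as for most caches */
};

/* Assumption: sanitize with zeroes; a real implementation may use another value. */
#define TOY_SANITIZE_BYTE 0x00

/*
 * Clear the whole object on free. For caches with a constructor, run the
 * constructor again afterwards, since users of such caches expect freshly
 * allocated objects to already be in their constructed state.
 */
static void toy_sanitize_on_free(const struct toy_cache *c, void *obj)
{
	assert(c != NULL && obj != NULL);

	memset(obj, TOY_SANITIZE_BYTE, c->object_size);
	if (c->ctor)
		c->ctor(obj);
}

/* Example object type and constructor, purely for illustration. */
struct toy_item {
	int refcount;
	char secret[16];
};

static void toy_item_ctor(void *obj)
{
	struct toy_item *it = obj;

	it->refcount = 1;
}
```

The cost of this approach is an extra constructor call on every free, which
is part of why its acceptability is an open question.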

I think we simply need to fix the missing check there. There is already a
flag SLAB_DEBUG_FREE for the pointer checks.

The patch improves performance, but these full sanitization patches still
perform significantly better than slub_debug=P. I'll put some effort into
seeing if I can figure out where the slowdown is coming from.

There are quite a few other checks which need to be skipped over as well,
but I don't think skipping those is going to be sufficient to give
acceptable performance; a quick 'hackbench -g 20 -l 1000' shows at least
a 3.5 second difference between just skipping all the checks+slub_debug=P
and this series.

The SLAB_DEBUG flags force everything to skip the CPU caches, which is
causing the slowdown. I experimented with allowing the debugging to
happen with CPU caches but I'm not convinced it's possible to do the
checking on the fast path in a consistent manner without adding
locking. Is it worth refactoring the debugging so it can be used
on CPU caches, or should I take the approach here of having the clear
be separate from free_debug_processing?
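
To make the two options concrete, here is a rough user-space model of the
free path. All names (toy_cache, toy_free, TOY_DEBUG_POISON, the sanitize
field) are made up for illustration and the real SLUB code is considerably
more involved; the fill value 0x6b mirrors the kernel's POISON_FREE. The
sketch only shows that a debug flag sends every free down the slow path,
while a clear kept separate from the debug processing can stay on the
per-CPU list:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_DEBUG_POISON 0x1u	/* hypothetical debug flag */

struct toy_object {
	struct toy_object *next;
	char payload[32];
};

struct toy_cache {
	unsigned int flags;
	bool sanitize;			/* hypothetical: clear on free, no debug flags */
	struct toy_object *cpu_freelist;	/* stand-in for the lockless per-CPU list */
	struct toy_object *shared_freelist;	/* stand-in for the locked slow path */
	unsigned long fast_frees;
	unsigned long slow_frees;
};

static void toy_free(struct toy_cache *c, struct toy_object *obj)
{
	assert(c != NULL && obj != NULL);

	if (c->flags & TOY_DEBUG_POISON) {
		/* Debug caches bypass the per-CPU list on every free. */
		memset(obj->payload, 0x6b, sizeof(obj->payload));
		obj->next = c->shared_freelist;
		c->shared_freelist = obj;
		c->slow_frees++;
		return;
	}

	/* Clearing kept separate from debug processing stays on the fast path. */
	if (c->sanitize)
		memset(obj->payload, 0, sizeof(obj->payload));
	obj->next = c->cpu_freelist;
	c->cpu_freelist = obj;
	c->fast_frees++;
}
```

In this model the sanitize case pays only for the memset, whereas the debug
case also gives up the per-CPU list; that matches the direction of the
hackbench difference above, though the model says nothing about its size.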