Re: [00/41] Large Blocksize Support V7 (adds memmap support)

From: Nick Piggin
Date: Wed Sep 12 2007 - 13:43:09 EST

Next message: Nick Piggin: "Re: SYSFS: need a noncaching read"
Previous message: Michal Piotrowski: "Re: [1/4] 2.6.23-rc6: known regressions"
In reply to: Christoph Lameter: "Re: [00/41] Large Blocksize Support V7 (adds memmap support)"
Next in thread: Christoph Lameter: "Re: [00/41] Large Blocksize Support V7 (adds memmap support)"

On Wednesday 12 September 2007 07:52, Christoph Lameter wrote:
> On Tue, 11 Sep 2007, Nick Piggin wrote:
> > > No you have not explained why the theoretical issues continue to exist
> > > given even just considering Lumpy Reclaim in .23 nor what effect the
> > > antifrag patchset would have.
> >
> > So how does lumpy reclaim, your slab patches, or anti-frag have
> > much effect on the worst case situation? Or help much against a
> > targeted fragmentation attack?
>
> F.e. Lumpy reclaim reclaims neighboring pages and thus works against
> fragmentation. So your formula no longer works.

OK, I'll describe how it works and what the actual problem with it is. I
haven't looked at the patches for a fair while so you can forgive my
inaccuracies in terminology or exact details.

So anti-frag groups memory into (say) 2MB chunks. The top-priority heuristic
is that movable allocations all go into groups with other movable memory,
and allocations which are not movable do not go into these groups. This is
flexible though, so if a workload wants to use more non-movable memory, it
is allowed to eat first into free groups, then into movable groups, after
filling all non-movable groups. This is important because
it is what makes anti-frag flexible (otherwise it would just be a memory
reserve in another form).
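
A toy sketch of that placement policy, in Python, may make the fallback
order easier to see. This is not the anti-frag code: the Group class, the
FALLBACK table and allocate() are hypothetical names, and the rule that
movable requests use only movable or free groups is a simplification
assumed here, not something the patches are known to do.

```python
from dataclasses import dataclass


@dataclass
class Group:
    kind: str                 # "free", "movable" or "unmovable"
    free_pages: int           # pages still available in this group
    unmovable_pages: int = 0  # unmovable pages that have landed here


# Preference order per request type. The unmovable order follows the text:
# own groups first, then free groups, then movable groups.
# The movable order is an assumption made for this sketch.
FALLBACK = {
    "unmovable": ("unmovable", "free", "movable"),
    "movable": ("movable", "free"),
}


def allocate(groups, want):
    """Take one page for a `want` request; return the group index or None."""
    for kind in FALLBACK[want]:
        for index, group in enumerate(groups):
            if group.kind == kind and group.free_pages > 0:
                if group.kind == "free":
                    group.kind = want  # a free group adopts its first user's type
                group.free_pages -= 1
                if want == "unmovable":
                    group.unmovable_pages += 1
                return index
    return None
```

Once the unmovable and free groups are full, the next unmovable request
lands in a movable group, and that group then holds a page that cannot be
moved out of the way.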

In my attack, I cause the kernel to make lots of unmovable allocations
and so deplete the movable groups. I theoretically then only need to keep
a small fraction (1/2^N) of these allocations around in order to DoS a
page allocation of order N.
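
The 1/2^N figure can be checked with a small simulation. The sketch below
is a model only, under stated assumptions: memory is a flat range of page
numbers, an order-N allocation needs a naturally aligned run of 2^N pages
with nothing pinned in it, and attack() and free_blocks_of_order() are
hypothetical helper names. With an assumed 4K base page, order 4
corresponds to 64K and order 9 to 2MB.

```python
def attack(total_pages, order):
    """Fill memory with unmovable pages, then free all but one per block.

    Returns the set of page numbers that remain pinned.
    """
    block = 1 << order
    pinned = set(range(total_pages))   # every page holds an unmovable allocation
    for page in range(total_pages):
        if page % block != 0:          # keep only the first page of each aligned block
            pinned.discard(page)
    return pinned


def free_blocks_of_order(pinned, total_pages, order):
    """Count aligned blocks of 2**order pages that contain no pinned page."""
    block = 1 << order
    count = 0
    for start in range(0, total_pages - block + 1, block):
        if not any(page in pinned for page in range(start, start + block)):
            count += 1
    return count


total = 1024                           # pages in the toy machine
pinned = attack(total, order=4)
print(len(pinned) / total)             # fraction of memory still pinned
print(free_blocks_of_order(pinned, total, 4))  # order-4 blocks still available
print(free_blocks_of_order(pinned, total, 3))  # smaller orders are less affected
```

In this model 1/16 of the pages stay pinned and no order-4 block is left,
while half of the order-3 blocks are still free. Reclaiming the neighbours
of a pinned page does not help, because the pinned page itself stays where
it is.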

And it doesn't even have to be a DoS. The natural fragmentation
that occurs in a kernel today can slowly push out the movable
groups and give you the same situation.

Now there are lots of other little heuristics, *including lumpy reclaim
and various slab reclaim improvements*, that improve the effectiveness
or speed of this thing, but at the end of the day, it has the same basic
issues. Unless you can move practically any currently unmovable
allocation (which will either be a lot of intrusive code or require a
vmapped kernel), then you can't get around the fundamental problem.
And if you do get around the fundamental problem, you don't really
need to group pages by mobility any more because they are all
movable[*].

So lumpy reclaim does not change my formula nor significantly help
against a fragmentation attack. AFAIKS.

[*] ok, this isn't quite true because if you can actually put a hard limit on
unmovable allocations then anti-frag will fundamentally help -- get back to
me on that when you get patches to move most of the obvious ones.
Like pinned dentries, inodes, buffer heads, page tables, task structs, mm
structs, vmas, anon_vmas, radix-tree nodes, etc.

> > > And you have used a 2M pagesize which is
> > > irrelevant to this patchset that deals with blocksizes up to 64k. In my
> > > experience the use of blocksize < PAGE_COSTLY_ORDER (32k) is reasonably
> > > safe.
> >
> > I used EXACTLY the page sizes that you brought up in your patch
> > description (ie. 64K and 2MB).
>
> The patch currently only supports 64k.

Sure, and I pointed out the theoretical figure for 64K pages as well. Is that
figure not problematic to you? Where do you draw the limit for what is
acceptable? Why? What happens with tiny memory machines where a reserve
or even the anti-frag patches may not be acceptable and/or work very well?
When do you require reserve pools? Why are reserve pools acceptable for
first-class support of filesystems when it has very loudly been made a
known policy decision by Linus in the past (and for some valid reasons) that
we should not put limits on the sizes of caches in the kernel?