Re: -mm merge plans -- anti-fragmentation

From: Nick Piggin
Date: Tue Jul 10 2007 - 23:00:27 EST


On Tue, Jul 10, 2007 at 12:11:45PM -0500, Dave McCracken wrote:
> On Tuesday 10 July 2007, Nick Piggin wrote:
> > On Tue, Jul 10, 2007 at 09:29:45AM -0500, Dave McCracken wrote:
> > > I find myself wondering what "sufficiently convincing noises" are.  I
> > > think we can all agree that in the current kernel order>0 allocations are
> > > a disaster.
> >
> > Are they? For what the kernel currently uses them for, I don't think
> > the lower order ones are so bad. Now and again we used to get reports
> > of atomic order 3 allocation failures with e1000 for example, but a
> > lot of those were before kswapd would properly asynchronously start
> > reclaim for atomic and higher order allocations. The odd failure
> > sometimes catches my eye, but nothing I would call a disaster.
>
> Ok, maybe disaster is too strong a word. But any kind of order>0 allocation
> still has to be approached with fear and caution, with a well tested fallback
> in the case of the inevitable failures. How many driver writers would have
> benefited from using order>0 pages, but turned aside to other less optimal
> solutions due to their unreliability? We don't know, and probably never
> will. Those people have moved on and won't revisit that design decision.

On the other side of the coin, we can't just merge this in the hope
that some good uses might turn up (IMO).


> > > The sheer list of patches lined up behind this set is strong evidence
> > > that there are useful features which depend on a working order>0.  When
> > > you add in the existing code that has to struggle with allocation
> > > failures or resort to special pools (ie hugetlbfs), I see a clear vote
> > > for the need for this patch.
> >
> > Really the only patches so far that I think have convincing reasons are
> > memory unplug and hugepage, and both of those can get a long way by using
> > a reserve zone (note it isn't entirely reserved, but still available for
> > things like pagecache). Beyond that, is there a big demand, and do we
> > want to make this fundamental change in direction in the kernel to
> > satisfy that demand?
>
> Yes, these projects have workarounds, because they have to. But the
> workarounds are painful and often require that the user specify in advance
> what memory they intend to use for this purpose, something users often have
> to learn by trial and error. Mel's patches would eliminate this barrier to
> use of the features.
>
> I don't see Mel's patches as "a fundamental change in direction". I think
> you're overstating the case. I see it as fixing a deficiency in the design
> of the page allocator, and a long overdue fix.

I would still say that with Mel's patches in, you need to have a fallback
to order-0 because memory can still get fragemnted. But no Mel's patches
are not exactly a fundamental change in direction itself, but introducing
higher order allocations without fallbacks is a change (OK, order 1 or 2
is used today, and mostly because of the nature of the allocator they're OK
too, but if we're talking about like 64K+ of contiguous pages).


> > > Some object because order>0 will still be able to fail.  I point out that
> > > order==0 can also fail, though we go to great lengths to prevent it.
> > >  Mel's patches raise the success rate of order>0 to within a few percent
> > > of order==0.  All this means is callers will need to decide how to handle
> > > the infrequent failure.  This should be true no matter what the order.
> >
> > So small ones like order-1 and 2 seem reasonably good right now AFAIKS.
> > If you perhaps want to say start using order-4  pages for slab or
> > some other kernel memory allocations, then you can run into the situation
> > where memory gets fragmented such that you have one sixteenth of your
> > memory actualy used but you can't allocate from any of your slabs because
> > there are no order-4 pages left. I guess this is a big difference between
> > order-low failures and order-high.
>
> In summary, I think I can rephrase your arguments against the patches as
> order>0 allocation pretty much works now for small orders, and people are
> living with it". Is that fairly accurate? My counter argument is that we

Well it does work for small orders and if by living with it you mean works
OK, then yes.


> can easily make it work much better and vastly simplify the code that is
> having to work around the lack of it by applying Mel's patches.

OK we have a lot contained in that statement :)

Make it work much better -- OK, so it should be easy to get the evidence
to justify this, then?

Vastly simplify the code -- so firstly you have to weigh this against the
increased complexity of Mel's patches, and secondly you are saying that we
can abandon fallback code? That's where we're talking about a fundamental
change in direction.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/