Re: [RFC PATCH] greatly reduce SLOB external fragmentation

From: Matt Mackall
Date: Thu Jan 10 2008 - 13:43:39 EST



On Thu, 2008-01-10 at 10:28 -0800, Linus Torvalds wrote:
>
> On Thu, 10 Jan 2008, Matt Mackall wrote:
> > >
> > > (I'm not a fan of slabs per se - I think all the constructor/destructor
> > > crap is just that: total crap - but the size/type binning is a big deal,
> > > and I think SLOB was naïve to think a pure first-fit makes any sense. Now
> > > you guys are size-binning by just two or three bins, and it seems to make
> > > a difference for some loads, but compared to SLUB/SLAB it's a total hack).
> >
> > Here I'm going to differ with you. The premises of the SLAB concept
> > (from the original paper) are:
>
> I really don't think we differ.
>
> The advantage of slab was largely the binning by type. Everything else was
> just a big crock. SLUB does the binning better, by really just making the
> type binning be about what really matters - the *size* of the type.
>
> So my argument was that the type/size binning makes sense (size more so
> than type), but the rest of the original Sun arguments for why slab was
> such a great idea were basically just the crap.
>
> Hard type binning was a mistake (but needed by slab due to the idiotic
> notion that constructors/destructors are "good for caches" - bleargh). I
> suspect that hard size binning is a mistake too (ie there are probably
> cases where you do want to split unused bigger size areas), but the fact
> that all of our allocators are two-level (with the page allocator acting
> as a size-agnostic free space) may help it somewhat.
>
> And yes, I do agree that any current allocator has problems with the big
> sizes that don't fit well into a page or two (like task_struct). That
> said, most of those don't have lots of allocations under many normal
> circumstances (even if there are uses that will really blow them up).
>
> The *big* slab users at least for me tend to be ext3_inode_cache and
> dentry. Everything else is orders of magnitude less. And of the two bad
> ones, ext3_inode_cache is the bad one at 700+ bytes or whatever (resulting
> in ~10% fragmentation just due to the page thing, regardless of whether
> you use an order-0 or order-1 page allocation).
>
> Of course, dentries fit better in a page (due to being smaller), but then
> the bigger number of dentries per page make it harder to actually free
> pages, so then you get fragmentation from that. Oh well. You can't win.
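
The "page thing" arithmetic above can be sketched roughly as follows. This
is only an illustration: the 4096-byte page size, the 750-byte object size
(a stand-in for "700+ bytes or whatever"), and both helper names are
assumptions, and per-slab metadata is ignored.

```c
#include <stddef.h>

/* Assumed page size; the mail does not state one. */
#define PAGE_BYTES 4096u

/*
 * Bytes left over at the tail of a slab of 2^order pages when it is
 * packed with fixed-size objects (hypothetical helper, no metadata).
 */
static size_t slab_tail_waste(size_t obj_bytes, unsigned order)
{
    size_t slab_bytes = (size_t)PAGE_BYTES << order;

    return slab_bytes % obj_bytes;
}

/* The same waste expressed in parts per thousand of the slab size. */
static unsigned slab_waste_permille(size_t obj_bytes, unsigned order)
{
    size_t slab_bytes = (size_t)PAGE_BYTES << order;

    return (unsigned)(slab_tail_waste(obj_bytes, order) * 1000 / slab_bytes);
}
```

With the assumed 750-byte object, an order-0 slab holds 5 objects and wastes
346 bytes, and an order-1 slab holds 10 and wastes 692 bytes, so the
fraction lost is the same (about 8%) either way.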

One idea I've been kicking around is pushing the boundary for the buddy
allocator back a bit (to 64k, say) and using SL*B under that. The page
allocators would call into buddy for larger than 64k (rare!) and SL*B
otherwise. This would let us greatly improve our handling of things like
task structs and skbs and possibly also things like 8k stacks and jumbo
frames. As SL*B would never be competing with the page allocator for
contiguous pages (the buddy allocator's granularity would be 64k), I
don't think this would exacerbate the page-level fragmentation issues.
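
A minimal sketch of the dispatch this implies, assuming a hard 64k cutoff.
The names below (BUDDY_MIN_BYTES, enum alloc_backend, pick_backend) are
hypothetical and do not correspond to any existing kernel interface.

```c
#include <stddef.h>

/* Assumed boundary: the buddy allocator only handles requests above this. */
#define BUDDY_MIN_BYTES ((size_t)64 * 1024)

/* Hypothetical tag for which allocator would service a request. */
enum alloc_backend {
    BACKEND_SLXB,   /* SLAB/SLOB/SLUB-style allocator, 64k and below */
    BACKEND_BUDDY   /* buddy allocator, larger than 64k (rare) */
};

/* Route a request by size: larger than 64k goes to buddy, the rest to SL*B. */
static enum alloc_backend pick_backend(size_t bytes)
{
    return bytes > BUDDY_MIN_BYTES ? BACKEND_BUDDY : BACKEND_SLXB;
}
```

Under this split an 8k stack or a multi-page task struct would land in SL*B,
and only requests above 64k would reach the buddy allocator.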

Crazy?

--
Mathematics is the supreme nostalgia of our time.
