Re: [RFC 0/6] zsmalloc support compaction
From: Minchan Kim
Date: Thu Dec 18 2014 - 19:45:57 EST
Hey Seth,
On Wed, Dec 17, 2014 at 05:19:30PM -0600, Seth Jennings wrote:
> On Tue, Dec 02, 2014 at 11:49:41AM +0900, Minchan Kim wrote:
> > Recently, there was an issue with zsmalloc fragmentation: I got a
> > report from Juno that fork() failed even though there were plenty of
> > free pages in the system. His investigation revealed that zram is one
> > of the culprits behind the heavy fragmentation, leaving no contiguous
> > 16K page for the pgd that fork needs on ARM.
> >
> > This patchset implements *basic* zsmalloc compaction support,
> > and zram utilizes it so the admin can do
> > "echo 1 > /sys/block/zram0/compact".
> >
> > Ideally, the mm migration code would be aware of zram pages and
> > migrate them out automatically, without the admin's manual operation,
> > when the system runs out of contiguous pages. However, we need more
> > thought before adding more hooks to migrate.c. Even if we implement
> > that, we still need a manual trigger mode, so I hope we can enhance
> > the zram migration work based on these primitive functions in the future.
> >
> > I have only tested it on x86, so it needs more testing on other arches.
> > Additionally, I should provide numbers for the zsmalloc regression
> > caused by the indirection layer. Unfortunately, I don't have an
> > ARM test machine on my desk. I will get one soon and test it.
> > Anyway, before further work, I'd like to hear opinions.
> >
> > Patchset is based on v3.18-rc6-mmotm-2014-11-26-15-45.
>
> Hey Minchan, sorry it has taken a while for me to look at this.
It's better than forever silence. Thanks, Seth.
>
> I have prototyped this for zbud too, and I see you face some of the same
> issues, some of them much worse for zsmalloc, like the large number of
> objects that must be moved to reclaim a page (with zbud, the max is 1).
>
> I see you are using zsmalloc itself for allocating the handles. Why not
> kmalloc()? Then you wouldn't need to track the handle_class stuff and
> adjust the class sizes (just in the interest of changing only what is
> needed to achieve the functionality).
A few reasons:
1. kmalloc's minimum allocation size is 8 bytes, but 4 bytes is enough to keep the handle.
2. Handles can pin lots of slab pages in memory.
3. It makes accounting of zsmalloc's memory usage inaccurate.
4. Creating a handle class in zsmalloc is simple.
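To illustrate what the handle buys us, here is a toy userspace model (not
the patchset code; every name below is made up): the handle is a single
word owned by the allocator that records where the object currently
lives, so compaction can move the object and update only that word while
the user keeps the same opaque value.

#include <stdio.h>

static unsigned long handle_class[128]; /* stand-in for a tiny zsmalloc size class */
static unsigned long nr_handles;

static unsigned long toy_zs_malloc(void *obj_location)
{
        unsigned long *slot = &handle_class[nr_handles++];

        *slot = (unsigned long)obj_location;    /* handle -> current object location */
        return (unsigned long)slot;             /* opaque handle returned to the user */
}

static void toy_migrate(unsigned long handle, void *new_location)
{
        /* compaction moved the object; the handle the user holds is unchanged */
        *(unsigned long *)handle = (unsigned long)new_location;
}

int main(void)
{
        char old_site[16], new_site[16];
        unsigned long handle = toy_zs_malloc(old_site);

        toy_migrate(handle, new_site);
        printf("object is now at %p\n", (void *)*(unsigned long *)handle);
        return 0;
}

Because that one word lives in a class zsmalloc itself owns, it is
accounted like any other pool memory and doesn't pin foreign slab pages.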
>
> I used kmalloc() but that is not without issue as the handles can be
> allocated from many slabs and any slab that contains a handle can't be
> freed, basically resulting in the handles themselves needing to be
> compacted, which they can't be because the user handle is a pointer to
> them.
Sure.
>
> One way to fix this, but it would be some amount of work, is to have the
> user (zswap/zbud) provide the space for the handle to zbud/zsmalloc.
zram?
> The zswap/zbud layer knows the size of the device (i.e. handle space)
zram?
> and could allocate a statically sized vmalloc area for holding handles
> so they don't get spread all over memory. I haven't fully explored this
> idea yet.
Hmm, I don't think it's a good idea.
We shouldn't assume the user of the allocator knows the size in advance.
In addition, do you want to populate all of the pages to keep handles in
the vmalloc area statically? It wouldn't be huge, but it depends on the
user's disksize setup. Another question: how do you search for an empty
slot for a new handle? In the end, we would need caching logic and a
small allocator for that.
IMHO, it has more cons than pros compared to my current approach.
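And just to show why I say that: even the simplest static handle area
needs a free-slot search, i.e. a small allocator of its own. A rough
userspace sketch of what that would look like (illustrative only; none of
these names are proposed code):

#include <limits.h>
#include <stdlib.h>

#define NR_HANDLES      (1UL << 16)             /* sized from disksize up front */

static unsigned long *handle_area;              /* the static "vmalloc" area */
static unsigned long bitmap[NR_HANDLES / (sizeof(unsigned long) * CHAR_BIT)];

static long handle_alloc(void)
{
        unsigned long i, bits = sizeof(unsigned long) * CHAR_BIT;

        /* linear scan for a free slot -- this is the search cost in question */
        for (i = 0; i < NR_HANDLES; i++) {
                if (!(bitmap[i / bits] & (1UL << (i % bits)))) {
                        bitmap[i / bits] |= 1UL << (i % bits);
                        return i;
                }
        }
        return -1;                              /* handle space exhausted */
}

static void handle_free(long idx)
{
        unsigned long bits = sizeof(unsigned long) * CHAR_BIT;

        bitmap[idx / bits] &= ~(1UL << (idx % bits));
}

int main(void)
{
        handle_area = calloc(NR_HANDLES, sizeof(*handle_area));
        long h = handle_alloc();

        handle_area[h] = 0xdeadbeef;            /* store the object location */
        handle_free(h);
        free(handle_area);
        return 0;
}

So you either waste the whole area up front or grow it lazily, and either
way you have re-invented a small object allocator, which is why I'd
rather keep the handles inside zsmalloc.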
>
> It is pretty limiting having the user trigger the compaction. Can we
Yep. As I said, we need more policy, but in this step I want to introduce
the primitive functions so we can build better policy on them as a next step.
> have a work item that periodically does some amount of compaction?
I'm not sure periodic cleanup is a good idea. I'd like to leave the decision
to the user rather than to the allocator itself. It's enough for the
allocator to expose its current status to the user.
> Maybe also have something analogous to direct reclaim that, when
> zs_malloc fails to secure a new page, it will try to compact to get one?
> I understand this is a first step. Maybe too much.
Yep, I want to separate that enhancement into another patchset.
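Just to sketch the direction I have in mind for that follow-up (the
helper names alloc_zspage()/zs_compact_pool() are made up here, and the
stubs exist only so the fragment stands alone): on allocation failure,
compact once and retry, analogous to direct reclaim.

#include <stddef.h>

struct zs_pool;
struct page;

/* userspace stubs standing in for the real allocator internals */
static struct page *alloc_zspage(struct zs_pool *pool) { (void)pool; return NULL; }
static unsigned long zs_compact_pool(struct zs_pool *pool) { (void)pool; return 0; }

/* if a fresh zspage can't be allocated, compact the pool and try again */
static struct page *alloc_zspage_or_compact(struct zs_pool *pool)
{
        struct page *page = alloc_zspage(pool);

        if (!page && zs_compact_pool(pool))     /* returns pages freed */
                page = alloc_zspage(pool);

        return page;
}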
>
> Also worth pointing out that the fullness groups are very coarse.
> Combining the objects from a ZS_ALMOST_EMPTY zspage and a ZS_ALMOST_FULL
> zspage might not result in very tight packing. In the worst case, the
> destination zspage would be slightly over 1/4 full (see
> fullness_threshold_frac).
Good point. Actually, I had noticed that. After all of the ALMOST_EMPTY
zspages have been migrated, we might also pick ZS_ALMOST_FULL zspages
as migration sources.
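For reference, here is the grouping spelled out in a simplified userspace
form (the real logic lives in mm/zsmalloc.c; treat the details as
illustrative). With fullness_threshold_frac = 4, anything over a quarter
full already counts as ALMOST_FULL, which is why the packing after a
merge can stay loose:

#include <stdio.h>

enum fullness_group { ZS_EMPTY, ZS_ALMOST_EMPTY, ZS_ALMOST_FULL, ZS_FULL };

static const int fullness_threshold_frac = 4;

/* simplified: classify a zspage by in-use objects vs. its maximum */
static enum fullness_group fullness(int inuse, int max)
{
        if (inuse == 0)
                return ZS_EMPTY;
        if (inuse == max)
                return ZS_FULL;
        if (inuse <= max / fullness_threshold_frac)
                return ZS_ALMOST_EMPTY;
        return ZS_ALMOST_FULL;
}

int main(void)
{
        /* 128 objects per zspage: 33 in-use already counts as ALMOST_FULL */
        printf("33/128 -> group %d (ZS_ALMOST_FULL == %d)\n",
               fullness(33, 128), ZS_ALMOST_FULL);
        /* merging a 31/128 ALMOST_EMPTY source into it only reaches 64/128 */
        return 0;
}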
>
> It also seems that you start with the smallest size classes first.
> Seems like if we start with the biggest first, we move fewer objects and
> reclaim more pages.
Good idea. I will respin.
Thanks for the comment!
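To put a number on it, the count of objects that must move to drain one
page scales inversely with the object size; a back-of-the-envelope sketch
(4K pages, made-up class sizes, ignoring how objects straddle pages
within a zspage):

#include <stdio.h>

#define PAGE_SIZE 4096

int main(void)
{
        /* illustrative class sizes only, not the real zsmalloc class table */
        int sizes[] = { 32, 256, 1024, 2048, 3072 };
        unsigned int i;

        for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
                printf("class %4d bytes: up to %3d objects to move per page reclaimed\n",
                       sizes[i], PAGE_SIZE / sizes[i]);

        return 0;
}

So starting from the biggest classes gets the most pages back for the
fewest object moves.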
>
> It does add a lot of code :-/ Not sure if there is any way around that
> though if we want this functionality for zsmalloc.
>
> Seth
>
> >
> > Thanks.
> >
> > Minchan Kim (6):
> > zsmalloc: expand size class to support sizeof(unsigned long)
> > zsmalloc: add indirection layer to decouple handle from object
> > zsmalloc: implement reverse mapping
> > zsmalloc: encode alloced mark in handle object
> > zsmalloc: support compaction
> > zram: support compaction
> >
> > drivers/block/zram/zram_drv.c | 24 ++
> > drivers/block/zram/zram_drv.h | 1 +
> > include/linux/zsmalloc.h | 1 +
> > mm/zsmalloc.c | 596 +++++++++++++++++++++++++++++++++++++-----
> > 4 files changed, 552 insertions(+), 70 deletions(-)
> >
> > --
> > 2.0.0
> >
--
Kind regards,
Minchan Kim