Re: [PATCH v5 3/4] zram: zram memory size limitation

From: Minchan Kim
Date: Tue Aug 26 2014 - 22:50:45 EST


Hey Joonsoo,

On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
> Hello, Minchan and David.
>
> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > > Hey Joonsoo,
> > >
> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
> > >> > ret = -ENOMEM;
> > >> > goto out;
> > >> > }
> > >> > +
> > >> > + if (zram->limit_pages &&
> > >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
> > >> > + zs_free(meta->mem_pool, handle);
> > >> > + ret = -ENOMEM;
> > >> > + goto out;
> > >> > + }
> > >> > +
> > >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
> > >>
> > >> Hello,
> > >>
> > >> I don't follow up previous discussion, so I could be wrong.
> > >> Why this enforcement should be here?
> > >>
> > >> I think that this has two problems.
> > >> 1) alloc/free happens unnecessarilly if we have used memory over the
> > >> limitation.
> > >
> > > True but firstly, I implemented the logic in zsmalloc, not zram but
> > > as I described in cover-letter, it's not a requirement of zsmalloc
> > > but zram so it should be in there. If every user want it in future,
> > > then we could move the function into zsmalloc. That's what we
> > > concluded in previous discussion.
>
> Hmm...
> Problem is that we can't avoid these unnecessary overhead in this
> implementation. If we can implement this feature in zram efficiently,
> it's okay. But, I think that current form isn't.


If we can add it in zsmalloc, it would be more clean and efficient
for zram but as I said, at the moment, I didn't want to put zram's
requirement into zsmalloc because to me, it's weird to enforce max
limit to allocator. It's client's role, I think.

If current implementation is expensive and rather hard to follow,
It would be one reason to move the feature into zsmalloc but
I don't think it makes critical trobule in zram usecase.
See below.

But I still open and will wait others's opinion.
If other guys think zsmalloc is better place, I am willing to move
it into zsmalloc.

>
> > >
> > > Another idea is we could call zs_get_total_pages right before zs_malloc
> > > but the problem is we cannot know how many of pages are allocated
> > > by zsmalloc in advance.
> > > IOW, zram should be blind on zsmalloc's internal.
> > >
> >
> > We did however suggest that we could check before hand to see if
> > max was already exceeded as an optimization.
> > (possibly with a guess on usage but at least using the minimum of 1 page)
> > In the contested case, the max may already be exceeded transiently and
> > therefore we know this one _could_ fail (it could also pass, but odds
> > aren't good).
> > As Minchan mentions this was discussed before - but not into great detail.
> > Testing should be done to determine possible benefit. And as he also
> > mentions, the better place for it may be in zsmalloc, but that
> > requires an ABI change.
>
> Why we hesitate to change zsmalloc API? It is in-kernel API and there
> are just two users now, zswap and zram. We can change it easily.
> I think that we just need following simple API change in zsmalloc.c.
>
> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
> =>
> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops
> *zpool_op)
>
> It's pool allocator so there is no obstacle for us to limit maximum
> memory usage in zsmalloc. It's a natural idea to limit memory usage
> for pool allocator.
>
> > Certainly a detailed suggestion could happen on this thread and I'm
> > also interested
> > in your thoughts, but this patchset should be able to go in as is.
> > Memory exhaustion avoidance probably trumps the possible thrashing at
> > threshold.
> >
> > > About alloc/free cost once if it is over the limit,
> > > I don't think it's important to consider.
> > > Do you have any scenario in your mind to consider alloc/free cost
> > > when the limit is over?
> > >
> > >> 2) Even if this request doesn't do new allocation, it could be failed
> > >> due to other's allocation. There is time gap between allocation and
> > >> free, so legimate user who want to use preallocated zsmalloc memory
> > >> could also see this condition true and then he will be failed.
> > >
> > > Yeb, we already discussed that. :)
> > > Such false positive shouldn't be a severe problem if we can keep a
> > > promise that zram user cannot exceed mem_limit.
> > >
>
> If we can keep such a promise, why we need to limit memory usage?
> I guess that this limit feature is useful for user who can't keep such promise.
> So, we should assume that this false positive happens frequently.


The goal is to limit memory usage within some threshold.
so false positive shouldn't be harmful unless it exceeds the threshold.
In addition, If such false positive happens frequently, it means
zram is very trobule so that user would see lots of write fail
message, sometime really slow system if zram is used for swap.
If we protect just one write from the race, how much does it help
this situation? I don't think it's critical problem.

>
> > And we cannot avoid the race, nor can we avoid in a low overhead competitive
> > concurrent process transient inconsistent states.
> > Different views for different observers.
> > They are a consequence of the theory of "Special Computational Relativity".
> > I am working on a String Unification Theory of Quantum and General CR in LISP.
> > ;-)
>
> If we move limit logic to zsmalloc, we can avoid the race by commiting
> needed memory size before actual allocation attempt. This commiting makes
> concurrent process serialized so there is no race here. There is
> possibilty to fail to allocate, but I think this is better than alloc
> and free blindlessly depending on inconsistent states.

Normally, zsmalloc/zsfree allocates object from existing pool so
it's not big overhead and if someone continue to try writing once limit is
full, another overhead (vfs, fs, block) would be bigger than zsmalloc
so it's not a problem, I think.

>
> Thanks.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>

--
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/