Re: [PATCH v5 3/4] zram: zram memory size limitation

From: David Horner
Date: Wed Aug 27 2014 - 10:45:00 EST


On Wed, Aug 27, 2014 at 10:03 AM, Dan Streetman <ddstreet@xxxxxxxx> wrote:
> On Tue, Aug 26, 2014 at 10:51 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>> Hey Joonsoo,
>>
>> On Wed, Aug 27, 2014 at 10:26:11AM +0900, Joonsoo Kim wrote:
>>> Hello, Minchan and David.
>>>
>>> On Tue, Aug 26, 2014 at 08:22:29AM -0400, David Horner wrote:
>>> > On Tue, Aug 26, 2014 at 3:55 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>>> > > Hey Joonsoo,
>>> > >
>>> > > On Tue, Aug 26, 2014 at 04:37:30PM +0900, Joonsoo Kim wrote:
>>> > >> On Mon, Aug 25, 2014 at 09:05:55AM +0900, Minchan Kim wrote:
>>> > >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>> > >> >  		ret = -ENOMEM;
>>> > >> >  		goto out;
>>> > >> >  	}
>>> > >> > +
>>> > >> > +	if (zram->limit_pages &&
>>> > >> > +		zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>> > >> > +		zs_free(meta->mem_pool, handle);
>>> > >> > +		ret = -ENOMEM;
>>> > >> > +		goto out;
>>> > >> > +	}
>>> > >> > +
>>> > >> >  	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>> > >>
>>> > >> Hello,
>>> > >>
>>> > >> I haven't followed the previous discussion, so I could be wrong.
>>> > >> Why should this enforcement be here?
>>> > >>
>>> > >> I think that this has two problems.
>>> > >> 1) alloc/free happens unnecessarily if we have already used memory
>>> > >> over the limit.
>>> > >
>>> > > True, but at first I implemented the logic in zsmalloc, not zram.
>>> > > As I described in the cover letter, though, it's a requirement of
>>> > > zram, not zsmalloc, so it should live in zram. If every user wants
>>> > > it in the future, then we could move the function into zsmalloc.
>>> > > That's what we concluded in the previous discussion.
>>>
>>> Hmm...
>>> The problem is that we can't avoid this unnecessary overhead in this
>>> implementation. If we can implement this feature in zram efficiently,
>>> it's okay. But I think that the current form isn't efficient.
>>
>>
>> If we can add it in zsmalloc, it would be cleaner and more efficient
>> for zram but, as I said, at the moment I didn't want to put zram's
>> requirement into zsmalloc because, to me, it's weird to enforce a max
>> limit in the allocator. It's the client's role, I think.
>>
>> If the current implementation is expensive and rather hard to follow,
>> that would be one reason to move the feature into zsmalloc, but
>> I don't think it causes critical trouble in the zram use case.
>> See below.
>>
>> But I'm still open and will wait for others' opinions.
>> If other guys think zsmalloc is the better place, I am willing to move
>> it into zsmalloc.
>
> Moving it into zsmalloc would allow rejecting new zsmallocs before
> actually crossing the limit, since it can calculate that internally.
> However, with the current patches the limit will only be briefly
> crossed, and it should not be crossed by a large amount. Now, if this
> is happening repeatedly and quickly during extreme memory pressure,
> the constant alloc/free will clearly be worse than a simple internal
> calculation and failure. But would it ever happen repeatedly once the
> zram limit is reached?
>
> Now that I'm thinking about the limit from the perspective of the zram
> user, I wonder what really will happen. If zram is being used for
> swap space, then when swap starts getting errors trying to write
> pages, how damaging will that be to the system? I haven't checked
> what swap does when it encounters disk errors. Of course, with no
> zram limit, continually writing to zram until memory is totally
> consumed isn't good either. But in any case, I would hope that swap
> would not repeatedly hammer on a disk when it's getting write failures
> from it.
>
> Alternately, if zram was being used as a compressed ram disk for
> regular file storage, it's entirely up to the application to handle
> write failures, so it may continue to try to write to a full zram
> disk.
>
> As far as what the zsmalloc api would look like with the limit added,
> it would need a setter and getter function (adding it as a param to
> the create function would be optional i think). But more importantly,
> it would need to handle multiple ways of specifying the limit. In our
> specific current use cases, zram and zswap, each handles their
> internal limit differently - zswap currently uses a % of total ram as
> its limit (defaulting to 20), while with these patches zram will use a
> specific number of bytes as its limit (defaulting to no limit). If
> the limiting mechanism is moved into zsmalloc (and possibly zbud),
> then either both users need to use the same units (bytes or %ram), or
> zsmalloc/zbud need to be able to set their limit in either units. It
> seems to me like keeping the limit in zram/zswap is currently
> preferable, at least without both using the same limit units.
>

zswap knows what 20% of memory (or whatever % it currently uses, and perhaps
that too will become a tuning knob) is in bytes.

So, if the interface to establish a limit for a pool (or pool set, or whatever
zsmalloc sets up for its allocation mechanism) is stipulated in bytes
(and actually uses pages internally, or vice versa), then both can use
that interface
(zram with its native page stipulation, and zswap with its calculated % of
memory).

Both would need a mechanism to change the max as needs change,
so the API has to handle this.
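
To make that concrete, here is a rough userspace sketch, not kernel code.
Every sketch_* name is made up for illustration (none of this is the real
zsmalloc API), and the 4 KiB page size is an assumption. The limit is
stipulated in bytes and kept in pages internally, and the same setter can be
called again whenever the max needs to change.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for a pool; not the real struct zs_pool. */
struct sketch_pool {
	unsigned long pages_used;	/* pages currently backing the pool */
	unsigned long limit_pages;	/* 0 means "no limit" */
};

/* Setter: limit is given in bytes and stored in pages (rounded up). */
static void sketch_set_limit(struct sketch_pool *pool, unsigned long bytes)
{
	pool->limit_pages = bytes / SKETCH_PAGE_SIZE +
			    (bytes % SKETCH_PAGE_SIZE != 0);
}

/* Getter: report the stored limit back in bytes. */
static unsigned long sketch_get_limit(const struct sketch_pool *pool)
{
	return pool->limit_pages * SKETCH_PAGE_SIZE;
}

/* What a zswap-like user would do first: % of total RAM -> bytes. */
static unsigned long sketch_percent_to_bytes(unsigned long total_ram_bytes,
					     unsigned int percent)
{
	return total_ram_bytes / 100 * percent;
}

/* Would growing the pool by new_pages stay within the limit? */
static bool sketch_may_grow(const struct sketch_pool *pool,
			    unsigned long new_pages)
{
	if (!pool->limit_pages)
		return true;
	return pool->pages_used + new_pages <= pool->limit_pages;
}
```

A zswap-like caller would do
sketch_set_limit(pool, sketch_percent_to_bytes(total_ram, 20)), while a
zram-like caller would hand its configured value straight to
sketch_set_limit().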


Or am I way off base?


>
>>
>>>
>>> > >
>>> > > Another idea is that we could call zs_get_total_pages right before
>>> > > zs_malloc, but the problem is that we cannot know in advance how many
>>> > > pages zsmalloc will allocate.
>>> > > IOW, zram should be blind to zsmalloc's internals.
>>> > >
>>> >
>>> > We did, however, suggest that we could check beforehand to see if
>>> > max was already exceeded as an optimization
>>> > (possibly with a guess on usage, but at least using the minimum of 1 page).
>>> > In the contested case, the max may already be exceeded transiently and
>>> > therefore we know this one _could_ fail (it could also pass, but odds
>>> > aren't good).
>>> > As Minchan mentions, this was discussed before, but not in great detail.
>>> > Testing should be done to determine possible benefit. And as he also
>>> > mentions, the better place for it may be in zsmalloc, but that
>>> > requires an ABI change.
>>>
>>> Why do we hesitate to change the zsmalloc API? It is an in-kernel API
>>> and there are just two users now, zswap and zram. We can change it
>>> easily. I think that we just need the following simple API change in
>>> zsmalloc.c.
>>>
>>> zs_zpool_create(gfp_t gfp, struct zpool_ops *zpool_op)
>>> =>
>>> zs_zpool_create(unsigned long limit, gfp_t gfp, struct zpool_ops *zpool_op)
>>>
>>> It's a pool allocator, so there is no obstacle to limiting maximum
>>> memory usage in zsmalloc. It's a natural idea to limit memory usage
>>> for a pool allocator.
>>>
>>> > Certainly a detailed suggestion could happen on this thread and I'm
>>> > also interested
>>> > in your thoughts, but this patchset should be able to go in as is.
>>> > Memory exhaustion avoidance probably trumps the possible thrashing at
>>> > threshold.
>>> >
>>> > > As for the alloc/free cost once usage is over the limit,
>>> > > I don't think it's important to consider.
>>> > > Do you have any scenario in mind where the alloc/free cost
>>> > > matters once the limit is exceeded?
>>> > >
>>> > >> 2) Even if this request doesn't do a new allocation, it could fail
>>> > >> due to another's allocation. There is a time gap between allocation
>>> > >> and free, so a legitimate user who wants to use preallocated zsmalloc
>>> > >> memory could also see this condition as true and then fail.
>>> > >
>>> > > Yeb, we already discussed that. :)
>>> > > Such false positive shouldn't be a severe problem if we can keep a
>>> > > promise that zram user cannot exceed mem_limit.
>>> > >
>>>
>>> If we can keep such a promise, why do we need to limit memory usage?
>>> I guess that this limit feature is useful for users who can't keep such
>>> a promise.
>>> So, we should assume that this false positive happens frequently.
>>
>>
>> The goal is to limit memory usage within some threshold,
>> so a false positive shouldn't be harmful unless usage exceeds the threshold.
>> In addition, if such false positives happen frequently, it means
>> zram is in serious trouble, so the user would see lots of write failure
>> messages and sometimes a really slow system if zram is used for swap.
>> If we protect just one write from the race, how much does it help
>> this situation? I don't think it's a critical problem.
>>
>>>
>>> > And we cannot avoid the race, nor, in a low-overhead competitive
>>> > concurrent process, can we avoid transient inconsistent states.
>>> > Different views for different observers.
>>> > They are a consequence of the theory of "Special Computational Relativity".
>>> > I am working on a String Unification Theory of Quantum and General CR in LISP.
>>> > ;-)
>>>
>>> If we move the limit logic to zsmalloc, we can avoid the race by committing
>>> the needed memory size before the actual allocation attempt. This committing
>>> serializes concurrent processes, so there is no race here. There is a
>>> possibility that the allocation fails, but I think this is better than
>>> blindly allocating and freeing depending on inconsistent states.
>>
>> Normally, zs_malloc/zs_free allocate objects from the existing pool, so
>> it's not a big overhead, and if someone continues to try writing once the
>> limit is reached, other overhead (vfs, fs, block) would be bigger than
>> zsmalloc's, so it's not a problem, I think.
>>
>>>
>>> Thanks.
>>>
>>
>> --
>> Kind regards,
>> Minchan Kim