Re: [PATCH v4 3/4] zram: zram memory size limitation

From: Dan Streetman
Date: Tue Aug 26 2014 - 09:40:36 EST


On Tue, Aug 26, 2014 at 12:28 AM, David Horner <ds2horner@xxxxxxxxx> wrote:
> On Mon, Aug 25, 2014 at 2:12 PM, Dan Streetman <ddstreet@xxxxxxxx> wrote:
>> On Mon, Aug 25, 2014 at 4:22 AM, David Horner <ds2horner@xxxxxxxxx> wrote:
>>> On Mon, Aug 25, 2014 at 12:37 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>>>> On Sun, Aug 24, 2014 at 11:40:50PM -0400, David Horner wrote:
>>>>> On Sun, Aug 24, 2014 at 7:56 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>>>>> > Hello David,
>>>>> >
>>>>> > On Fri, Aug 22, 2014 at 06:55:38AM -0400, David Horner wrote:
>>>>> >> On Thu, Aug 21, 2014 at 8:42 PM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
>>>>> >> > Since zram has no control feature to limit memory usage,
>>>>> >> > it makes hard to manage system memrory.
>>>>> >> >
>>>>> >> > This patch adds new knob "mem_limit" via sysfs to set up the
>>>>> >> > a limit so that zram could fail allocation once it reaches
>>>>> >> > the limit.
>>>>> >> >
>>>>> >> > In addition, user could change the limit in runtime so that
>>>>> >> > he could manage the memory more dynamically.
>>>>> >> >
>>>>> >> - Default is no limit so it doesn't break old behavior.
>>>>> >> + Initial state is no limit so it doesn't break old behavior.
>>>>> >>
>>>>> >> I understand your previous post now.
>>>>> >>
>>>>> >> I was saying that setting to either a null value or garbage
>>>>> >> (which is interpreted as zero by memparse(buf, NULL);)
>>>>> >> removes the limit.
>>>>> >>
>>>>> >> I think this is "surprise" behaviour and rather the null case should
>>>>> >> return -EINVAL
>>>>> >> The test below should be "good enough" though not catching all garbage.
>>>>> >
>>>>> > Thanks for suggesting but as I said, it should be fixed in memparse itself,
>>>>> > not caller if it is really problem so I don't want to touch it in this
>>>>> > patchset. It's not critical for adding the feature.
>>>>> >
>>>>>
>>>>> I've looked into the memparse function more since we talked.
>>>>> I do believe a wrapper function around it for the typical use by sysfs would
>>>>> be very valuable.
>>>>
>>>> Agree.
>>>>
>>>>> However, there is nothing wrong with memparse itself that needs to be fixed.
>>>>>
>>>>> It does what it is documented to do very well (In My Uninformed Opinion).
>>>>> It provides everything that a caller needs to manage the token that it
>>>>> processes.
>>>>> It thus handles strings like "7,,5,8,,9" with the implied zeros.
>>>>
>>>> Maybe strict_memparse would be better to protect such things so you
>>>> could find several places to clean it up.
>>>>
>>>>>
>>>>> The fact that other callers don't check the return pointer value to
>>>>> see if only a null
>>>>> string was processed, is not its fault.
>>>>> Nor that it may not be ideally suited to sysfs attributes; that other store
>>>>> functions use it in a given manner does not means that is correct -
>>>>> nor that it is
>>>>> incorrect for that "knob". Some attributes could be just as valid with
>>>>> null zeros.
>>>>>
>>>>> And you are correct, to disambiguate the zero is not required for the
>>>>> limit feature.
>>>>> Your original patch which disallowed zero was full feature for mem_limit.
>>>>> It is the requested non-crucial feature to allow zero to reestablish
>>>>> the initial state
>>>>> that benefits from distinguishing an explicit zero from a "default zero'
>>>>> when garbage is written.
>>>>>
>>>>> The final argument is that if we release this feature as is the undocumented
>>>>> functionality could be relied upon, and when later fixed: user space breaks.
>>>>
>>>> I don't get it. Why does it break userspace?
>>>> The sysfs-block-zram says "0" means disable the limit.
>>>> If someone writes *garabge* but work as if disabling the limit,
>>>> it's not a right thing and he already broke although it worked
>>>> so it would be not a problem if we fix later.
>>>> (ie, we don't need to take care of broken userspace)
>>>> Am I missing your point?
>>>>
>>>
>>> Perhaps you are missing my point, perhaps ignoring or dismissing.
>>>
>>> Basically, if a facility works in a useful way, even if it was designed for
>>> different usage, that becomes the "accepted" interface/usage.
>>> The developer may not have intended that usage or may even considered
>>> it wrong and a broken usage, but it is what it is and people become
>>> reliant on that behaviour.
>>>
>>> Case in point is memparse itself.
>>>
>>> The developer intentionally sets the return pointer because that is the
>>> only value that can be validated for correct performance.
>>> The return value allows -ve so the standard error message passing is not valid.
>>> Unfortunately, C allows the user to pass a NULL value in the parameter.
>>> The developer could consider that absurd and fundamentally broken.
>>> But to the user it is a valid situation, because (perhaps) it can't be
>>> bothered to handle error cases.
>>>
>>> So, who is to blame.
>>> You say memparse, that it is fundamentally broken,
>>> because it didn't check to see that it was used correctly.
>>> And I say mem_limit_store is fundamentally broken,
>>> because it didn't check to see that it was used correctly.
>>
>> I think we should look at what the rest of the kernel does as far as
>> checking memparse results. It appears to be a mix of some code
>> checking memparse while others don't. The most common way to check
>> appears to be to verify that memparse actually parsed at least 1
>> character, e.g.:
>> oldp = p;
>> mem_size = memparse(p, &p);
>> if (p == oldp)
>> return -EINVAL;
>>
>> although other places where 0 isn't valid can simply check for that:
>> mem_size = memparse(p, &p);
>> /* don't remove all of memory when handling "mem={invalid}" param */
>> if (mem_size == 0)
>> return -EINVAL;
>>
>> or even the other memparse use in zram_drv.c:
>> disksize = memparse(buf, NULL);
>> if (!disksize)
>> return -EINVAL;
>>
>>
>> And there seem to be other places where (maybe?) there's no checking
>> at all. However, it also seems like many cases of memparse usage are
>> looking for a non-zero value, and therefore they can either
>> immediately check for zero/invalid or (possibly) later code has checks
>> to avoid using any zero value. In this case though, 0 is a valid
>> value. So, while I agree that if a user passes an invalid (i.e.
>> non-numeric) value it's clearly user error, it might be closer to the
>> apparent (although unwritten AFAICT) memparse usage api to check the
>> result for validity; in our case a simple check if at least 1 char was
>> parsed is all that's needed, e.g.:
>>
>> {
>> u64 limit;
>> char *tmp = buf;
>> struct zram *zram = dev_to_zram(dev);
>>
>> limit = memparse(buf, &tmp);
>> if (buf == tmp) /* no chars parsed, invalid input */
>> return -EINVAL;
>> down_write(&zram->init_lock);
>> ...
>>
>>
>> Separate from this patch, it would also help if the lib/cmdline.c
>> memparse doc was at least updated to clarify when the result should be
>> checked for validity
>
> FWIW:
> I was pondering why I thought this was the wrong place.
> On reflection the best explanation is that it is not validity -
> the program does what it does quite well.
> (although it does have flaws for use by sysfs
> 1) it uses simple_strtoull which according to kernel.h#L269 is obsolete
> 2) it checks for a suffix in the null zero case
> (that means G,K,M are all valid memory size constants,
> and I think that should not be in the definition of
> valid mem parms)
> 3) it does nothing to enforce termination of the input.
> Both simple_strtoull and its successor kstrtoull are not
> buffer overrun safe.
> And so neither is memparse.
> It may be the sysfs buffer management does some magic here
> - but I have not seen it documented nor in code.)
>
> Rather than _validity_ it is _applicability_ that needs explaining.
> And that is not documented in the function that does its thing.
> But rather in the code that uses it, and more specifically in the framework
> established for its specific use - as in sysfs for numeric memory parameters.

Well, sysfs isn't the only user of memparse, over half of its usage is
from arch/, presumably for kernel boot param parsing. So the doc on
its usage shouldn't only be for sysfs.

>
>> and how best to do that (e.g. if 0 is an invalid value, just check if
>> the result is 0; if 0 is a possible valid value, check if any chars
>> were parsed).
>>
>>
>>>
>>> The difference is that memparse cannot stop being abused
>>> (C allows the NULL argument and extensive tricks are required to address that)
>>> however, we can readily fix mem_limit_store and ensure
>>> 1) no regression when the interface IS fixed and
>>> 2) predictable behaviour when accidental or "fuzzy" input arrives.
>>>
>>>
>>>>> They say getting API right is a difficult exercise. I suggest, if we
>>>>> don't insisting on
>>>>> an explicit zero we have the API wrong.
>>>>>
>>>>> I don't think you disagreed, just that the burden to get it correct
>>>>> lay elsewhere.
>>>>>
>>>>> If that is the case it doesn't really matter, we cannot release this
>>>>> interface until
>>>>> it is corrected wherever it must be.
>>>>>
>>>>> And my zero check was a poor hack.
>>>>>
>>>>> I should have explicitly checked the returned pointer value.
>>>>>
>>>>> I will send that proposed revision, and hopefully you will consider it
>>>>> for inclusion.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >>
>>>>> >> >
>>>>> >> > Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
>>>>> >> > ---
>>>>> >> > Documentation/ABI/testing/sysfs-block-zram | 10 ++++++++
>>>>> >> > Documentation/blockdev/zram.txt | 24 ++++++++++++++---
>>>>> >> > drivers/block/zram/zram_drv.c | 41 ++++++++++++++++++++++++++++++
>>>>> >> > drivers/block/zram/zram_drv.h | 5 ++++
>>>>> >> > 4 files changed, 76 insertions(+), 4 deletions(-)
>>>>> >> >
>>>>> >> > diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > index 70ec992514d0..b8c779d64968 100644
>>>>> >> > --- a/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > +++ b/Documentation/ABI/testing/sysfs-block-zram
>>>>> >> > @@ -119,3 +119,13 @@ Description:
>>>>> >> > efficiency can be calculated using compr_data_size and this
>>>>> >> > statistic.
>>>>> >> > Unit: bytes
>>>>> >> > +
>>>>> >> > +What: /sys/block/zram<id>/mem_limit
>>>>> >> > +Date: August 2014
>>>>> >> > +Contact: Minchan Kim <minchan@xxxxxxxxxx>
>>>>> >> > +Description:
>>>>> >> > + The mem_limit file is read/write and specifies the amount
>>>>> >> > + of memory to be able to consume memory to store store
>>>>> >> > + compressed data. The limit could be changed in run time
>>>>> >> > - and "0" is default which means disable the limit.
>>>>> >> > + and "0" means disable the limit. No limit is the initial state.
>>>>> >>
>>>>> >> there should be no default in the API.
>>>>> >
>>>>> > Thanks.
>>>>> >
>>>>> >>
>>>>> >> > + Unit: bytes
>>>>> >> > diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
>>>>> >> > index 0595c3f56ccf..82c6a41116db 100644
>>>>> >> > --- a/Documentation/blockdev/zram.txt
>>>>> >> > +++ b/Documentation/blockdev/zram.txt
>>>>> >> > @@ -74,14 +74,30 @@ There is little point creating a zram of greater than twice the size of memory
>>>>> >> > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
>>>>> >> > size of the disk when not in use so a huge zram is wasteful.
>>>>> >> >
>>>>> >> > -5) Activate:
>>>>> >> > +5) Set memory limit: Optional
>>>>> >> > + Set memory limit by writing the value to sysfs node 'mem_limit'.
>>>>> >> > + The value can be either in bytes or you can use mem suffixes.
>>>>> >> > + In addition, you could change the value in runtime.
>>>>> >> > + Examples:
>>>>> >> > + # limit /dev/zram0 with 50MB memory
>>>>> >> > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > + # Using mem suffixes
>>>>> >> > + echo 256K > /sys/block/zram0/mem_limit
>>>>> >> > + echo 512M > /sys/block/zram0/mem_limit
>>>>> >> > + echo 1G > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > + # To disable memory limit
>>>>> >> > + echo 0 > /sys/block/zram0/mem_limit
>>>>> >> > +
>>>>> >> > +6) Activate:
>>>>> >> > mkswap /dev/zram0
>>>>> >> > swapon /dev/zram0
>>>>> >> >
>>>>> >> > mkfs.ext4 /dev/zram1
>>>>> >> > mount /dev/zram1 /tmp
>>>>> >> >
>>>>> >> > -6) Stats:
>>>>> >> > +7) Stats:
>>>>> >> > Per-device statistics are exported as various nodes under
>>>>> >> > /sys/block/zram<id>/
>>>>> >> > disksize
>>>>> >> > @@ -96,11 +112,11 @@ size of the disk when not in use so a huge zram is wasteful.
>>>>> >> > compr_data_size
>>>>> >> > mem_used_total
>>>>> >> >
>>>>> >> > -7) Deactivate:
>>>>> >> > +8) Deactivate:
>>>>> >> > swapoff /dev/zram0
>>>>> >> > umount /dev/zram1
>>>>> >> >
>>>>> >> > -8) Reset:
>>>>> >> > +9) Reset:
>>>>> >> > Write any positive value to 'reset' sysfs node
>>>>> >> > echo 1 > /sys/block/zram0/reset
>>>>> >> > echo 1 > /sys/block/zram1/reset
>>>>> >> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
>>>>> >> > index f0b8b30a7128..370c355eb127 100644
>>>>> >> > --- a/drivers/block/zram/zram_drv.c
>>>>> >> > +++ b/drivers/block/zram/zram_drv.c
>>>>> >> > @@ -122,6 +122,33 @@ static ssize_t max_comp_streams_show(struct device *dev,
>>>>> >> > return scnprintf(buf, PAGE_SIZE, "%d\n", val);
>>>>> >> > }
>>>>> >> >
>>>>> >> > +static ssize_t mem_limit_show(struct device *dev,
>>>>> >> > + struct device_attribute *attr, char *buf)
>>>>> >> > +{
>>>>> >> > + u64 val;
>>>>> >> > + struct zram *zram = dev_to_zram(dev);
>>>>> >> > +
>>>>> >> > + down_read(&zram->init_lock);
>>>>> >> > + val = zram->limit_pages;
>>>>> >> > + up_read(&zram->init_lock);
>>>>> >> > +
>>>>> >> > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
>>>>> >> > +}
>>>>> >> > +
>>>>> >> > +static ssize_t mem_limit_store(struct device *dev,
>>>>> >> > + struct device_attribute *attr, const char *buf, size_t len)
>>>>> >> > +{
>>>>> >> > + u64 limit;
>>>>> >> > + struct zram *zram = dev_to_zram(dev);
>>>>> >> > +
>>>>> >> > + limit = memparse(buf, NULL);
>>>>> >>
>>>>> >> if (limit = 0 && buf != "0")
>>>>> >> return -EINVAL
>>>>> >>
>>>>> >> > + down_write(&zram->init_lock);
>>>>> >> > + zram->limit_pages = PAGE_ALIGN(limit) >> PAGE_SHIFT;
>>>>> >> > + up_write(&zram->init_lock);
>>>>> >> > +
>>>>> >> > + return len;
>>>>> >> > +}
>>>>> >> > +
>>>>> >> > static ssize_t max_comp_streams_store(struct device *dev,
>>>>> >> > struct device_attribute *attr, const char *buf, size_t len)
>>>>> >> > {
>>>>> >> > @@ -513,6 +540,14 @@ static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
>>>>> >> > ret = -ENOMEM;
>>>>> >> > goto out;
>>>>> >> > }
>>>>> >> > +
>>>>> >> > + if (zram->limit_pages &&
>>>>> >> > + zs_get_total_pages(meta->mem_pool) > zram->limit_pages) {
>>>>> >> > + zs_free(meta->mem_pool, handle);
>>>>> >> > + ret = -ENOMEM;
>>>>> >> > + goto out;
>>>>> >> > + }
>>>>> >> > +
>>>>> >> > cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_WO);
>>>>> >> >
>>>>> >> > if ((clen == PAGE_SIZE) && !is_partial_io(bvec)) {
>>>>> >> > @@ -617,6 +652,9 @@ static void zram_reset_device(struct zram *zram, bool reset_capacity)
>>>>> >> > struct zram_meta *meta;
>>>>> >> >
>>>>> >> > down_write(&zram->init_lock);
>>>>> >> > +
>>>>> >> > + zram->limit_pages = 0;
>>>>> >> > +
>>>>> >> > if (!init_done(zram)) {
>>>>> >> > up_write(&zram->init_lock);
>>>>> >> > return;
>>>>> >> > @@ -857,6 +895,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
>>>>> >> > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
>>>>> >> > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>>>>> >> > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
>>>>> >> > +static DEVICE_ATTR(mem_limit, S_IRUGO | S_IWUSR, mem_limit_show,
>>>>> >> > + mem_limit_store);
>>>>> >> > static DEVICE_ATTR(max_comp_streams, S_IRUGO | S_IWUSR,
>>>>> >> > max_comp_streams_show, max_comp_streams_store);
>>>>> >> > static DEVICE_ATTR(comp_algorithm, S_IRUGO | S_IWUSR,
>>>>> >> > @@ -885,6 +925,7 @@ static struct attribute *zram_disk_attrs[] = {
>>>>> >> > &dev_attr_orig_data_size.attr,
>>>>> >> > &dev_attr_compr_data_size.attr,
>>>>> >> > &dev_attr_mem_used_total.attr,
>>>>> >> > + &dev_attr_mem_limit.attr,
>>>>> >> > &dev_attr_max_comp_streams.attr,
>>>>> >> > &dev_attr_comp_algorithm.attr,
>>>>> >> > NULL,
>>>>> >> > diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
>>>>> >> > index e0f725c87cc6..b7aa9c21553f 100644
>>>>> >> > --- a/drivers/block/zram/zram_drv.h
>>>>> >> > +++ b/drivers/block/zram/zram_drv.h
>>>>> >> > @@ -112,6 +112,11 @@ struct zram {
>>>>> >> > u64 disksize; /* bytes */
>>>>> >> > int max_comp_streams;
>>>>> >> > struct zram_stats stats;
>>>>> >> > + /*
>>>>> >> > + * the number of pages zram can consume for storing compressed data
>>>>> >> > + */
>>>>> >> > + unsigned long limit_pages;
>>>>> >> > +
>>>>> >> > char compressor[10];
>>>>> >> > };
>>>>> >> > #endif
>>>>> >> > --
>>>>> >> > 2.0.0
>>>>> >> >
>>>>> >>
>>>>> >> --
>>>>> >> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> >> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
>>>>> >> see: http://www.linux-mm.org/ .
>>>>> >> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
>>>>> >
>>>>> > --
>>>>> > Kind regards,
>>>>> > Minchan Kim
>>>>>
>>>>> --
>>>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>>>> the body to majordomo@xxxxxxxxxx For more info on Linux MM,
>>>>> see: http://www.linux-mm.org/ .
>>>>> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
>>>>
>>>> --
>>>> Kind regards,
>>>> Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/