Re: [PATCH v2] libnvdimm, dimm: Maximize label transfer size

From: Alexander Duyck
Date: Tue Oct 02 2018 - 13:07:31 EST




On 10/1/2018 3:02 PM, Dan Williams wrote:
On Mon, Oct 1, 2018 at 2:54 PM Alexander Duyck
<alexander.h.duyck@xxxxxxxxxxxxxxx> wrote:



On 10/1/2018 2:14 PM, Dan Williams wrote:
Use kvzalloc() to bypass the arbitrary PAGE_SIZE limit of label transfer
operations. Given the expense of calling into firmware, maximize the
amount of label data we transfer per call to be up to the total label
space if allowed by the firmware, or 256K whichever is smaller.

Cc: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
Changes in v2:
* clamp the max allocation size at 256K in case large label areas with
unlimited transfer sizes appear in the future.

drivers/nvdimm/dimm_devs.c | 14 ++++++++------
tools/testing/nvdimm/test/nfit.c | 2 +-
2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c
index 863cabc35215..3616e2e47788 100644
--- a/drivers/nvdimm/dimm_devs.c
+++ b/drivers/nvdimm/dimm_devs.c
@@ -111,8 +111,9 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
if (!ndd->data)
return -ENOMEM;

- max_cmd_size = min_t(u32, PAGE_SIZE, ndd->nsarea.max_xfer);
- cmd = kzalloc(max_cmd_size + sizeof(*cmd), GFP_KERNEL);
+ max_cmd_size = min_t(u32, ndd->nsarea.config_size, SZ_256K);
+ max_cmd_size = min_t(u32, max_cmd_size, ndd->nsarea.max_xfer);
+ cmd = kvzalloc(max_cmd_size + sizeof(*cmd), GFP_KERNEL);
if (!cmd)
return -ENOMEM;


So I wouldn't use 256K as the limit, maybe 256K minus the sizeof(*cmd).
Otherwise you are still allocating additional memory to take care of
that little trailing bit that is being added.

Does it matter? This is a slow / infrequently used path and I don't
see the practical difference of 256K vs slightly less than 256K.

It depends on the approach used. From past experience 256K could easily become 512K with just that extra bit of overhead. That is why I was thinking if we are going to make 256K the limit, we should make that the hard limit and not add a little bit extra onto it.
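
To make the rounding concern concrete, here is a minimal userspace sketch. It assumes an allocator that only hands out power-of-two sized blocks; whether the kernel actually rounds this way depends on which path kvzalloc() ends up taking. CMD_HDR_SIZE is a hypothetical stand-in for sizeof(*cmd), and roundup_pow2() and capped_payload() are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define SZ_256K (256u * 1024u)

/* Hypothetical stand-in for sizeof(*cmd); the real header size differs. */
#define CMD_HDR_SIZE 16u

/*
 * Round n up to the next power of two. This models an allocator that
 * only hands out power-of-two sized blocks (an assumption).
 */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/* Payload cap chosen so that payload + header is exactly 256K. */
static size_t capped_payload(void)
{
	return SZ_256K - CMD_HDR_SIZE;
}
```

Under that assumption, a request for SZ_256K + CMD_HDR_SIZE bytes rounds up to 512K, while capped_payload() + CMD_HDR_SIZE stays at 256K.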

@@ -134,7 +135,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
memcpy(ndd->data + offset, cmd->out_buf, cmd->in_length);
}
dev_dbg(ndd->dev, "len: %zu rc: %d\n", offset, rc);
- kfree(cmd);
+ kvfree(cmd);

return rc;
}
@@ -157,9 +158,10 @@ int nvdimm_set_config_data(struct nvdimm_drvdata *ndd, size_t offset,
if (offset + len > ndd->nsarea.config_size)
return -ENXIO;

- max_cmd_size = min_t(u32, PAGE_SIZE, len);
+ max_cmd_size = min_t(u32, ndd->nsarea.config_size, SZ_256K);
max_cmd_size = min_t(u32, max_cmd_size, ndd->nsarea.max_xfer);
- cmd = kzalloc(max_cmd_size + sizeof(*cmd) + sizeof(u32), GFP_KERNEL);
+ max_cmd_size = min_t(u32, max_cmd_size, len);
+ cmd = kvzalloc(max_cmd_size + sizeof(*cmd) + sizeof(u32), GFP_KERNEL);
if (!cmd)
return -ENOMEM;


For the set operation I am not sure you have any code that is going to
be updating multiple labels at a time. From what I can tell the
largest set call you ever make is probably for a namespace index and
odds are that will only ever be 256 or 512 bytes.

Inside the kernel, true, but we do perform large sets from userspace.
That said I don't see why this low level routine should encode
layering violation knowledge of how it might be used.

Can userspace call this directly? I only see 3 callers of this and all of them limit themselves to writing either a single namespace index or label.

Also we know that the behavior is supposed to be that we only update what we have to as it introduces issues if we try to overwrite all of the config space. That is why I think it would be better to keep the upper limit for writes small anyway. That way we make it painful for somebody to do the wrong thing.

Also the limitations here could probably use some additional clean-up.
For example you have a check for offset + len > config_size above these
min_t calls. As such it should be impossible for length to ever be
greater than config_size so you shouldn't need to test for the min of
both and could just use the min of len versus the max_xfer.

Again that's a case of this leaf routine encoding assumptions about
how it might be used. I'd rather be pedantic since this is not a hot
path.

No. This is me reading the code. Just to be clear, before we start trying to determine the max_cmd_size we have the following bit of code:
if (offset + len > ndd->nsarea.config_size)
return -ENXIO;

So if that logic is already there how can we have len be greater than ndd->nsarea.config_size? As far as I can tell we can't so we could save ourselves one of the min_t checks since we know len should always be less than or equal to ndd->nsarea.config_size.
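
The argument can be sketched in userspace C as below. This is only an illustration under stated assumptions: min_u32() stands in for the kernel's min_t(u32, ...), both function names are made up for the example, and the sum offset + len is assumed not to wrap (it is widened to 64 bits here to make that explicit).

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256K (256u * 1024u)

/* Userspace stand-in for min_t(u32, a, b). */
static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* The three clamps from the v2 patch, in the same order. */
static uint32_t max_cmd_size_v2(uint32_t config_size, uint32_t max_xfer,
				uint32_t len)
{
	uint32_t size = min_u32(config_size, SZ_256K);

	size = min_u32(size, max_xfer);
	return min_u32(size, len);
}

/*
 * Same result without the config_size clamp. Only valid once the caller
 * has established offset + len <= config_size, which implies
 * len <= config_size for unsigned values.
 */
static uint32_t max_cmd_size_simplified(uint32_t config_size,
					uint32_t max_xfer,
					uint32_t offset, uint32_t len)
{
	/* Mirrors the -ENXIO check; widened so the sum cannot wrap. */
	assert((uint64_t)offset + len <= config_size);
	return min_u32(min_u32(len, SZ_256K), max_xfer);
}
```

For any inputs that pass the bounds check, both helpers return the same value, so the clamp against config_size does no work on the set path.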