Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0
From: Javier Gonzalez
Date: Mon Jun 04 2018 - 06:21:47 EST
> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx> wrote:
>
>
> Frist of all I want to say sorry for late response - I was on holiday.
>
>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
>> Sent: Monday, May 28, 2018 1:03 PM
>> To: Matias BjÃrling <mb@xxxxxxxxxxx>
>> Cc: Jens Axboe <axboe@xxxxxx>; linux-block@xxxxxxxxxxxxxxx; linux-
>> kernel@xxxxxxxxxxxxxxx; Dziegielewski, Marcin
>> <marcin.dziegielewski@xxxxxxxxx>; Konopko, Igor J
>> <igor.j.konopko@xxxxxxxxx>
>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
>> equals to 0
>>
>>> On 28 May 2018, at 10.58, Matias BjÃrling <mb@xxxxxxxxxxx> wrote:
>>>
>>> From: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
>>>
>>> Some devices can expose mw_cunits equal to 0, it can cause creation of
>>> too small write buffer and cause performance to drop on write
>>> workloads.
>>>
>>> To handle that, we use the default value for MLC and beacause it
>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
>>> nvme_nvm_setup_12 function isn't longer necessary.
>>>
>>> Signed-off-by: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
>>> Signed-off-by: Matias BjÃrling <mb@xxxxxxxxxxx>
>>> ---
>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
>>> drivers/nvme/host/lightnvm.c | 1 -
>>> 2 files changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/lightnvm/pblk-init.c
>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b 100644
>>> --- a/drivers/lightnvm/pblk-init.c
>>> +++ b/drivers/lightnvm/pblk-init.c
>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
>>> atomic64_set(&pblk->nr_flush, 0);
>>> pblk->nr_flush_rst = 0;
>>>
>>> - pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>> + if (geo->mw_cunits) {
>>> + pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
>>> + } else {
>>> + pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo->all_luns;
>>> + /*
>>> + * Some devices can expose mw_cunits equal to 0, so let's
>> use
>>> + * here default safe value for MLC.
>>> + */
>>> + }
>>>
>>> pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
>>> max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c index
>>> 41279da799ed..c747792da915 100644
>>> --- a/drivers/nvme/host/lightnvm.c
>>> +++ b/drivers/nvme/host/lightnvm.c
>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
>> nvme_nvm_id12
>>> *id,
>>>
>>> geo->ws_min = sec_per_pg;
>>> geo->ws_opt = sec_per_pg;
>>> - geo->mw_cunits = geo->ws_opt << 3; /* default to MLC safe values
>> */
>>> /* Do not impose values for maximum number of open blocks as it is
>>> * unspecified in 1.2. Users of 1.2 must be aware of this and
>>> eventually
>>> --
>>> 2.11.0
>>
>> By doing this, 1.2 future users (beyond pblk), will fail to have a valid
>> mw_cunits value. It's ok to deal with the 0 case in pblk, but I believe that we
>> should have the default value for 1.2 either way.
>
> I'm not sure. From my understanding, setting of default value was
> workaround for pblk case, am I right ?.
The default value covers the MLC case directly at the lightnvm layer, as
opposed to doing it directly in pblk. Since pblk is the only user now,
you can argue that all changes in the lightnvm layer are to solve pblk
issues, but the idea is that the geometry should be generic.
> In my opinion any user of 1.2
> spec should be aware that there is not mw_cunit value. From my point
> of view, leaving here 0 (and decision what do with it to lightnvm
> user) is more safer way, but maybe I'm wrong. I believe that it is
> topic to wider discussion with maintainers.
>
1.2 and 2.0 have different geometries, but when we designed the common
nvm_geo structure, the idea was to abstract both specs and allow the
upper layers to use the geometry transparently.
Specifically in pblk, I would prefer to keep it in such a way that we don't
need to media specific policies (e.g., set default values for MLC
memories), as a general design principle. We already do some geometry
version checks to avoid dereferencing unnecessary pointers on the fast
path, which I would eventually like to remove.
>> A more generic way of doing this would be to have a default value for
>> 2.0 too, in case mw_cunits is reported as 0.
>
> Since 0 is correct value and users can make different decisions based
> on it, I think we shouldn't overwrite it by default value. Is it make
> sense?
Here I meant at a pblk level - I should have specified it. At the
geometry level, we should not change it.
The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In
this case, we still need a host side buffer to serve < ws_min I/Os, even
though the device does not require the buffer to guarantee reads.
>> Javier
>
> Thanks,
> Marcin
Javier