RE: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits equals to 0

From: Dziegielewski, Marcin
Date: Mon Jun 04 2018 - 13:18:04 EST


> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
> Sent: Monday, June 4, 2018 1:16 PM
> To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx>
> Cc: Matias BjÃrling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>; linux-
> block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Konopko, Igor J
> <igor.j.konopko@xxxxxxxxx>
> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when mw_cunits
> equals to 0
>
>
> > On 4 Jun 2018, at 13.11, Dziegielewski, Marcin
> <marcin.dziegielewski@xxxxxxxxx> wrote:
> >
> >> -----Original Message-----
> >> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
> >> Sent: Monday, June 4, 2018 12:22 PM
> >> To: Dziegielewski, Marcin <marcin.dziegielewski@xxxxxxxxx>
> >> Cc: Matias BjÃrling <mb@xxxxxxxxxxx>; Jens Axboe <axboe@xxxxxx>;
> >> linux- block@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Konopko,
> >> Igor J <igor.j.konopko@xxxxxxxxx>
> >> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >> mw_cunits equals to 0
> >>
> >>> On 4 Jun 2018, at 12.09, Dziegielewski, Marcin
> >> <marcin.dziegielewski@xxxxxxxxx> wrote:
> >>> Frist of all I want to say sorry for late response - I was on holiday.
> >>>
> >>>> From: Javier Gonzalez [mailto:javier@xxxxxxxxxxxx]
> >>>> Sent: Monday, May 28, 2018 1:03 PM
> >>>> To: Matias BjÃrling <mb@xxxxxxxxxxx>
> >>>> Cc: Jens Axboe <axboe@xxxxxx>; linux-block@xxxxxxxxxxxxxxx; linux-
> >>>> kernel@xxxxxxxxxxxxxxx; Dziegielewski, Marcin
> >>>> <marcin.dziegielewski@xxxxxxxxx>; Konopko, Igor J
> >>>> <igor.j.konopko@xxxxxxxxx>
> >>>> Subject: Re: [GIT PULL 18/20] lightnvm: pblk: handle case when
> >>>> mw_cunits equals to 0
> >>>>
> >>>>> On 28 May 2018, at 10.58, Matias BjÃrling <mb@xxxxxxxxxxx> wrote:
> >>>>>
> >>>>> From: Marcin Dziegielewski <marcin.dziegielewski@xxxxxxxxx>
> >>>>>
> >>>>> Some devices can expose mw_cunits equal to 0, it can cause
> >>>>> creation of too small write buffer and cause performance to drop
> >>>>> on write workloads.
> >>>>>
> >>>>> To handle that, we use the default value for MLC and beacause it
> >>>>> covers both 1.2 and 2.0 OC specification, setting up mw_cunits in
> >>>>> nvme_nvm_setup_12 function isn't longer necessary.
> >>>>>
> >>>>> Signed-off-by: Marcin Dziegielewski
> >>>>> <marcin.dziegielewski@xxxxxxxxx>
> >>>>> Signed-off-by: Igor Konopko <igor.j.konopko@xxxxxxxxx>
> >>>>> Signed-off-by: Matias BjÃrling <mb@xxxxxxxxxxx>
> >>>>> ---
> >>>>> drivers/lightnvm/pblk-init.c | 10 +++++++++-
> >>>>> drivers/nvme/host/lightnvm.c | 1 -
> >>>>> 2 files changed, 9 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/lightnvm/pblk-init.c
> >>>>> b/drivers/lightnvm/pblk-init.c index d65d2f972ccf..0f277744266b
> >>>>> 100644
> >>>>> --- a/drivers/lightnvm/pblk-init.c
> >>>>> +++ b/drivers/lightnvm/pblk-init.c
> >>>>> @@ -356,7 +356,15 @@ static int pblk_core_init(struct pblk *pblk)
> >>>>> atomic64_set(&pblk->nr_flush, 0);
> >>>>> pblk->nr_flush_rst = 0;
> >>>>>
> >>>>> - pblk->pgs_in_buffer = geo->mw_cunits * geo->all_luns;
> >>>>> + if (geo->mw_cunits) {
> >>>>> + pblk->pgs_in_buffer = geo->mw_cunits * geo-
> >all_luns;
> >>>>> + } else {
> >>>>> + pblk->pgs_in_buffer = (geo->ws_opt << 3) * geo-
> >all_luns;
> >>>>> + /*
> >>>>> + * Some devices can expose mw_cunits equal to 0, so
> let's
> >>>> use
> >>>>> + * here default safe value for MLC.
> >>>>> + */
> >>>>> + }
> >>>>>
> >>>>> pblk->min_write_pgs = geo->ws_opt * (geo->csecs / PAGE_SIZE);
> >>>>> max_write_ppas = pblk->min_write_pgs * geo->all_luns; diff --git
> >>>>> a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
> >>>>> index
> >>>>> 41279da799ed..c747792da915 100644
> >>>>> --- a/drivers/nvme/host/lightnvm.c
> >>>>> +++ b/drivers/nvme/host/lightnvm.c
> >>>>> @@ -338,7 +338,6 @@ static int nvme_nvm_setup_12(struct
> >>>> nvme_nvm_id12
> >>>>> *id,
> >>>>>
> >>>>> geo->ws_min = sec_per_pg;
> >>>>> geo->ws_opt = sec_per_pg;
> >>>>> - geo->mw_cunits = geo->ws_opt << 3; /* default to MLC
> safe values
> >>>> */
> >>>>> /* Do not impose values for maximum number of open blocks as it is
> >>>>> * unspecified in 1.2. Users of 1.2 must be aware of this and
> >>>>> eventually
> >>>>> --
> >>>>> 2.11.0
> >>>>
> >>>> By doing this, 1.2 future users (beyond pblk), will fail to have a
> >>>> valid mw_cunits value. It's ok to deal with the 0 case in pblk, but
> >>>> I believe that we should have the default value for 1.2 either way.
> >>>
> >>> I'm not sure. From my understanding, setting of default value was
> >>> workaround for pblk case, am I right ?.
> >>
> >> The default value covers the MLC case directly at the lightnvm layer,
> >> as opposed to doing it directly in pblk. Since pblk is the only user
> >> now, you can argue that all changes in the lightnvm layer are to
> >> solve pblk issues, but the idea is that the geometry should be generic.
> >>
> >>> In my opinion any user of 1.2
> >>> spec should be aware that there is not mw_cunit value. From my point
> >>> of view, leaving here 0 (and decision what do with it to lightnvm
> >>> user) is more safer way, but maybe I'm wrong. I believe that it is
> >>> topic to wider discussion with maintainers.
> >>
> >> 1.2 and 2.0 have different geometries, but when we designed the
> >> common nvm_geo structure, the idea was to abstract both specs and
> >> allow the upper layers to use the geometry transparently.
> >>
> >> Specifically in pblk, I would prefer to keep it in such a way that we
> >> don't need to media specific policies (e.g., set default values for
> >> MLC memories), as a general design principle. We already do some
> >> geometry version checks to avoid dereferencing unnecessary pointers
> >> on the fast path, which I would eventually like to remove.
> >
> > Ok, now I understand your point of view and agree with that, I will
> > prepare second version of this patch without this change.
>
> Sounds good.
>
> > Thanks for
> > the clarification.
> >
>
> Sure :)
>
> >>>> A more generic way of doing this would be to have a default value
> >>>> for
> >>>> 2.0 too, in case mw_cunits is reported as 0.
> >>>
> >>> Since 0 is correct value and users can make different decisions
> >>> based on it, I think we shouldn't overwrite it by default value. Is
> >>> it make sense?
> >>
> >> Here I meant at a pblk level - I should have specified it. At the
> >> geometry level, we should not change it.
> >>
> >> The case I am thinking is if mw_cuints repoints 0, but ws_min > 0. In
> >> this case, we still need a host side buffer to serve < ws_min I/Os,
> >> even though the device does not require the buffer to guarantee reads.
> >
> > Oh, ok now we are on the same page. In this patch I was trying to
> > address such case. Do you have other idea how to do it or here are you
> > thinking only on value of default variable?
>
> If doing this, I guess that something in the line of what you did with
> increasing the size of the write buffer via a module parameter. For example,
> checking if the size of the write buffer based on mw_cuints is enough to
> cover ws_min, which normally would only be an issue when mw_cuints == 0
> or when the number of PUs used for the pblk instance is very small and
> mw_cuints < nr_luns * ws_min.


I see here two cases:
- when mw_cunits > 0 buffer size should have number of entries at least max(mw_cunits, ws_min) * nr_luns and here we are taking care of both cases mw_cunits > ws_min and mw_cunits < ws_min.
- when mw_cunit == 0 buffer size should have number of entries at least ws_min * nr _luns and we can use the same puseudocode as above.

Do you see any other case? Could you clarify second case mentioned by you or maybe did you mean opposite case? If yes, I believe that above pseudo code will handle such case too.

>
> >
> >>>> Javier
> >>>
> >>> Thanks,
> >>> Marcin
> >>
> >> Javier
> > Thanks,
> > Marcin
Thanks!,
Marcin