Re: [PATCH V2] lightnvm: pblk: prevent stall due to wb threshold

From: Hans Holmberg
Date: Thu Jan 31 2019 - 15:10:42 EST


On Thu, Jan 31, 2019 at 5:33 PM Javier González <javier@xxxxxxxxxxx> wrote:
>
>
>
> > On 31 Jan 2019, at 11.41, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> >
> > Hi Javier!
> >
> > How did you test this? I'm trying to add a test case to our testing framework.
> >
> > This is what I ran in QEMU, and I got a hang (with this version of the patch):
> >
> > nvme lnvm create -d nvme0n1 -t pblk -n pblk0 -f -b 0 -e 0
>
> I ran several small PU configurations without problems. Can you share the QEMU configuration and version?
>

Of course!

qemu remote: https://github.com/CNEX-Labs/qemu-nvme.git
branch: master (cb200e3ccf9c1ff21f6275c6cb68b2801135a640)

My geometry:
parallel units: 8, secs per chk: 768, meta size 16, ws min 12, ws opt 24, cunits=0

> I'm traveling until Friday; I'll get back to you over the weekend.

Oh, no worries. This patch did not introduce the issue, so it's not a
regression, but it looks like the same type of hang that the patch
addresses.
I might have stumbled across another bug.

One thing I noticed is that the dd block size that triggered the hang
is >= the write buffer size (128 entries * 4 KiB = 512 KiB).
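The arithmetic behind that observation, as a small C sketch. It only restates the numbers in this thread (128 buffer entries, 4 KiB per entry); the macro and helper names are hypothetical and not part of pblk.

```c
#include <assert.h>
#include <stdbool.h>

/* Values from the report above: 128 write-buffer entries of 4 KiB each. */
#define RB_ENTRIES     128u
#define RB_ENTRY_BYTES 4096u

/* Write-buffer capacity in bytes: 128 * 4 KiB = 512 KiB. */
static unsigned int rb_bytes(void)
{
	return RB_ENTRIES * RB_ENTRY_BYTES;
}

/* Number of buffer entries a dd block of bs_bytes occupies. */
static unsigned int bs_in_entries(unsigned int bs_bytes)
{
	assert(bs_bytes % RB_ENTRY_BYTES == 0); /* whole 4 KiB sectors only */
	return bs_bytes / RB_ENTRY_BYTES;
}

/* True when a single dd block needs at least as many entries as the
 * whole buffer holds (bs=512k here); false for 4k..256k. */
static bool fills_whole_buffer(unsigned int bs_bytes)
{
	return bs_in_entries(bs_bytes) >= RB_ENTRIES;
}
```

Of the block sizes tried below, only bs=512k (128 entries) satisfies fills_whole_buffer(), and that is the one that hung.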

Safe travels!

>
> >
> > kernel log: [ 116.381799] pblk pblk0: luns:1, lines:280, secs:212736, buf entries:128
> >
> > # dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=4k count=1
> > 1+0 records in
> > 1+0 records out
> > 4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000480941 s, 8.5 MB/s
> > # dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=64k count=1
> > 1+0 records in
> > 1+0 records out
> > 65536 bytes (66 kB, 64 KiB) copied, 0.000477373 s, 137 MB/s
> > # dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=128k count=1
> > 1+0 records in
> > 1+0 records out
> > 131072 bytes (131 kB, 128 KiB) copied, 0.000548722 s, 239 MB/s
> > # dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=256k count=1
> > 1+0 records in
> > 1+0 records out
> > 262144 bytes (262 kB, 256 KiB) copied, 0.000718515 s, 365 MB/s
> > # dd if=/dev/zero of=/dev/pblk0 oflag=direct bs=512k count=1
> > <HANG>
>
>
> >
> >
> > >> On Wed, Jan 30, 2019 at 11:28 AM Javier González <javier@xxxxxxxxxxx> wrote:
> >>
> > >> In order to respect mw_cunits, pblk's write buffer maintains a
> >> backpointer to protect data not yet persisted; when writing to the write
> >> buffer, this backpointer defines a threshold that pblk's rate-limiter
> >> enforces.
> >>
> > >> On small PU configurations, the following scenarios might take place: (i)
> > >> the threshold is larger than the write buffer and (ii) the threshold is
> > >> smaller than the write buffer, but larger than the maximum allowed
> > >> split bio - 256KB at this moment (note that writes are not always
> > >> split - we only do this when the bio is larger than the maximum I/O
> > >> size allowed by the rate-limiter). In both cases, pblk's rate-limiter
> > >> prevents the I/O from being written to the buffer, thus stalling.
> >>
> >> This patch fixes the original backpointer implementation by considering
> > >> the threshold both on buffer creation and on the rate-limiter's path,
> >> when bio_split is triggered (case (ii) above).
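A simplified model of the two stall cases described above, as a C sketch. It assumes (this is not pblk code) that a write chunk can only be admitted if it fits in the entries left after reserving `threshold` entries for the backpointer, and that a split bio is at most 64 sectors (NVM_MAX_VLBA, 256 KiB with 4 KiB sectors). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: largest split bio, NVM_MAX_VLBA sectors (256 KiB). */
#define MAX_SPLIT_SECS 64u

/* Simplified model: entries a user write can use once `threshold`
 * entries are held back to protect data not yet persisted. */
static unsigned int usable_entries(unsigned int rb_entries,
				   unsigned int threshold)
{
	return threshold >= rb_entries ? 0 : rb_entries - threshold;
}

/* A chunk of `secs` sectors can only ever be admitted if it fits in the
 * usable window; otherwise the writer waits forever (the stall). */
static bool write_can_progress(unsigned int rb_entries,
			       unsigned int threshold, unsigned int secs)
{
	assert(secs > 0);
	return secs <= usable_entries(rb_entries, threshold);
}

/* Same check for a maximum-size split bio. */
static bool split_write_can_progress(unsigned int rb_entries,
				     unsigned int threshold)
{
	return write_can_progress(rb_entries, threshold, MAX_SPLIT_SECS);
}
```

In this model, case (i) (threshold >= buffer) leaves no usable entries at all. For case (ii), a 128-entry buffer with a threshold of 96 leaves 32 entries, so a 64-sector split chunk never fits; a 256-entry buffer with the same threshold leaves 160.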
> >>
> >> Fixes: 766c8ceb16fc ("lightnvm: pblk: guarantee that backpointer is respected on writer stall")
> > >> Signed-off-by: Javier González <javier@xxxxxxxxxxx>
> >> ---
> >>
> >> Changes since V1:
> > >> - Fix bad arithmetic in the rate-limiter max_io calculation (from
> > >> Hans)
> >>
> >> drivers/lightnvm/pblk-rb.c | 25 +++++++++++++++++++------
> >> drivers/lightnvm/pblk-rl.c | 5 ++---
> >> drivers/lightnvm/pblk.h | 2 +-
> >> 3 files changed, 22 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
> >> index d4ca8c64ee0f..a6133b50ed9c 100644
> >> --- a/drivers/lightnvm/pblk-rb.c
> >> +++ b/drivers/lightnvm/pblk-rb.c
> >> @@ -45,10 +45,23 @@ void pblk_rb_free(struct pblk_rb *rb)
> >> /*
> >> * pblk_rb_calculate_size -- calculate the size of the write buffer
> >> */
> >> -static unsigned int pblk_rb_calculate_size(unsigned int nr_entries)
> >> +static unsigned int pblk_rb_calculate_size(unsigned int nr_entries,
> >> + unsigned int threshold)
> >> {
> >> - /* Alloc a write buffer that can at least fit 128 entries */
> >> - return (1 << max(get_count_order(nr_entries), 7));
> >> + unsigned int thr_sz = 1 << (get_count_order(threshold + NVM_MAX_VLBA));
> >> + unsigned int max_sz = max(thr_sz, nr_entries);
> >> + unsigned int max_io;
> >> +
> >> + /* Alloc a write buffer that can (i) fit at least two split bios
> > >> + * (considering max I/O size NVM_MAX_VLBA), and (ii) guarantee that the
> >> + * threshold will be respected
> >> + */
> >> + max_io = (1 << max((int)(get_count_order(max_sz)),
> >> + (int)(get_count_order(NVM_MAX_VLBA << 1))));
> >> + if ((threshold + NVM_MAX_VLBA) >= max_io)
> >> + max_io <<= 1;
> >> +
> >> + return max_io;
> >> }
> >>
> >> /*
> >> @@ -67,12 +80,12 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
> >> unsigned int alloc_order, order, iter;
> >> unsigned int nr_entries;
> >>
> >> - nr_entries = pblk_rb_calculate_size(size);
> >> + nr_entries = pblk_rb_calculate_size(size, threshold);
> >> entries = vzalloc(array_size(nr_entries, sizeof(struct pblk_rb_entry)));
> >> if (!entries)
> >> return -ENOMEM;
> >>
> >> - power_size = get_count_order(size);
> >> + power_size = get_count_order(nr_entries);
> >> power_seg_sz = get_count_order(seg_size);
> >>
> >> down_write(&pblk_rb_lock);
> >> @@ -149,7 +162,7 @@ int pblk_rb_init(struct pblk_rb *rb, unsigned int size, unsigned int threshold,
> >> * Initialize rate-limiter, which controls access to the write buffer
> >> * by user and GC I/O
> >> */
> >> - pblk_rl_init(&pblk->rl, rb->nr_entries);
> >> + pblk_rl_init(&pblk->rl, rb->nr_entries, threshold);
> >>
> >> return 0;
> >> }
> >> diff --git a/drivers/lightnvm/pblk-rl.c b/drivers/lightnvm/pblk-rl.c
> >> index 76116d5f78e4..e9e0af0df165 100644
> >> --- a/drivers/lightnvm/pblk-rl.c
> >> +++ b/drivers/lightnvm/pblk-rl.c
> >> @@ -207,7 +207,7 @@ void pblk_rl_free(struct pblk_rl *rl)
> >> del_timer(&rl->u_timer);
> >> }
> >>
> >> -void pblk_rl_init(struct pblk_rl *rl, int budget)
> >> +void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold)
> >> {
> >> struct pblk *pblk = container_of(rl, struct pblk, rl);
> >> struct nvm_tgt_dev *dev = pblk->dev;
> >> @@ -217,7 +217,6 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
> >> int sec_meta, blk_meta;
> >> unsigned int rb_windows;
> >>
> >> -
> >> /* Consider sectors used for metadata */
> >> sec_meta = (lm->smeta_sec + lm->emeta_sec[0]) * l_mg->nr_free_lines;
> >> blk_meta = DIV_ROUND_UP(sec_meta, geo->clba);
> >> @@ -234,7 +233,7 @@ void pblk_rl_init(struct pblk_rl *rl, int budget)
> >> /* To start with, all buffer is available to user I/O writers */
> >> rl->rb_budget = budget;
> >> rl->rb_user_max = budget;
> >> - rl->rb_max_io = budget >> 1;
> >> + rl->rb_max_io = budget - threshold;
> >> rl->rb_gc_max = 0;
> >> rl->rb_state = PBLK_RL_HIGH;
> >>
> >> diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h
> >> index 72ae8755764e..a6386d5acd73 100644
> >> --- a/drivers/lightnvm/pblk.h
> >> +++ b/drivers/lightnvm/pblk.h
> >> @@ -924,7 +924,7 @@ int pblk_gc_sysfs_force(struct pblk *pblk, int force);
> >> /*
> >> * pblk rate limiter
> >> */
> >> -void pblk_rl_init(struct pblk_rl *rl, int budget);
> >> +void pblk_rl_init(struct pblk_rl *rl, int budget, int threshold);
> >> void pblk_rl_free(struct pblk_rl *rl);
> >> void pblk_rl_update_rates(struct pblk_rl *rl);
> >> int pblk_rl_high_thrs(struct pblk_rl *rl);
> >> --
> >> 2.17.1
> >>