Re: [PATCH 1/6] lightnvm: pblk: check for failed mempool alloc.

From: Javier González
Date: Wed Sep 06 2017 - 14:29:18 EST


> On 6 Sep 2017, at 17.20, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 09/06/2017 09:13 AM, Jens Axboe wrote:
>> On 09/06/2017 09:12 AM, Javier González wrote:
>>>> On 6 Sep 2017, at 17.09, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>
>>>> On 09/06/2017 09:08 AM, Johannes Thumshirn wrote:
>>>>> On Wed, Sep 06, 2017 at 05:01:01PM +0200, Javier González wrote:
>>>>>> Check for failed mempool allocations and act accordingly.
>>>>>
>>>>> Are you sure it is needed? Quoting from mempool_alloc()'s documentation:
>>>>> "[...] Note that due to preallocation, this function *never* fails when called
>>>>> from process contexts. (it might fail if called from an IRQ context.) [...]"
>>>>
>>>> It's not needed, mempool_alloc() will never fail if __GFP_WAIT is set in the
>>>> mask. The use case here is GFP_KERNEL, which does include __GFP_WAIT.
>>>
>>> Thanks for the clarification. Do you just drop the patch, or do you want
>>> me to re-send the series?
>>
>> No need to resend. I'll pick up the others in a day or two, once people
>> have had some time to go over them.
>
> I took a quick look at your mempool usage, and I'm not sure it's
> correct. For a mempool to work, you have to ensure that you provide a
> forward progress guarantee. With that guarantee, you know that if you do
> end up sleeping on allocation, you already have items inflight that will
> be freed when that operation completes. In other words, all allocations
> must have a defined and finite life time, as any allocation can
> potentially sleep/block for that life time. You can't allocate something
> and hold on to it forever, then you are violating the terms of agreement
> that makes a mempool work.

I understood the part about bounding the number of inflight items so
that the mempool can be used without waiting, but I must admit that I
assumed the mempool would resize under pressure and that the penalty
would be increased latency, not the mempool giving up and causing a
deadlock.

>
> The first one that caught my eye is pblk->page_pool. You have this loop:
>
> 	for (i = 0; i < nr_pages; i++) {
> 		page = mempool_alloc(pblk->page_pool, flags);
> 		if (!page)
> 			goto err;
>
> 		ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0);
> 		if (ret != PBLK_EXPOSED_PAGE_SIZE) {
> 			pr_err("pblk: could not add page to bio\n");
> 			mempool_free(page, pblk->page_pool);
> 			goto err;
> 		}
> 	}
>
> which looks suspect. This mempool is created with a reserve pool of
> PAGE_POOL_SIZE (16) members. Do we know if the bio has 16 pages or less?
> If not, then this is broken and can deadlock forever.

I can see that in this case the 16 elements are not enough. In the read
path, we can only guarantee that a read will be <= 64 sectors (4KB
pages), so a single bio can need more pages than the reserve holds.
This is definitely wrong. I'll fix it tomorrow.
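
For illustration only, here is a rough userspace sketch of the sizing
problem discussed above: a 16-element reserve against a read of up to 64
pages. It is not pblk or kernel code. The toy_* names are hypothetical,
and the model assumes the worst case where the underlying page allocator
returns nothing, so only the reserve is available and nothing is freed
until the whole request completes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of a mempool reserve. Not kernel or pblk code. */
struct toy_pool {
	size_t min_nr;	/* preallocated reserve, e.g. 16 */
	size_t in_use;	/* elements currently handed out */
};

/*
 * Assumed worst case: the backing allocator has nothing to give, so only
 * the reserve can satisfy the caller. Returns false at the point where a
 * real mempool_alloc() with a sleeping gfp mask would block.
 */
static bool toy_alloc(struct toy_pool *p)
{
	if (p->in_use == p->min_nr)
		return false;
	p->in_use++;
	return true;
}

/* Number of pages one request obtains before it would have to block. */
static size_t toy_fill_request(struct toy_pool *p, size_t nr_pages)
{
	size_t got = 0;

	while (got < nr_pages && toy_alloc(p))
		got++;
	return got;
}

/*
 * True if a single request holding its pages until completion can
 * exhaust the reserve on its own, i.e. wait for a free that only it
 * could perform.
 */
static bool toy_request_can_stall(size_t min_nr, size_t nr_pages)
{
	struct toy_pool p = { .min_nr = min_nr, .in_use = 0 };

	return toy_fill_request(&p, nr_pages) < nr_pages;
}
```

Under these assumptions, toy_request_can_stall(16, 64) is true while
toy_request_can_stall(64, 64) is false: the reserve only gives a forward
progress guarantee if one request can never need more elements than
min_nr.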

Since we are at it, I have wondered for some time what the right way is
to balance the mempool sizes so that we are a good citizen and still
perform well. I don't see a lot of code using mempool_resize() to tune
min_nr based on load. For example, in our write path, the numbers are
easy to calculate, but on the read path I completely over-dimensioned
the mempool to avoid having to wait for the completion path. Any good
rule of thumb here?

> You have a lot of mempool usage in the code, would probably not hurt to
> audit all of them.

Yes. I will take a look and add comments to the sizes.

Thanks Jens,
Javier
