Re: [PATCH 4/5] lightnvm: pblk: add padding distribution sysfs attribute
From: Hans Holmberg
Date: Tue Feb 06 2018 - 06:28:55 EST
On Tue, Feb 6, 2018 at 11:50 AM, Matias BjÃrling <mb@xxxxxxxxxxx> wrote:
> On 02/06/2018 10:27 AM, Hans Holmberg wrote:
>>
>> Hi Matias,
>>
>> On Wed, Jan 31, 2018 at 9:44 AM, Matias BjÃrling <mb@xxxxxxxxxxx> wrote:
>>>
>>> On 01/31/2018 03:06 AM, Javier GonzÃlez wrote:
>>>>
>>>>
>>>> From: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>
>>>>
>>>> When pblk receives a sync, all data up to that point in the write buffer
>>>> must be comitted to persistent storage, and as flash memory comes with a
>>>> minimal write size there is a significant cost involved both in terms
>>>> of time for completing the sync and in terms of write amplification
>>>> padded sectors for filling up to the minimal write size.
>>>>
>>>> In order to get a better understanding of the costs involved for syncs,
>>>> Add a sysfs attribute to pblk: padded_dist, showing a normalized
>>>> distribution of sectors padded. In order to facilitate measurements of
>>>> specific workloads during the lifetime of the pblk instance, the
>>>> distribution can be reset by writing 0 to the attribute.
>>>>
>>>> Do this by introducing counters for each possible padding:
>>>> {0..(minimal write size - 1)} and calculate the normalized distribution
>>>> when showing the attribute.
>>>>
>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>
>>>> Signed-off-by: Javier GonzÃlez <javier@xxxxxxxxxxxx>
>>>> ---
>>>> drivers/lightnvm/pblk-init.c | 16 ++++++++--
>>>> drivers/lightnvm/pblk-rb.c | 15 +++++-----
>>>> drivers/lightnvm/pblk-sysfs.c | 68
>>>> +++++++++++++++++++++++++++++++++++++++++++
>>>> drivers/lightnvm/pblk.h | 6 +++-
>>>> 4 files changed, 95 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c
>>>> index 7eedc5daebc8..a5e3510c0f38 100644
>>>> --- a/drivers/lightnvm/pblk-init.c
>>>> +++ b/drivers/lightnvm/pblk-init.c
>>>> @@ -921,6 +921,7 @@ static void pblk_free(struct pblk *pblk)
>>>> {
>>>> pblk_luns_free(pblk);
>>>> pblk_lines_free(pblk);
>>>> + kfree(pblk->pad_dist);
>>>> pblk_line_meta_free(pblk);
>>>> pblk_core_free(pblk);
>>>> pblk_l2p_free(pblk);
>>>> @@ -998,11 +999,13 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>>> struct gendisk *tdisk,
>>>> pblk->pad_rst_wa = 0;
>>>> pblk->gc_rst_wa = 0;
>>>> + atomic_long_set(&pblk->nr_flush, 0);
>>>> + pblk->nr_flush_rst = 0;
>>>> +
>>>> #ifdef CONFIG_NVM_DEBUG
>>>> atomic_long_set(&pblk->inflight_writes, 0);
>>>> atomic_long_set(&pblk->padded_writes, 0);
>>>> atomic_long_set(&pblk->padded_wb, 0);
>>>> - atomic_long_set(&pblk->nr_flush, 0);
>>>> atomic_long_set(&pblk->req_writes, 0);
>>>> atomic_long_set(&pblk->sub_writes, 0);
>>>> atomic_long_set(&pblk->sync_writes, 0);
>>>> @@ -1034,10 +1037,17 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>>> struct gendisk *tdisk,
>>>> goto fail_free_luns;
>>>> }
>>>> + pblk->pad_dist = kzalloc((pblk->min_write_pgs - 1) *
>>>> sizeof(atomic64_t),
>>>> + GFP_KERNEL);
>>>> + if (!pblk->pad_dist) {
>>>> + ret = -ENOMEM;
>>>> + goto fail_free_line_meta;
>>>> + }
>>>> +
>>>> ret = pblk_core_init(pblk);
>>>> if (ret) {
>>>> pr_err("pblk: could not initialize core\n");
>>>> - goto fail_free_line_meta;
>>>> + goto fail_free_pad_dist;
>>>> }
>>>> ret = pblk_l2p_init(pblk);
>>>> @@ -1097,6 +1107,8 @@ static void *pblk_init(struct nvm_tgt_dev *dev,
>>>> struct gendisk *tdisk,
>>>> pblk_l2p_free(pblk);
>>>> fail_free_core:
>>>> pblk_core_free(pblk);
>>>> +fail_free_pad_dist:
>>>> + kfree(pblk->pad_dist);
>>>> fail_free_line_meta:
>>>> pblk_line_meta_free(pblk);
>>>> fail_free_luns:
>>>> diff --git a/drivers/lightnvm/pblk-rb.c b/drivers/lightnvm/pblk-rb.c
>>>> index 7044b5599cc4..264372d7b537 100644
>>>> --- a/drivers/lightnvm/pblk-rb.c
>>>> +++ b/drivers/lightnvm/pblk-rb.c
>>>> @@ -437,9 +437,7 @@ static int pblk_rb_may_write_flush(struct pblk_rb
>>>> *rb,
>>>> unsigned int nr_entries,
>>>> if (bio->bi_opf & REQ_PREFLUSH) {
>>>> struct pblk *pblk = container_of(rb, struct pblk, rwb);
>>>> -#ifdef CONFIG_NVM_DEBUG
>>>> atomic_long_inc(&pblk->nr_flush);
>>>> -#endif
>>>> if (pblk_rb_flush_point_set(&pblk->rwb, bio, mem))
>>>> *io_ret = NVM_IO_OK;
>>>> }
>>>> @@ -620,14 +618,17 @@ unsigned int pblk_rb_read_to_bio(struct pblk_rb
>>>> *rb,
>>>> struct nvm_rq *rqd,
>>>> pr_err("pblk: could not pad page in write
>>>> bio\n");
>>>> return NVM_IO_ERR;
>>>> }
>>>> +
>>>> + if (pad < pblk->min_write_pgs)
>>>> + atomic64_inc(&pblk->pad_dist[pad - 1]);
>>>> + else
>>>> + pr_warn("pblk: padding more than min.
>>>> sectors\n");
>>>
>>>
>>>
>>> Curious, when would this happen? Would it be an implementation error or
>>> something a user did wrong?
>>
>>
>> This would be an implementation error - and this check is just a
>> safeguard to make sure we won't go out of bounds when updating the
>> statistics.
>>
>
> Ah, does it make sense to have such defensive programming, when the value is
> never persisted? An 64 bit integer is quite large, and if only used for
> workloads for a single instance, it would practically never trigger, and if
> it did, do one care?
Ah, sorry for being unclear, it's not for checking if the pad
counters overflow - I wanted to make sure that we won't index
pad_dist with negative numbers if we mess up the padding calculations
in the future.
>
> <snip>
>
>>>
>>> Looks good to me. Let me know if you want to send a new patch, or if I
>>> may
>>> make the change when I pick it up.
>>
>>
>> Thanks for the review, i saw that you already reworked the patch a bit
>> on your core branch.
>> However, i found a build issue when building for i386, so i'll fix
>> that and submit a V2(that includes your update)
>
>
> Ok. I assume this is the kbuild mail I asked about a couple of days ago.
> I'll wait for v2.
I did see that particular mail, but it's most likely the same issue
that the V2 will fix.
Thanks,
Hans