Re: [Query] increased latency observed in cpu hotplug path
From: Khan, Imran
Date: Wed Aug 17 2016 - 12:47:50 EST
On 8/16/2016 11:23 AM, Khan, Imran wrote:
> On 8/5/2016 12:49 PM, Khan, Imran wrote:
>> On 8/1/2016 2:58 PM, Khan, Imran wrote:
>>> On 7/30/2016 7:54 AM, Akinobu Mita wrote:
>>>> 2016-07-28 22:18 GMT+09:00 Khan, Imran <kimran@xxxxxxxxxxxxxx>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Recently we have observed increased CPU hotplug latency in the
>>>>> CPU online path. During online, the block layer executes its
>>>>> notification handler for the CPU_UP_PREPARE event, and this
>>>>> handler in turn waits for an RCU grace period, resulting
>>>>> (sometimes) in an execution time of 15-20 ms for the handler.
>>>>> This behaviour is not present in the 3.18 kernel but is present
>>>>> in 4.4 and was introduced by the following commit:
>>>>>
>>>>>
>>>>> commit 5778322e67ed34dc9f391a4a5cbcbb856071ceba
>>>>> Author: Akinobu Mita <akinobu.mita@xxxxxxxxx>
>>>>> Date: Sun Sep 27 02:09:23 2015 +0900
>>>>>
>>>>> blk-mq: avoid inserting requests before establishing new mapping
>>>>
>>>> ...
>>>>
>>>>> Upon reverting this commit I see an improvement of 15-20 ms in
>>>>> online latency. So I am looking for help in analyzing the effects
>>>>> of reverting it, or in deciding whether some other approach to
>>>>> reducing the online latency should be taken instead.
>>>>
>>>> Can you observe a difference in online latency by removing the
>>>> get_online_cpus()/put_online_cpus() pair in blk_mq_init_allocated_queue()
>>>> instead of fully reverting the commit?
>>>>
>>> Hi Akinobu,
>>> I tried your suggestion but could not achieve any improvement. The snippet that is actually causing the change in latency is the following one:
>>>
>>> 	list_for_each_entry(q, &all_q_list, all_q_node) {
>>> 		blk_mq_freeze_queue_wait(q);
>>>
>>> 		/*
>>> 		 * timeout handler can't touch hw queue during the
>>> 		 * reinitialization
>>> 		 */
>>> 		del_timer_sync(&q->timeout);
>>> 	}
>>>
>>> I understand that this now gets executed for CPU_UP_PREPARE as well, resulting in
>>> increased latency in the CPU online path. I am trying to reduce this latency while keeping the
>>> purpose of the commit intact. I would welcome further suggestions/feedback in this regard.
>>>
>> Hi Akinobu,
>>
>> I am not able to reduce the CPU online latency with this patch in place. Could you please let me
>> know what functionality would break if we dropped this patch from our kernel? Also, if you have
>> other suggestions for improving on this patch, please let me know.
>>
> After moving the remapping of queues to the block layer's kworker, I see that the online latency has
> improved while the offline latency remains the same. As the freezing of queues happens in the context
> of the block layer's worker, I think it is better to do the remapping in the same context and then go
> ahead with the freezing. To that end I have made the following change:
>
> commit b2131b86eeef4c5b1f8adaf7a53606301aa6b624
> Author: Imran Khan <kimran@xxxxxxxxxxxxxx>
> Date: Fri Aug 12 19:59:47 2016 +0530
>
> blk-mq: Move block queue remapping from cpu hotplug path
>
> During a cpu hotplug, the hardware and software context mappings
> need to be updated in order to take into account requests
> submitted for the hotadded CPU. But if this remapping is done
> in the hotplug notifier, it deteriorates the hotplug latency.
> So move the block queue remapping to the block layer worker,
> which results in a significant improvement in hotplug latency.
>
> Change-Id: I01ac83178ce95c3a4e3b7b1b286eda65ff34e8c4
> Signed-off-by: Imran Khan <kimran@xxxxxxxxxxxxxx>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 6d6f8fe..06fcf89 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -22,7 +22,9 @@
> #include <linux/sched/sysctl.h>
> #include <linux/delay.h>
> #include <linux/crash_dump.h>
> -
> +#include <linux/kthread.h>
> +#include <linux/mutex.h>
> +#include <linux/sched.h>
> #include <trace/events/block.h>
>
> #include <linux/blk-mq.h>
> @@ -32,10 +36,18 @@
>
> static DEFINE_MUTEX(all_q_mutex);
> static LIST_HEAD(all_q_list);
> -
> static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx);
>
> /*
> + * New online cpumask which is going to be set in this hotplug event.
> + * Declare this cpumask as global, as cpu-hotplug operations are invoked
> + * one-by-one and dynamically allocating this could result in a failure.
> + */
> +static struct cpumask online_new;
> +
> +static struct work_struct blk_mq_remap_work;
> +
> +/*
> * Check if any of the ctx's have pending work in this hardware queue
> */
> static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
> @@ -2125,14 +2137,7 @@ static void blk_mq_queue_reinit(struct request_queue *q,
> static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
> unsigned long action, void *hcpu)
> {
> - struct request_queue *q;
> int cpu = (unsigned long)hcpu;
> - /*
> - * New online cpumask which is going to be set in this hotplug event.
> - * Declare this cpumasks as global as cpu-hotplug operation is invoked
> - * one-by-one and dynamically allocating this could result in a failure.
> - */
> - static struct cpumask online_new;
>
> /*
> * Before hotadded cpu starts handling requests, new mappings must
> @@ -2155,43 +2160,17 @@ static int blk_mq_queue_reinit_notify(struct notifier_block *nb,
> case CPU_DEAD:
> case CPU_UP_CANCELED:
> cpumask_copy(&online_new, cpu_online_mask);
> + kblockd_schedule_work(&blk_mq_remap_work);
> break;
> case CPU_UP_PREPARE:
> cpumask_copy(&online_new, cpu_online_mask);
> cpumask_set_cpu(cpu, &online_new);
> + kblockd_schedule_work(&blk_mq_remap_work);
> break;
> default:
> return NOTIFY_OK;
> }
>
> - mutex_lock(&all_q_mutex);
> -
> - /*
> - * We need to freeze and reinit all existing queues. Freezing
> - * involves synchronous wait for an RCU grace period and doing it
> - * one by one may take a long time. Start freezing all queues in
> - * one swoop and then wait for the completions so that freezing can
> - * take place in parallel.
> - */
> - list_for_each_entry(q, &all_q_list, all_q_node)
> - blk_mq_freeze_queue_start(q);
> - list_for_each_entry(q, &all_q_list, all_q_node) {
> - blk_mq_freeze_queue_wait(q);
> -
> - /*
> - * timeout handler can't touch hw queue during the
> - * reinitialization
> - */
> - del_timer_sync(&q->timeout);
> - }
> -
> - list_for_each_entry(q, &all_q_list, all_q_node)
> - blk_mq_queue_reinit(q, &online_new);
> -
> - list_for_each_entry(q, &all_q_list, all_q_node)
> - blk_mq_unfreeze_queue(q);
> -
> - mutex_unlock(&all_q_mutex);
> return NOTIFY_OK;
> }
>
> @@ -2357,11 +2336,48 @@ void blk_mq_enable_hotplug(void)
> mutex_unlock(&all_q_mutex);
> }
>
> +static void blk_remap_work(struct work_struct *work)
> +{
> + struct request_queue *q;
> +
> + /* Start the task of queue remapping */
> + mutex_lock(&all_q_mutex);
> +
> + /*
> + * We need to freeze and reinit all existing queues. Freezing
> + * involves synchronous wait for an RCU grace period and doing it
> + * one by one may take a long time. Start freezing all queues in
> + * one swoop and then wait for the completions so that freezing can
> + * take place in parallel.
> + */
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_freeze_queue_start(q);
> + list_for_each_entry(q, &all_q_list, all_q_node) {
> + blk_mq_freeze_queue_wait(q);
> +
> + /*
> + * timeout handler can't touch hw queue during the
> + * reinitialization
> + */
> + del_timer_sync(&q->timeout);
> + }
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_queue_reinit(q, &online_new);
> +
> + list_for_each_entry(q, &all_q_list, all_q_node)
> + blk_mq_unfreeze_queue(q);
> +
> + mutex_unlock(&all_q_mutex);
> +}
> +
> +
> static int __init blk_mq_init(void)
> {
> blk_mq_cpu_init();
>
> hotcpu_notifier(blk_mq_queue_reinit_notify, 0);
> + INIT_WORK(&blk_mq_remap_work, blk_remap_work);
>
> return 0;
> }
>
> I have run overnight tests of fsstress and multiple dd commands along with CPU hotplugging.
> Could you please review this change and let me know if you see any issues with it, or whether I
> need to test some specific corner cases to validate it?
>
Could someone kindly review this change
and provide feedback?
--
Imran Khan
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation