Re: [PATCH v8 2/3] watchdog: add watchdog_cpumask sysctl to assist nohz

From: Chai Wen
Date: Thu Apr 16 2015 - 21:34:03 EST


On 04/15/2015 03:37 AM, Chris Metcalf wrote:

> Change the default behavior of watchdog so it only runs on the
> housekeeping cores when nohz_full is enabled at build and boot time.
> Allow modifying the set of cores the watchdog is currently running
> on with a new kernel.watchdog_cpumask sysctl.
>
> If we allowed the watchdog to run on nohz_full cores, the timer
> interrupts and scheduler work would prevent the desired tickless
> operation on those cores. But if we disable the watchdog globally,
> then the housekeeping cores can't benefit from the watchdog
> functionality. So we allow disabling it only on some cores.
> See Documentation/lockup-watchdogs.txt for more information.
>
> Acked-by: Don Zickus <dzickus@xxxxxxxxxx>
> Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
> ---
> v8: use new semantics of smpboot_update_cpumask_percpu_thread() [Frederic]
> improve documentation in "Documentation/" and in changelong [akpm]


s/changelong/changelog/

>
> v7: use cpumask field instead of valid_cpu() callback
>
> v6: use alloc_cpumask_var() [Sasha Levin]
> switch from watchdog_exclude to watchdog_cpumask [Frederic]
> simplify the smp_hotplug_thread API to watchdog [Frederic]
> add Don's Acked-by
>
> Documentation/lockup-watchdogs.txt | 18 ++++++++++++++
> Documentation/sysctl/kernel.txt | 15 ++++++++++++
> include/linux/nmi.h | 3 +++
> kernel/sysctl.c | 7 ++++++
> kernel/watchdog.c | 49 ++++++++++++++++++++++++++++++++++++++
> 5 files changed, 92 insertions(+)
>
> diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt
> index ab0baa692c13..22dd6af2e4bd 100644
> --- a/Documentation/lockup-watchdogs.txt
> +++ b/Documentation/lockup-watchdogs.txt
> @@ -61,3 +61,21 @@ As explained above, a kernel knob is provided that allows
> administrators to configure the period of the hrtimer and the perf
> event. The right value for a particular environment is a trade-off
> between fast response to lockups and detection overhead.
> +
> +By default, the watchdog runs on all online cores. However, on a
> +kernel configured with NO_HZ_FULL, by default the watchdog runs only
> +on the housekeeping cores, not the cores specified in the "nohz_full"
> +boot argument. If we allowed the watchdog to run by default on
> +the "nohz_full" cores, we would have to run timer ticks to activate
> +the scheduler, which would prevent the "nohz_full" functionality
> +from protecting the user code on those cores from the kernel.
> +Of course, disabling it by default on the nohz_full cores means that
> +when those cores do enter the kernel, by default we will not be
> +able to detect if they lock up. However, allowing the watchdog
> +to continue to run on the housekeeping (non-tickless) cores means
> +that we will continue to detect lockups properly on those cores.
> +
> +In either case, the set of cores excluded from running the watchdog
> +may be adjusted via the kernel.watchdog_cpumask sysctl. For
> +nohz_full cores, this may be useful for debugging a case where the
> +kernel seems to be hanging on the nohz_full cores.
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index c831001c45f1..f1697858d71c 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -923,6 +923,21 @@ and nmi_watchdog.
>
> ==============================================================
>
> +watchdog_cpumask:
> +
> +This value can be used to control on which cpus the watchdog may run.
> +The default cpumask is all possible cores, but if NO_HZ_FULL is
> +enabled in the kernel config, and cores are specified with the
> +nohz_full= boot argument, those cores are excluded by default.
> +Offline cores can be included in this mask, and if the core is later
> +brought online, the watchdog will be started based on the mask value.
> +
> +Typically this value would only be touched in the nohz_full case
> +to re-enable cores that by default were not running the watchdog,
> +if a kernel lockup was suspected on those cores.
> +
> +==============================================================
> +
> watchdog_thresh:
>
> This value can be used to control the frequency of hrtimer and NMI
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 3d46fb4708e0..f94da0e65dea 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -67,6 +67,7 @@ extern int nmi_watchdog_enabled;
> extern int soft_watchdog_enabled;
> extern int watchdog_user_enabled;
> extern int watchdog_thresh;
> +extern unsigned long *watchdog_cpumask_bits;
> extern int sysctl_softlockup_all_cpu_backtrace;
> struct ctl_table;
> extern int proc_watchdog(struct ctl_table *, int ,
> @@ -77,6 +78,8 @@ extern int proc_soft_watchdog(struct ctl_table *, int ,
> void __user *, size_t *, loff_t *);
> extern int proc_watchdog_thresh(struct ctl_table *, int ,
> void __user *, size_t *, loff_t *);
> +extern int proc_watchdog_cpumask(struct ctl_table *, int,
> + void __user *, size_t *, loff_t *);
> #endif
>
> #ifdef CONFIG_HAVE_ACPI_APEI_NMI
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 2082b1a88fb9..699571a74e3b 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -881,6 +881,13 @@ static struct ctl_table kern_table[] = {
> .extra2 = &one,
> },
> {
> + .procname = "watchdog_cpumask",
> + .data = &watchdog_cpumask_bits,
> + .maxlen = NR_CPUS,
> + .mode = 0644,
> + .proc_handler = proc_watchdog_cpumask,
> + },
> + {
> .procname = "softlockup_panic",
> .data = &softlockup_panic,
> .maxlen = sizeof(int),
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 2316f50b07a4..5bd80a212486 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -19,6 +19,7 @@
> #include <linux/sysctl.h>
> #include <linux/smpboot.h>
> #include <linux/sched/rt.h>
> +#include <linux/tick.h>
>
> #include <asm/irq_regs.h>
> #include <linux/kvm_para.h>
> @@ -56,6 +57,9 @@ int __read_mostly sysctl_softlockup_all_cpu_backtrace;
> #else
> #define sysctl_softlockup_all_cpu_backtrace 0
> #endif
> +static cpumask_var_t watchdog_cpumask_for_smpboot;
> +static cpumask_var_t watchdog_cpumask;
> +unsigned long *watchdog_cpumask_bits;
>
> static int __read_mostly watchdog_running;
> static u64 __read_mostly sample_period;
> @@ -694,6 +698,7 @@ static int watchdog_enable_all_cpus(void)
> int err = 0;
>
> if (!watchdog_running) {
> + cpumask_copy(watchdog_cpumask_for_smpboot, watchdog_cpumask);
> err = smpboot_register_percpu_thread(&watchdog_threads);
> if (err)
> pr_err("Failed to create watchdog threads, disabled\n");
> @@ -869,12 +874,56 @@ out:
> mutex_unlock(&watchdog_proc_mutex);
> return err;
> }
> +
> +/*
> + * The cpumask is the mask of possible cpus that the watchdog can run
> + * on, not the mask of cpus it is actually running on. This allows the
> + * user to specify a mask that will include cpus that have not yet
> + * been brought online, if desired.
> + */
> +int proc_watchdog_cpumask(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + int err;
> +
> + mutex_lock(&watchdog_proc_mutex);
> + err = proc_do_large_bitmap(table, write, buffer, lenp, ppos);
> + if (!err && write) {
> + /* Remove impossible cpus to keep sysctl output cleaner. */
> + cpumask_and(watchdog_cpumask, watchdog_cpumask,
> + cpu_possible_mask);
> +
> + if (watchdog_enabled && watchdog_thresh)


If the new mask is same as the current one, then there is no need to go on ?
cpus_equal(watchdog_cpumask, watchdog_cpumask_for_smpboot) or something else ?


> + smpboot_update_cpumask_percpu_thread(&watchdog_threads,
> + watchdog_cpumask);
> + }
> + mutex_unlock(&watchdog_proc_mutex);
> + return err;
> +}
> +
> #endif /* CONFIG_SYSCTL */
>
> void __init lockup_detector_init(void)
> {
> set_sample_period();
>
> + /* One cpumask is allocated for smpboot to own. */
> + alloc_cpumask_var(&watchdog_cpumask_for_smpboot, GFP_KERNEL);


alloc_cpumask_var could fail?

> + watchdog_threads.cpumask = watchdog_cpumask_for_smpboot;
> +
> + /* Another cpumask is allocated for /proc to use. */
> + alloc_cpumask_var(&watchdog_cpumask, GFP_KERNEL);


ditto

thanks
chai wen

> + watchdog_cpumask_bits = cpumask_bits(watchdog_cpumask);
> +
> +#ifdef CONFIG_NO_HZ_FULL
> + if (!cpumask_empty(tick_nohz_full_mask))
> + pr_info("Disabling watchdog on nohz_full cores by default\n");
> + cpumask_andnot(watchdog_cpumask, cpu_possible_mask,
> + tick_nohz_full_mask);
> +#else
> + cpumask_copy(watchdog_cpumask, cpu_possible_mask);
> +#endif
> +
> if (watchdog_enabled)
> watchdog_enable_all_cpus();
> }



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/