Re: [PATCH v3] kernel: add panic_on_taint
From: Baoquan He
Date: Sun May 10 2020 - 21:12:14 EST
On 05/10/20 at 02:22pm, Rafael Aquini wrote:
> > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > > index 7bc83f3d9bdf..4a69fe49a70d 100644
> > > --- a/Documentation/admin-guide/kernel-parameters.txt
> > > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > > @@ -3404,6 +3404,21 @@
> > > panic_on_warn panic() instead of WARN(). Useful to cause kdump
> > > on a WARN().
> > >
> > > + panic_on_taint= [KNL] conditionally panic() in add_taint()
> > > + Format: <str>
> > Change it to 'Format: <string>' to be
> > consistent with the other existing options?
>
> I can resubmit with the change, if it's a strong req and the surgery
> cannot be done at merge time.
Yeah, maybe the maintainer can help adjust this; I'm not sure who will pick
it up. No, it's not a strong request, but people might get a little confused
about which format to follow when a new kernel option is added.
>
>
> > > + Specifies, as a string, the TAINT flag set that will
> > > + compose a bitmask for calling panic() when the kernel
> > > + gets tainted.
> > > + See Documentation/admin-guide/tainted-kernels.rst for
> > > + details on the taint flags that users can pick to
> > > + compose the bitmask to assign to panic_on_taint.
> > > + When the string is prefixed with a '-' the bitmask
> > > + set in panic_on_taint will be mutually exclusive
> > > + with the sysctl knob kernel.tainted, and any attempt
> > > + to write to that sysctl will fail with -EINVAL for
> > > + any taint value that masks with the flags set for
> > > + this option.
> > > +
> > > crash_kexec_post_notifiers
> > > Run kdump after running panic-notifiers and dumping
> > > kmsg. This only for the users who doubt kdump always
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > > index 9b7a8d74a9d6..66bc102cb59a 100644
> > > --- a/include/linux/kernel.h
> > > +++ b/include/linux/kernel.h
> > > @@ -528,6 +528,8 @@ extern int panic_on_oops;
> > > extern int panic_on_unrecovered_nmi;
> > > extern int panic_on_io_nmi;
> > > extern int panic_on_warn;
> > > +extern unsigned long panic_on_taint;
> > > +extern bool panic_on_taint_exclusive;
> > > extern int sysctl_panic_on_rcu_stall;
> > > extern int sysctl_panic_on_stackoverflow;
> > >
> > > diff --git a/kernel/panic.c b/kernel/panic.c
> > > index b69ee9e76cb2..65c62f8a1de8 100644
> > > --- a/kernel/panic.c
> > > +++ b/kernel/panic.c
> > > @@ -25,6 +25,7 @@
> > > #include <linux/kexec.h>
> > > #include <linux/sched.h>
> > > #include <linux/sysrq.h>
> > > +#include <linux/ctype.h>
> > > #include <linux/init.h>
> > > #include <linux/nmi.h>
> > > #include <linux/console.h>
> > > @@ -44,6 +45,8 @@ static int pause_on_oops_flag;
> > > static DEFINE_SPINLOCK(pause_on_oops_lock);
> > > bool crash_kexec_post_notifiers;
> > > int panic_on_warn __read_mostly;
> > > +unsigned long panic_on_taint;
> > > +bool panic_on_taint_exclusive = false;
> > >
> > > int panic_timeout = CONFIG_PANIC_TIMEOUT;
> > > EXPORT_SYMBOL_GPL(panic_timeout);
> > > @@ -434,6 +437,11 @@ void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
> > > pr_warn("Disabling lock debugging due to kernel taint\n");
> > >
> > > set_bit(flag, &tainted_mask);
> > > +
> > > +	if (tainted_mask & panic_on_taint) {
> > > +		panic_on_taint = 0;
> >
> > Isn't this panic_on_taint reset redundant? It will trigger a crash anyway,
> > so do we need to care whether it's 0 or not?
> >
>
> We might still get more than one CPU hitting a taint adding code path after
> the one that tripped here called panic. To avoid multiple calls to panic,
> in that particular scenario, we clear the panic_on_taint bitmask out.
> Also, albeit non-frequent, we might be tracking TAINT_WARN, and still hit
> a WARN_ON() in the panic / kdump path, thus incurring a second
> (and unwanted) call to panic here.
Hmm, this CPU will set panic_cpu first, so all other CPUs have to stop and
get no chance to execute panic(). But yes, clearing panic_on_taint makes
the code easier to understand.
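
To make the panic_cpu point concrete, here is a minimal user-space sketch
of that kind of gate, assuming C11 atomics in place of the kernel's atomic
helpers. The names panic_cpu_sketch, PANIC_CPU_INVALID_SKETCH and
try_claim_panic() are hypothetical and used only for illustration; this is
not the code in kernel/panic.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for "no CPU has started panicking yet". */
#define PANIC_CPU_INVALID_SKETCH (-1)

/* Hypothetical user-space analogue of the panic_cpu variable. */
static atomic_int panic_cpu_sketch = PANIC_CPU_INVALID_SKETCH;

/*
 * Returns true if the calling CPU may run the panic path.
 * The first caller claims the slot; any other CPU is refused.
 * The owning CPU is let through again, which models a nested
 * call coming from inside the panic path itself.
 */
static bool try_claim_panic(int this_cpu)
{
	int expected = PANIC_CPU_INVALID_SKETCH;

	assert(this_cpu >= 0);

	/* Succeeds only while the slot still holds the INVALID marker. */
	if (atomic_compare_exchange_strong(&panic_cpu_sketch, &expected,
					   this_cpu))
		return true;

	/* On failure, 'expected' now holds the id of the current owner. */
	return expected == this_cpu;
}
```

In this sketch a second CPU is always refused, but the owning CPU can come
back in. That matches the WARN_ON()-in-the-panic-path case described above,
where clearing the bitmask is what avoids the second call.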
>
>
> > > +		panic("panic_on_taint set ...");
> > > +	}
> > > }
> > > EXPORT_SYMBOL(add_taint);
> > >
> > > @@ -686,3 +694,35 @@ static int __init oops_setup(char *s)
> > > return 0;
> > > }
> > > early_param("oops", oops_setup);
> > > +
> > > +static int __init panic_on_taint_setup(char *s)
> > > +{
> > > +	/* we just ignore panic_on_taint if passed without flags */
> > > +	if (!s)
> > > +		goto out;
> > > +
> > > +	for (; *s; s++) {
> > > +		int i;
> > > +
> > > +		if (*s == '-') {
> > > +			panic_on_taint_exclusive = true;
> > > +			continue;
> > > +		}
> > > +
> > > +		for (i = 0; i < TAINT_FLAGS_COUNT; i++) {
> > > +			if (toupper(*s) == taint_flags[i].c_true) {
> > > +				set_bit(i, &panic_on_taint);
> > > +				break;
> > > +			}
> > > +		}
> >
> > I read admin-guide/tainted-kernels.rst, but still don't get what 'G' means.
> > If I specify 'panic_on_taint="G"' or 'panic_on_taint="-G"' on the cmdline,
> > what behaviour should the user expect?
> >
>
> This will not panic the system as no taint flag gets actually set in
> panic_on_taint bitmask for G.
>
> G is the counterpart of P, and appears on print_tainted() whenever
> TAINT_PROPRIETARY_MODULE is not set. panic_on_taint doesn't set
> anything for G, as it doesn't represent any taint, but the lack
> of one particular taint, instead.
>
> (apparently, TAINT_PROPRIETARY_MODULE is the only taint flag
> that follows that pattern of having an extra assigned letter
> that means its absence, and perhaps it should be removed)
Yeah, agreed. I will draft a patch to remove it and see whether anyone
objects.
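
For reference, the 'G' behaviour discussed above can be checked with a small
user-space sketch of the parsing loop. It assumes a shortened, hypothetical
copy of the taint letter table (taint_flags_sketch, covering only bits 0 to
9 as listed in tainted-kernels.rst) and hypothetical helper names; it only
mirrors the logic quoted in this thread and is not the kernel code itself.
The second helper models the documented '-' semantics, where a write to
kernel.tainted is refused with -EINVAL if it overlaps the bitmask.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, shortened copy of the taint letter table (bits 0..9). */
struct taint_flag_sketch {
	char c_true;	/* letter printed when the taint is set */
	char c_false;	/* letter printed when the taint is not set */
};

static const struct taint_flag_sketch taint_flags_sketch[] = {
	{ 'P', 'G' }, { 'F', ' ' }, { 'S', ' ' }, { 'R', ' ' }, { 'M', ' ' },
	{ 'B', ' ' }, { 'U', ' ' }, { 'D', ' ' }, { 'A', ' ' }, { 'W', ' ' },
};

#define TAINT_SKETCH_COUNT \
	(sizeof(taint_flags_sketch) / sizeof(taint_flags_sketch[0]))

/* Build the bitmask from the option string; only c_true letters count. */
static unsigned long parse_panic_on_taint(const char *s, bool *exclusive)
{
	unsigned long mask = 0;

	assert(exclusive != NULL);
	*exclusive = false;

	/* Option passed without flags: nothing to set. */
	if (!s)
		return 0;

	for (; *s; s++) {
		size_t i;

		if (*s == '-') {
			*exclusive = true;
			continue;
		}

		for (i = 0; i < TAINT_SKETCH_COUNT; i++) {
			if (toupper((unsigned char)*s) ==
			    taint_flags_sketch[i].c_true) {
				mask |= 1UL << i;
				break;
			}
		}
	}

	return mask;
}

/* Model of the '-' rule: refuse sysctl writes that overlap the mask. */
static int check_tainted_write(unsigned long mask, bool exclusive,
			       unsigned long value)
{
	if (exclusive && (value & mask))
		return -EINVAL;
	return 0;
}
```

With this sketch, both "G" and "-G" yield an empty bitmask, since 'G' is
only a c_false letter and never matches; "-G" still sets the exclusive
flag, but with an empty mask no write is rejected.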