Re: [PATCH] mm: don't rely on system state to detect hot-plug operations

From: Michal Hocko
Date: Wed Sep 09 2020 - 13:05:04 EST


On Wed 09-09-20 14:32:57, David Hildenbrand wrote:
> On 09.09.20 14:30, Greg Kroah-Hartman wrote:
> > On Wed, Sep 09, 2020 at 11:24:24AM +0200, David Hildenbrand wrote:
> >>>> I am not sure an enum is going to make the existing situation less
> >>>> messy. Sure we somehow have to distinguish boot init and runtime hotplug
> >>>> because they have different constraints. I am arguing that a) we should
> >>>> have a consistent way to check for those and b) we shouldn't blow up
> >>>> easily just because sysfs infrastructure has failed to initialize.
> >>>
> >>> For the point a, using the enum allows to know in register_mem_sect_under_node()
> >>> if the link operation is due to a hotplug operation or done at boot time.
> >>>
> >>> For the point b, one option would be ignore the link error in the case the link
> >>> is already existing, but that BUG_ON() had the benefit to highlight the root issue.
> >>>
> >>
> >> WARN_ON_ONCE() would be preferred - not crash the system but still
> >> highlight the issue.
> >
> > Many many systems now run with 'panic on warn' enabled, so that wouldn't
> > change much :(
> >
> > If you can warn, you can properly just print an error message and
> > recover from the problem.
>
> Maybe VM_WARN_ON_ONCE() then to detect this during testing?
>
> (we basically turned WARN_ON_ONCE() useless with 'panic on warn' getting
> used in production - behaves like BUG_ON and BUG_ON is frowned upon)

VM_WARN* is not that much different from panic on warn. Still, one can
argue that many workloads enable it just because. And I would disagree
that we should care much about those, because those are debugging
features and everybody has to take the consequences.

On the other hand, the question is whether WARN is giving us much. So
what is the advantage over a simple pr_err? We will get a backtrace.
Interesting, but not really that useful because there are only a few code
paths this can trigger from. Register dump? Not really useful here.
Taint flag? Probably useful, because follow-up problems might give us a
hint that this might be related. People tend to pay more attention to
a WARN splat than to a single-line error. Well, not really a strong
reason, I would say.

So while I wouldn't argue against WARN* in general (just because somebody
might be setting the system to panic), I would also think about how
useful the splat really is.

--
Michal Hocko
SUSE Labs