Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka
Date: Thu Apr 26 2018 - 18:52:13 EST
On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:
> On Thu, Apr 26, 2018 at 05:50:20PM -0400, Mikulas Patocka wrote:
> > How is the user or developer supposed to learn about this option, if
> > he gets no crash at all?
>
> Look in /sys/kernel/debug/fail* ? That actually lets you
> filter by module, process etc.
>
> I think this patch conflates two things:
>
> 1. Make kvmalloc use the vmalloc path.
> This seems a bit narrow.
> What is special about kvmalloc? IMHO nothing - it's yet another user
> of __GFP_NORETRY or __GFP_RETRY_MAYFAIL. As any such
__GFP_RETRY_MAYFAIL makes the allocator retry the costly_order allocations
> user, it either recovers correctly or not.
> So IMHO it's just a case of
> making __GFP_NORETRY, __GFP_RETRY_MAYFAIL, or both
> fail once in a while.
> Seems like a better extension to me than focusing on vmalloc.
> I think you will find more bugs this way.
If the array is <= PAGE_SIZE, vmalloc will not use __GFP_NORETRY. So it
still hides some bugs - such as, if a structure grows above 4k, it would
start randomly crashing due to memory fragmentation.
> 2. Ability to control this from a separate config
> option.
>
> It's still not that clear to me why is this such a
> hard requirement. If a distro wants to force specific
> boot time options, why isn't CONFIG_CMDLINE sufficient?
There are 489 kernel options declared with the __setup keyword. Hardly any
kernel developer notices that a new one was added and selects it when
testing his code.
> But assuming it's important to control this kind of
> fault injection to be controlled from
> a dedicated menuconfig option, why not the rest of
> faults?
The injected faults cause damage to the user, so there's no point to
enable them by default. vmalloc fallback should not cause any damage
(assuming that the code is correctly written).
> IMHO if you split 1/2 up, and generalize, the path upstream
> will be much smoother.
This seems like a lost case. So, let's not care about code correctness and
let's solve crashes only after they are reported. If the upstream wants to
work this way, there's nothing that can be done about it.
I'm wondering if I can still push it to RHEL or not.
> Hope this helps.
>
> --
> MST
Mikulas