Re: [PATCH 1/2 v2] mm: Allow small allocations to fail

From: Tetsuo Handa
Date: Wed Mar 18 2015 - 08:36:45 EST


Vlastimil Babka wrote:
> I'll add that I think if we do improve the reclaim etc, and make
> allocation failures rarer, then the whole testing effort will have much
> lower chance of finding the places where allocation failures are not
> handled properly. Also Michal says that catching those depends on running
> all "their loads which we never dreamed of". In that case, if our goal
> is to fix all broken allocation sites with some quantifiable
> probability, I'm afraid we might be really better off with some form of
> fault injection, which will trigger the failures with the probability we
> set, and not depend on corner case low memory conditions manifesting
> just at the time the workload is at one of the broken allocation sites.
>

I think we can use SystemTap-based fault injection, which injects a failure
only once per unique backtrace and does not require putting the system under
OOM conditions, as I demonstrated at https://lkml.org/lkml/2014/12/25/64 .

Since SystemTap can generate backtraces without garbage lines, each backtrace
uniquely identifies a call path, so we can inject a failure only the first
time each backtrace is seen. This makes it possible to test every memory
allocation call site that the workload reaches.
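The once-per-backtrace rule can be illustrated with a small userspace C simulation. This is only a sketch of the idea, not how SystemTap implements its maps: a backtrace is assumed to be an array of return addresses, and should_inject(), MAX_DEPTH and MAX_TRACES are hypothetical names. It mirrors the traces_bt[bt]++ == 0 test used in the scripts below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH  16   /* hypothetical: frames kept per backtrace */
#define MAX_TRACES 1024 /* hypothetical: distinct backtraces remembered */

struct trace {
	size_t depth;
	uintptr_t frames[MAX_DEPTH];
};

static struct trace seen[MAX_TRACES];
static size_t nr_seen;

/*
 * Returns true only the first time this exact call path is observed,
 * like "traces_bt[bt]++ == 0". Frames beyond MAX_DEPTH are ignored, so
 * paths that differ only below that depth are treated as one path.
 */
static bool should_inject(const uintptr_t *frames, size_t depth)
{
	size_t i;

	assert(frames != NULL);
	if (depth > MAX_DEPTH)
		depth = MAX_DEPTH;

	for (i = 0; i < nr_seen; i++) {
		if (seen[i].depth == depth &&
		    memcmp(seen[i].frames, frames, depth * sizeof(*frames)) == 0)
			return false; /* this path has already failed once */
	}

	if (nr_seen == MAX_TRACES)
		return false; /* table full: stop injecting */

	seen[nr_seen].depth = depth;
	memcpy(seen[nr_seen].frames, frames, depth * sizeof(*frames));
	nr_seen++;
	return true;
}
```

A linear search keeps the sketch short; the SystemTap scripts instead use an associative array keyed by the backtrace string, which gives the same "first time only" behaviour with keyed lookup.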

Steps for installation and testing are described below.

---------- installation start ----------
wget https://sourceware.org/systemtap/ftp/releases/systemtap-2.7.tar.gz
echo 'e0c3c36955323ae59be07a26a9563474 systemtap-2.7.tar.gz' | md5sum --check -
tar -zxf systemtap-2.7.tar.gz
cd systemtap-2.7
./configure --prefix=$HOME/systemtap.tmp
make -s
make -s install
---------- installation end ----------

---------- preparation (optional) start ----------
As the root user, start the kdump service and set
/proc/sys/kernel/panic_on_oops to 1 so that a vmcore is obtained upon a kernel oops.
---------- preparation (optional) end ----------

---------- testing start ----------
Run

$HOME/systemtap.tmp/bin/staprun fault_injection.ko

then use the system as you like (run your usual workloads) and see whether it survives.
---------- testing end ----------

The fault_injection.ko module is generated by one of the commands shown below.
As written, the scripts check only sleepable allocations that do not specify
__GFP_NOFAIL. If you replace the trailing %{ __GFP_WAIT %} in the comparison
with 0, you can check atomic allocations instead.

---------- For testing __kmalloc() failure start ----------
$HOME/systemtap.tmp/bin/stap -p4 -m fault_injection -g -DSTP_NO_OVERLOAD -e '
global traces_bt[65536];
probe begin { printf("Probe start!\n"); }
probe kernel.function("__kmalloc") {
    # Sleepable, non-__GFP_NOFAIL allocations only; skip the stapio process.
    if (($flags & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} && execname() != "stapio") {
        bt = backtrace();
        # Inject only the first time this backtrace is seen.
        if (traces_bt[bt]++ == 0) {
            printf("%s (%u) size:%u gfp:0x%x\n", execname(), tid(), $size, $flags);
            print_stack(bt);
            printf("\n\n");
            # Writing $size needs guru mode (-g). 1 GiB is far more than
            # kmalloc can normally serve, so this call fails.
            $size = 1 << 30;
        }
    }
}
probe end { delete traces_bt; }'
---------- For testing __kmalloc() failure end ----------
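The flag test at the top of each probe can be read as the following C predicate. GFP_WAIT_BIT and GFP_NOFAIL_BIT are hypothetical stand-ins for __GFP_WAIT and __GFP_NOFAIL; their values here are placeholders (only the fact that they are distinct bits matters), so take the real ones from include/linux/gfp.h. Passing want = 0 corresponds to replacing the trailing %{ __GFP_WAIT %} with 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values standing in for __GFP_WAIT and __GFP_NOFAIL. */
#define GFP_WAIT_BIT   0x10u
#define GFP_NOFAIL_BIT 0x800u

/*
 * want == GFP_WAIT_BIT: sleepable allocations without __GFP_NOFAIL
 *                       (what the scripts check as written).
 * want == 0:            atomic allocations without __GFP_NOFAIL.
 * Any __GFP_NOFAIL allocation is never selected.
 */
static bool is_candidate(unsigned int gfp, unsigned int want)
{
	assert(want == 0 || want == GFP_WAIT_BIT);
	return (gfp & (GFP_NOFAIL_BIT | GFP_WAIT_BIT)) == want;
}
```

Masking with both bits before comparing is what lets one comparison express "this bit set, that bit clear".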

As the example below demonstrates, we can selectively test (or exclude)
specific subsystems by setting a per-task marker. Here, allocations made
while handling a page fault are excluded.

---------- For testing __alloc_pages_nodemask() failure except page fault start ----------
$HOME/systemtap.tmp/bin/stap -p4 -m fault_injection -g -DSTP_NO_OVERLOAD -e '
global traces_bt[65536];
global in_page_fault%;
probe begin { printf("Probe start!\n"); }
probe kernel.function("__alloc_pages_nodemask") {
    # Sleepable, non-__GFP_NOFAIL allocations outside page fault handling.
    if (($gfp_mask & %{ __GFP_NOFAIL | __GFP_WAIT %} ) == %{ __GFP_WAIT %} &&
        in_page_fault[tid()] == 0 && execname() != "stapio") {
        bt = backtrace();
        # Inject only the first time this backtrace is seen.
        if (traces_bt[bt]++ == 0) {
            printf("%s (%u) order:%u gfp:0x%x\n", execname(), tid(), $order, $gfp_mask);
            print_stack(bt);
            printf("\n\n");
            # Request an order far beyond MAX_ORDER so that this call fails.
            $order = 1 << 30;
            $gfp_mask = $gfp_mask | %{ __GFP_NORETRY %};
        }
    }
}
# Per-thread marker: nonzero while this thread is inside handle_mm_fault().
probe kernel.function("handle_mm_fault") {
    in_page_fault[tid()]++;
}
probe kernel.function("handle_mm_fault").return {
    in_page_fault[tid()]--;
}
probe end { delete traces_bt; delete in_page_fault; }'
---------- For testing __alloc_pages_nodemask() failure except page fault end ----------
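The per-task marker in the second script is a counter rather than a flag, so it stays correct if entries into the fault handler nest. A minimal C sketch of the same pattern, assuming a C11 _Thread_local counter as a stand-in for the in_page_fault[tid()] map; fault_enter(), fault_exit() and may_inject_here() are hypothetical names for the handle_mm_fault entry and return probes and for the check done before injecting.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-thread nesting depth; each thread gets its own copy. */
static _Thread_local unsigned int fault_depth;

/* Hypothetical hook for entry into the fault handler. */
static void fault_enter(void)
{
	fault_depth++;
}

/* Hypothetical hook for return from the fault handler. */
static void fault_exit(void)
{
	assert(fault_depth > 0);
	fault_depth--;
}

/* Injection is allowed only outside the fault handler on this thread. */
static bool may_inject_here(void)
{
	return fault_depth == 0;
}
```

The same entry/return pattern can be attached to other functions to restrict injection to, or exclude, a particular subsystem.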